AI System Trained on Bacterial Genomes Generates Novel Protein Sequences
AI Model Trained on Bacterial Genomes
Scientists have developed an artificial intelligence system called Evo, trained on whole bacterial genomes. By learning how nucleotide patterns relate to their larger genomic context, Evo can interpret DNA fragments much as a language model interprets text.
Filling Gaps in Known Genes
When prompted with part of a known gene's sequence, Evo can accurately predict the missing portion. For example, given 30 percent of a gene’s sequence, Evo correctly generated 85 percent of the remainder; given 80 percent, it reconstructed the gene completely.
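The completion test described above can be sketched as a simple scoring routine: split a known gene into a prompt and a held-out remainder, then measure how much of the remainder a prediction recovers. This is a minimal illustration only. The function names `split_gene` and `recovery` are hypothetical, the position-by-position identity score is an assumed metric, and nothing here calls Evo itself or reflects how the researchers actually scored their results.

```python
def split_gene(sequence: str, prompt_fraction: float) -> tuple[str, str]:
    """Split a gene into a prompt (the leading fraction) and the held-out remainder."""
    if not 0.0 < prompt_fraction < 1.0:
        raise ValueError("prompt_fraction must be strictly between 0 and 1")
    cut = round(len(sequence) * prompt_fraction)
    return sequence[:cut], sequence[cut:]


def recovery(predicted: str, target: str) -> float:
    """Fraction of held-out positions that the prediction matches exactly.

    Assumed metric: position-by-position identity, with no alignment step.
    Positions missing from a too-short prediction count as mismatches.
    """
    if not target:
        return 1.0
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


# Toy example with a made-up 20-nucleotide sequence (not a real gene).
gene = "ATGGCCAAGTTTGGACTGAA"
prompt, target = split_gene(gene, 0.5)
hypothetical_prediction = "TTGGACTGAT"  # stand-in for a model's output
score = recovery(hypothetical_prediction, target)
```

In practice a comparison like this would more likely rely on sequence alignment, since an insertion or deletion shifts every later position and makes a strict per-position score overly pessimistic.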
Restoring Deleted Genes
In experiments where a single gene was removed from a functional cluster, Evo correctly identified and restored the missing gene, demonstrating its understanding of gene organization and functional relationships.
Generating Novel Protein Sequences
Beyond completing existing genes, Evo was challenged to produce new protein sequences. Researchers used bacterial toxin genes, which typically evolve rapidly and are paired with antitoxin genes. By prompting Evo with a toxin only distantly related to known toxins and filtering out responses that resembled known antitoxins, the system generated a novel toxin sequence with no obvious antitoxin counterpart.
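The filtering step, discarding generated sequences that resemble known ones, might look roughly like the sketch below. It is an illustration under stated assumptions: similarity is approximated by Jaccard overlap of 3-residue k-mers, the threshold is arbitrary, and the names `kmer_set`, `jaccard`, and `filter_novel` are hypothetical. The article does not say how the researchers measured resemblance, and a real pipeline would more plausibly use an alignment-based search.

```python
def kmer_set(sequence: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared k-mers divided by total distinct k-mers."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def filter_novel(candidates: list[str], known: list[str],
                 threshold: float = 0.3, k: int = 3) -> list[str]:
    """Keep candidates whose similarity to every known sequence is below the threshold.

    Assumption: k-mer Jaccard overlap is a crude stand-in for sequence
    resemblance, and the default threshold is arbitrary.
    """
    known_sets = [kmer_set(seq, k) for seq in known]
    kept = []
    for candidate in candidates:
        cand_set = kmer_set(candidate, k)
        if all(jaccard(cand_set, ks) < threshold for ks in known_sets):
            kept.append(candidate)
    return kept


# Toy example with made-up peptide strings (not real proteins).
known_sequences = ["MKTAYIAKQR"]
generated = ["MKTAYIAKQR", "GGSWLPPHDE"]
novel = filter_novel(generated, known_sequences)
```

Here the first generated string is identical to a known sequence and is discarded, while the second shares no 3-mers with it and is kept.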
Implications for Biotechnology
These results indicate that Evo can not only replicate known biological information but also explore new sequence space while respecting evolutionary constraints. This opens avenues for designing proteins with desired functions, accelerating synthetic biology, and expanding the toolkit for protein engineering.