
Improvements to the Genome Taxonomy Database provide a complete bacterial and archaeal taxonomy

Metagenomics is the study of the metagenome—the collective genome of microorganisms from an environmental sample—to provide information on the microbial diversity and ecology of a specific environment. Shotgun metagenomics refers to the approach of shearing DNA extracted from the environmental sample and sequencing the small fragments.

A deluge of bacterial and archaeal genomes has been sequenced in recent years. Although this data trove greatly broadens our knowledge of the tree of life, it also poses new challenges in various areas, including taxonomy. “The most notable challenge in bacterial and archaeal taxonomy is the rapid rate at which new diversity is being discovered as a result of being able to recover genomes directly from environmental samples via metagenome-assembled and single-amplified genomes,” says Donovan Parks of the University of Queensland in Australia. “This has resulted in a proliferation of new lineages comprised exclusively of uncultivated organisms which have incomplete taxonomic assignments, often only being assigned to a candidate phylum.”

As a step toward tackling this situation, Parks, his colleague Philip Hugenholtz, and other team members built the Genome Taxonomy Database (GTDB), a genome-based taxonomy of Bacteria and Archaea that uses phylogenetic information. Despite its usefulness, GTDB was not complete: about 40% of its genomes lacked a species name. Now the team has developed a computational strategy to assign species names to genomes automatically. Its operational species definition is based on thresholds (95% to 97%) of average nucleotide identity (ANI), which measures the similarity between genomes. Because comparing so many genomes is computationally demanding, Parks notes, the team invested substantial up-front effort in establishing an efficient and reliable workflow for generating the genome comparisons.
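To make the idea of an operational species definition concrete, here is a minimal sketch of threshold-based clustering on average nucleotide identity (ANI). It is not the GTDB team's workflow, which the article does not describe in detail. The genome names, the ANI values, and the function `cluster_by_ani` are all hypothetical. The greedy strategy is an assumption chosen for brevity, and the cutoff is the lower end of the 95% to 97% range quoted above.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: use the lower end of the 95% to 97% range mentioned in the article.
ANI_THRESHOLD = 95.0


def cluster_by_ani(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = ANI_THRESHOLD,
) -> Dict[str, List[str]]:
    """Greedy sketch: each genome joins the most similar existing cluster
    representative at or above the threshold, otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        best_rep: Optional[str] = None
        best_ani = threshold
        for rep in clusters:
            # Pairwise values are stored once, so look the pair up in either order.
            value = ani.get((genome, rep), ani.get((rep, genome), 0.0))
            if value >= best_ani:
                best_rep, best_ani = rep, value
        if best_rep is None:
            clusters[genome] = [genome]  # genome becomes a new representative
        else:
            clusters[best_rep].append(genome)
    return clusters


# Hypothetical genomes with made-up pairwise ANI values (percent).
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
ani = {
    ("genome_A", "genome_B"): 98.2,
    ("genome_A", "genome_C"): 88.5,
    ("genome_B", "genome_C"): 89.1,
    ("genome_A", "genome_D"): 87.0,
    ("genome_B", "genome_D"): 86.4,
    ("genome_C", "genome_D"): 96.1,
}

clusters = cluster_by_ani(genomes, ani)
for representative, members in clusters.items():
    print(representative, members)
```

With these invented values, genome_B falls into the cluster represented by genome_A, while genome_C founds a second cluster that genome_D then joins. A greedy pass like this depends on the order in which genomes are visited, which is one reason a real pipeline would need a principled way of choosing representatives.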

This strategy yielded 24,706 proposed species clusters, with 36% based on published species names. In addition to this domain-to-species taxonomy, their analysis also led to intriguing observations on evolutionary patterns of microbial genomic diversity and speciation. Parks sees momentum in genome-based taxonomy. “Genome-based taxonomy appears to be gaining wide acceptance in the research community. This is evident from both the increased use of the GTDB and the large number of recent manuscripts proposing taxonomic reclassifications based on analyses of genome assemblies. I expect this trend to continue.”

Sandesh Ilhe