The Microbial Genomes Atlas Science Gateway – MiGA Gateway: A Searchable Database of Prokaryotic Genomes for Taxonomic Identification and Diversity Cataloguing

Konstantinos Konstantinidis, Georgia Institute of Technology

ACCESS Allocation Request MCB190042

CoPI: Luis Rodriguez Rojas, University of Innsbruck
Abstract: The diversity of prokaryotic microbes on the planet is very large, estimated at over a billion species of bacteria, and most of it remains undiscovered. Genome sequencing can help characterize this diversity and has recently become routine, but most microbial scientists have been overwhelmed by the amount of genomic data recently made available. Tools that can help direct researchers to the most "interesting" genomes among thousands of candidates are therefore of great importance, including for the identification (diagnostics) of microbial disease agents and for diversity discovery. The availability of such tools is currently limited to a handful of services, including the Microbial Genomes Atlas (MiGA; Rodriguez-R et al., 2018). MiGA is a genomic data processing and management system that uses whole-genome comparisons for the identification of relatives and taxonomic classification, and provides several tools for genome quality evaluation and for genome clustering of novel (not previously described) microorganisms. These capabilities address a major need for better understanding, studying, and communicating about the biodiversity of uncultivated microorganisms that run the life-sustaining biogeochemical cycles on the planet, form critical associations with their plant, animal, and human hosts, or produce products of biotechnological value. Approaches that make the emerging genomic sequence information readily available to the non-expert user are therefore essential to advance our understanding of the diversity and function of microbial communities across the fields of ecology, systematics, evolution, engineering, agriculture, and medicine. Together with the MiGA infrastructure, we also released the MiGA Online webserver, an online system that allows users to evaluate, compare, and classify their own genome sequences against different reference databases that together include over 100,000 genomes.
MiGA Online is currently used by over 2,500 registered users around the world, with ~1,000 monthly queries on average (Figure 1). MiGA has been extensively used to propose novel taxa, to classify and evaluate microbial genomes, and to advance data-driven microbial taxonomy; the MiGA Online paper (Rodriguez-R et al., 2018) has been cited 423 times (Google Scholar), and at least five other publications describe specific resources within MiGA. Using previous XSEDE/ACCESS allocations, we developed and deployed “MiGA Gateway” (formerly “MiGA @ XSEDE”), which we are describing in a manuscript in preparation. Our projections, based on the usage of the MiGA webserver that runs on our local computer clusters at Georgia Tech and University of Innsbruck (Figure 1), indicate that about 0.7 million CPU hours per year will be required to support the scientific community interested in using the MiGA infrastructure. An additional 0.6 million CPU hours will be required to continue producing the bimonthly updates of the reference databases in MiGA, for a total of 1.3 million CPU hours (see below). This is a conservative prediction given the strong upward trend in MiGA usage and the size of this community, which spans the fields of microbial ecology, systematics, evolution, engineering, agriculture, and medicine. Our local computer clusters at Georgia Tech and University of Innsbruck are limited and do not represent a sustainable or scalable option for the increasing use of MiGA. If our renewal application is approved, we will direct current users from our webserver to the Gateway implementation and will advertise this implementation more broadly, including in webinars and training workshops such as those held during the ASMCUE conference, in order to recruit even more users to MiGA Gateway. Notably, Prof. Luis-Miguel Rodriguez-R (co-PI) is based in Austria and will be able to organize workshops and recruit users from a truly international pool.

Allocations:

2024 ACCESS Credits 750,000.0 ACCESS Credits
2022 ACCESS Credits 1,300,000.0 ACCESS Credits
2021 SDSC Expanse Projects Storage 6,000.0 GB
2021 SDSC Expanse CPU 5,332,000.0 Core-hours
The estimated value of these awarded resources is $23,760.80. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.
2019 XSEDE Extended Collaborative Support Yes
2019 SDSC Dell Cluster with Intel Haswell Processors (Comet) 700,000.0 SUs
2019 SDSC Medium-term disk storage (Data Oasis) 3,072.0 GB
The estimated value of these awarded resources is $11,618.21.

Other Titles:

The Microbial Genomes Atlas Science Gateway – MiGA @ XSEDE: A Searchable Database of Prokaryotic Genomes for Taxonomic Identification and Diversity Cataloguing