The Microbial Genomes Atlas Science Gateway – MiGA Gateway: A Searchable Database of Prokaryotic Genomes for Taxonomic Identification and Diversity Cataloguing

Konstantinos Konstantinidis, Georgia Institute of Technology

ACCESS Allocation Request MCB190042

CoPI:	Luis Rodriguez Rojas	University of Innsbruck
Past CoPI:	James Cole	Michigan State University
Past PI:	Luis Rodriguez Rojas	University of Innsbruck
Past CoPI:	Konstantinos Konstantinidis	Georgia Institute of Technology
Abstract:	The diversity of prokaryotic microbes on the planet is very large, estimated at over a billion species of bacteria, and most of it remains undiscovered. Genome sequencing can help characterize this diversity and has recently become routine, but most microbial scientists have been overwhelmed by the amount of genomic data recently made available. Tools that can direct researchers to the most "interesting" genomes among thousands of candidates are therefore of great importance, including for the identification (diagnostics) of microbial disease agents and for diversity discovery. Such tools are currently limited to a handful of services, including the Microbial Genomes Atlas (MiGA; Rodriguez-R et al., 2018). MiGA is a genomic data processing and management system that uses whole-genome comparisons for the identification of relatives and taxonomic classification, and provides several tools for genome quality evaluation and genome clustering for novel (not previously described) microorganisms. These capabilities address a major need for better understanding, studying, and communicating about the biodiversity of uncultivated microorganisms that run the life-sustaining biogeochemical cycles on the planet, form critical associations with their plant, animal, and human hosts, or produce products of biotechnological value. Approaches that make the emerging genomic sequence information readily available to the non-expert user are therefore essential to advance our understanding of the diversity and function of microbial communities across the fields of ecology, systematics, evolution, engineering, agriculture, and medicine. Together with the MiGA infrastructure, we also released the MiGA Online webserver, an online system that allows users to evaluate, compare, and classify their own genome sequences against different reference databases that together include over 100,000 genomes.
MiGA Online is currently used by over 3,000 registered users from 83 countries (the largest share, 16.5%, being from the US), with ~1,000 monthly queries on average (Figure 1). MiGA has been extensively used for the proposal of novel taxa, the classification and evaluation of microbial genomes, and the advancement of data-driven microbial taxonomy. The MiGA Online paper (Rodriguez-R et al., 2018) has been cited over 500 times (Google Scholar), and at least six other publications describe specific resources within MiGA, including the recent publication of FastAAI (Ruiz Pérez, Gerhardt, Rodriguez-R et al., 2025). Using previous XSEDE/ACCESS allocations, we developed and deployed “MiGA Gateway” (formerly “MiGA @ XSEDE”), which we are describing in a manuscript in preparation. Our projections, based on the usage of the MiGA webserver that runs on our local computer clusters at Georgia Tech and University of Innsbruck (Figure 1), indicate that about 0.8 million CPU hours per year will be required to support the scientific community interested in using the MiGA infrastructure. An additional 0.7 million CPU hours will be required to continue producing the bimonthly updates of the reference databases in MiGA, for a total of 1.5 million CPU hours (see below). This is a conservative prediction given the strong upward trend in MiGA usage and the size of this community, which spans the fields of microbial ecology, systematics, evolution, engineering, agriculture, and medicine. Our local computer clusters at Georgia Tech and University of Innsbruck are limited and do not represent sustainable or scalable options for the increasing use of MiGA.
If our renewal application is approved, we will direct current users from our webserver to the Gateway implementation and will advertise this implementation more broadly, including in webinars and training workshops (such as those held during the ASMCUE conference), in order to recruit even more users to MiGA Gateway, as we have done in previous workshops led by Luis M. Rodriguez-R (Co-PI) and Kostas Konstantinidis (PI). Notably, Prof. Rodriguez-R is based in Austria and will be able to organize workshops and recruit users from a truly international pool.

Allocations:

2025	SDSC Expanse CPU	250,000.0 Core-hours
2025	SDSC Expanse Projects Storage	50,000.0 GB
The estimated value of these awarded resources is $3,600.00. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.
2024	SDSC Expanse CPU	747,000.0 Core-hours
2024	SDSC Expanse Projects Storage	3,000.0 GB
The estimated value of these awarded resources is $3,436.80.

2023	SDSC Expanse CPU	1,480,000.0 Core-hours
2023	SDSC Expanse Projects Storage	20,000.0 GB
The estimated value of these awarded resources is $7,512.00.
2021	SDSC Expanse CPU	5,332,000.0 Core-hours
2021	SDSC Expanse Projects Storage	6,000.0 GB
2021	XSEDE Extended Collaborative Support	Yes
The estimated value of these awarded resources is $23,760.80.
2019	SDSC Dell Cluster with Intel Haswell Processors (Comet)	123,186.0 SUs
2019	SDSC Expanse CPU	122,212.0 Core-hours
2019	SDSC Expanse Projects Storage	3,072.0 GB
2019	XSEDE Extended Collaborative Support	Yes
The estimated value of these awarded resources is $2,539.12.

Other Titles:

The Microbial Genomes Atlas Science Gateway – MiGA @ XSEDE: A Searchable Database of Prokaryotic Genomes for Taxonomic Identification and Diversity Cataloguing