Deep-Learning Whole-Genome Prediction for Complex Plant Genomes

Charles Chen, Oklahoma State University

0000-0002-2203-0433

ACCESS Allocation Request MCB180177

Abstract: Global climate change is altering habitat conditions at an unprecedented pace. Yet it remains unclear whether and how agricultural production, including cereal crops and forest systems, can keep pace with these changes and continue to feed a growing human population and support the planet's wellbeing. With next-generation sequencing now widely available, crop variety development and tree improvement programs have a keen interest in early estimation of agronomic performance, such as end-use quality, productivity, growth, and adaptive attributes, in the hope of a genetics-driven paradigm shift that increases adaptability and climate resilience in crop plants. However, prediction and association analyses based on genetic markers such as single nucleotide polymorphisms (SNPs) have fallen short: the SNP variants identified in association with trait variability explain far less heritability than empirical estimates suggest, leading to unreliable predictions and disappointing technology adoption. Over the past decade, a growing number of studies have demonstrated that structural variants (SVs), genomic variations such as copy number variations, deletions, insertions, tandem duplications, and inversions that span larger stretches of nucleotides, have substantial impacts on the total fitness and adaptive capacity of plants. However, SVs vary in length and, because of their size, often overlap; this unstructured representation makes SVs difficult to fit into existing statistical algorithms, and even harder to interpret alongside single-nucleotide mutations and substitutions such as SNPs. Using deep learning at the critical step of feature extraction and embedding, we propose a novel deep learning framework for whole-genome predictive analysis.
Our approach seeks predictive power by using the rawest form of genomic information, the DNA sequence itself, so that all genomic variants, both structured (SNPs) and unstructured (SVs), are modeled simultaneously for prediction. This XSEDE application requests adequate computing resources for identifying SVs and for constructing and verifying the capacity of our deep learning prediction model in agriculturally and ecologically important wheat and conifer species.
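
As a rough illustration of what feeding raw DNA sequence to a deep learning model can involve, the sketch below one-hot encodes nucleotide strings and pads them to a common length, so that a sequence carrying a deletion and its reference can share one input shape. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the abstract does not describe the encoding used, and the A/C/G/T channel order, the zero-padding scheme, the function names `one_hot_encode` and `pad_to_length`, and the example sequences are all hypothetical.

```python
from typing import List

# Channel order is an assumption for illustration; the abstract does not specify one.
ALPHABET = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA string as one 4-element one-hot row per base (A, C, G, T).

    Ambiguous or unknown bases (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def pad_to_length(encoded: List[List[int]], length: int) -> List[List[int]]:
    """Right-pad with all-zero rows (or truncate) so every sequence has `length` rows."""
    padded = encoded[:length]
    padded += [[0] * len(ALPHABET) for _ in range(length - len(padded))]
    return padded


# Hypothetical example: a reference fragment and a haplotype with a 2-bp deletion.
reference = one_hot_encode("ACGTAC")
with_deletion = one_hot_encode("ACAC")
batch = [pad_to_length(x, 8) for x in (reference, with_deletion)]
```

Here `batch` holds two 8-by-4 matrices, the kind of fixed-shape input a convolutional or embedding layer could consume. Real plant genomes would need windowing or another strategy to handle their size, which this sketch does not attempt.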

Allocations:

2022	PSC Bridges-2 Extreme Memory (PSC Bridges-2 EM)	293,726.0 Core-hours
2022	PSC Bridges-2 GPU (PSC Bridges-2 GPU)	23,718.0 GPU Hours
2022	PSC Bridges-2 Regular Memory (PSC Bridges-2 RM)	626,413.0 Core-hours
2022	PSC Bridges-2 Storage (PSC Ocean)	146,000.0 GB
The estimated value of these awarded resources is $75,856.50. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.
2020	PSC Bridges-2 Extreme Memory (PSC Bridges-2 EM)	324,720.0 Core-hours
2020	PSC Bridges-2 GPU Artificial Intelligence (PSC Bridges-2 GPU-AI)	63,244.0 GPU Hours
2020	PSC Bridges-2 Regular Memory (PSC Bridges-2 RM)	361,875.0 Core-hours
2020	PSC Bridges-2 Storage (PSC Ocean)	60,000.0 GB
2020	PSC GPU-AI (Bridges GPU Artificial Intelligence)	26,756.5 GPU Hours
2020	PSC Large Memory (Bridges Large)	5,542.0 Memory Hours
2020	PSC Regular Memory (Bridges)	27,187.0 SUs
2020	PSC Storage (Bridges Pylon)	60,000.0 GB
The estimated value of these awarded resources is $148,563.36.

2019	PSC GPU (Bridges GPU)	7,500.0 GPU Hours
2019	PSC GPU-AI (Bridges GPU Artificial Intelligence)	14,100.0 GPU Hours
2019	PSC Large Memory (Bridges Large)	83,647.0 Memory Hours
2019	PSC Regular Memory (Bridges)	261,310.0 SUs
2019	PSC Storage (Bridges Pylon)	79,200.0 GB
The estimated value of these awarded resources is $86,199.64.

Other Titles:

Deep Convolutional Neural Network Whole-Genome Prediction by Structural Variants in Complex Plant Genomes
Predictive Modeling for Climate Resilient Phenotypes in Mega-Size Plant Genomes