Abstract: |
Global climate change is altering habitat conditions at an unprecedented pace. Yet it remains unclear whether and how agricultural production, including cereal crops and forest systems, can keep pace with these changes and continue to feed a growing human population and support the planet's wellbeing. Taking advantage of next-generation sequencing technology, crop variety development and tree improvement programs have a keen interest in early estimation of agronomic performance, such as end-use quality, productivity, growth, and adaptive attributes, in the hope that a genetics-driven paradigm shift can increase adaptability and climate resilience in crop plants. However, prediction and association analyses based on genetic markers such as single nucleotide polymorphisms (SNPs) have fallen short: the SNP variants identified in association with trait variability explain far less heritability than empirical estimates suggest, leading to unreliable predictions and disappointing technology adoption.
In the past decade, a growing number of studies have demonstrated that structural variants (SVs), genomic variations such as copy number variations, deletions, insertions, tandem duplications, and inversions that span larger stretches of nucleotides, have substantial impacts on the total fitness and adaptive capacity of plants. However, SVs vary in length and, because of their size, often overlap; this unstructured representation makes SVs difficult to fit into existing statistical algorithms, and harder still to interpret alongside single nucleotide mutations and substitutions such as SNPs. Taking advantage of deep learning at the critical step of feature extraction and embedding, we have proposed a novel deep learning framework for whole-genome predictive analysis. Our approach seeks predictability by incorporating the rawest form of genomic information, the DNA sequence, so that all genomic variants, both structured (SNPs) and unstructured (SVs), are modeled simultaneously for prediction. This XSEDE application seeks adequate computing resources to identify SVs and to construct and verify the capacity of our deep learning prediction model for agriculturally and ecologically important wheat and conifer species. |