Machine Learning and Multi-omics Data Analysis to Decipher the Epigenetic Regulatory Mechanisms underlying Development, Aging, and Multiple Diseases

Qi Ma, University of California, San Diego

0000-0002-8705-4460

ACCESS Allocation Request MED200005

CoPI:	Jia Shen	University of California, San Diego
Abstract:

# Project Goal

We are requesting a total of 400,000 SUs on Expanse to process and analyze large-scale multi-omics NGS datasets. These include publicly accessible datasets generated by multiple consortia as well as a substantial number of datasets generated by our team. Our objective is to gain insight into the epigenetic regulatory mechanisms underlying normal development and aging, the latter being a leading risk factor for the onset of multiple diseases.

# Resource Justification

The requested ACCESS resources will be used to re-analyze extensive public datasets across various assays, as well as a large number of datasets generated by us and our collaborators. This goal is not covered by another active grant, BIO240050, which focuses primarily on computational method development and optimization.

Given the large scale of both the publicly accessible and the internally generated datasets, we require substantial storage for raw data (to be deleted after processing) and for processed results (to be kept long term). We have established and benchmarked our computational methods on both Comet and Expanse (from January 2021 to June 2024, under grants MED200005 and MED200005-Extension). Our previous work ranges from standard data processing to customized analysis powered by machine learning, giving us an accurate estimate of the total resources needed to accomplish the proposed work over the following year.

In total, we request 4 TB of storage on the SDSC Expanse system to hold the datasets needed for running jobs and the results generated from each run.

# Software Packages

R Libraries: DESeq2, Limma, MACS2, WGCNA, SCENIC, chromVAR, Seurat, Monocle, Cicero, Signac, SnapATAC

Python/Perl Libraries: CellRanger, CellRanger-atac, HOMER, bowtie2, sratoolkit, scikit-learn

Allocations:

2024	SDSC Expanse CPU	195,000.0 Core-hours
2024	SDSC Expanse Projects Storage	5,000.0 GB
The estimated value of these awarded resources is $1,108.00. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.
2022	SDSC Expanse CPU	694,230.0 Core-hours
2022	SDSC Expanse Projects Storage	5,000.0 GB
The estimated value of these awarded resources is $3,304.61.


2021	SDSC Dell Cluster with Intel Haswell Processors (Comet)	573,709.0 SUs
2021	SDSC Expanse CPU	104,823.0 Core-hours
2021	SDSC Expanse GPU	1,000.0 GPU Hours
2021	SDSC Expanse Projects Storage	4,132.0 GB
2021	SDSC Medium-term disk storage (Data Oasis)	500.0 GB
The estimated value of these awarded resources is $10,105.46.

Other Titles:

There are no prior titles for this project.