Abstract: |
# Project Goal
We request a total of 400,000 SUs on Expanse to process and analyze large-scale multi-omics NGS datasets. These include publicly accessible datasets generated by multiple consortia as well as a substantial number of datasets generated by our team. Our objective is to gain insight into the epigenetic regulatory mechanisms underlying normal development and aging, the latter being a leading risk factor for the onset of multiple diseases.
# Resource Justification
The requested ACCESS resources will be used to re-analyze extensive public datasets spanning multiple assays, together with a large number of datasets generated by us and our collaborators. This work is not covered by another active grant, BIO240050, which focuses primarily on computational method development and optimization.
Given the scale of both the publicly accessible and the internally generated datasets, we require substantial storage for raw data (to be deleted after processing) and for processed results (to be retained long term). We have established and benchmarked our computational methods on both Comet and Expanse (January 2021 to June 2024, under grants MED200005 and MED200005-Extension). That work ranged from standard data processing to customized, machine-learning-based analysis, and it gives us a sound basis for estimating the total resources needed to complete the proposed work over the coming year. In total, we request 4 TB of storage on the SDSC Expanse system to hold the datasets needed for running jobs and the results generated by each run.
# Software Packages
R Libraries
DESeq2
limma
WGCNA
SCENIC
chromVAR
Seurat
Monocle
Cicero
Signac
SnapATAC
Python/Perl Libraries
MACS2
CellRanger
CellRanger-atac
HOMER
bowtie2
sratoolkit
scikit-learn |