Prediction of Protein Structure-Function Impact on Host-Pathogen Interactions

Vincent Peta, University of South Dakota

0000-0002-9901-5207

ACCESS Allocation Request CIS200051

CoPI:	Diing Agany	University of South Dakota
CoPI:	Etienne Gnimpieba	University of South Dakota
CoPI:	Jose Pietri	University of South Dakota
Abstract:	This project is one component of a larger effort to elucidate interactions within the host-pathogen system using machine learning (ML) and data mining (DM), building on work by Agany et al., 2020. We hypothesize that each spatial configuration of a given protein exerts a force in a unique pattern that is related to the protein's function. Extending this idea to the full cohort of proteins involved in microbial pathogenicity depends on a conserved pattern of functional variables and protein features. We will focus primarily on developing a complete dataset collection pipeline that leverages computational theory-based modeling on XSEDE and machine learning to extract the features involved in vector-pathogen relationships, and in the outcomes that lead to microbial pathogenicity or transmission phenotypes that could in turn lead to pathogenicity in a human host. Our first use case will be the human-bed bug problem currently being researched in Dr. Jose Pietri's lab. Bed bugs, a medically relevant pest, have resurged. This is due in part to increasing resistance to insecticides: previous national control actions had varying impacts on reducing genetic susceptibility to insecticides, and bed bugs have found numerous other locales in which to thrive. Bed bugs can also potentially spread microbial pathogens through the environment, and their bites can lead to skin infections. The ability to characterize the proteins of insect vectors could therefore reveal ways of controlling the spread of insect vectors and, in turn, of microbial pathogens.
Our workflow will first be evaluated with a test dataset on a local GPU through the Gnimpieba Lab and on the USD High Performance Computing (HPC) cluster. It will then be packaged into a Docker environment so that any group member can set up and run the workflow with little difficulty. Numerous microbial pathogens have had their proteomes sequenced and studied, yet even with these data, very few protein structures are available for use. Moreover, not all pathogens have had this analysis performed, and those with protein data available have over 1,000 proteins each: Salmonella typhimurium has over 4,500 and Bartonella henselae has over 1,500. Coupling this with interactions with either an insect vector (> 20,000 proteins) during colonization or a human host (> 30,000 proteins) during infection greatly increases the complexity of this biological system.
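
To give a rough sense of that scale, the following minimal sketch multiplies the protein counts quoted above. It assumes, purely for illustration, that every pathogen protein is paired with every vector or host protein (an all-vs-all search). That assumption is ours and is not a description of the project's actual pipeline. The function and dictionary names are hypothetical. Because the abstract gives the counts as "over" each figure, the results are lower bounds.

```python
# Lower-bound protein counts quoted in the abstract ("over N proteins").
PATHOGEN_PROTEINS = {
    "Salmonella typhimurium": 4_500,
    "Bartonella henselae": 1_500,
}
PARTNER_PROTEINS = {
    "insect vector": 20_000,
    "human host": 30_000,
}


def candidate_pairs(n_pathogen: int, n_partner: int) -> int:
    """Count all-vs-all pathogen/partner protein pairs (hypothetical helper)."""
    return n_pathogen * n_partner


def pair_table() -> dict:
    """Map each (pathogen, partner) combination to its candidate-pair count."""
    return {
        (pathogen, partner): candidate_pairs(n_pathogen, n_partner)
        for pathogen, n_pathogen in PATHOGEN_PROTEINS.items()
        for partner, n_partner in PARTNER_PROTEINS.items()
    }


if __name__ == "__main__":
    for (pathogen, partner), n in pair_table().items():
        print(f"{pathogen} x {partner}: at least {n:,} candidate pairs")
```

Under this all-vs-all assumption, even the smaller proteome yields tens of millions of candidate pairs per partner organism, which is the kind of growth the abstract refers to.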

Allocations:

2020	IU/TACC (Jetstream)	50,000.0 SUs
2020	IU/TACC Storage (Jetstream Storage)	500.0 GB
The estimated value of these awarded resources is $1,025.00. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.

There are no other allocations for this project.

Other Titles:

There are no prior titles for this project.