Papadakis Integrated Sciences Building (PISB), Room 104, located on the northeast corner of 33rd and Chestnut Streets.
BIOMED Master's Thesis Defense
Utilizing a Convolutional Neural Network To Predict HIV-1 Tat Biological Functions and the Impact of Tat Genetic Variation on Neurocognitive Impairment
Angela Tomita, Master's Candidate
School of Biomedical Engineering, Science and Health Systems
Will Dampier, PhD
Department of Microbiology and Immunology
Drexel University College of Medicine
Human immunodeficiency virus type 1 (HIV-1) infections have been associated with neurocognitive impairment. However, this cognitive decline occurs at a rate that is specific to individual patients. Evidence suggests that the protein sequence of HIV-1 Tat may have an impact on this rate. Mutations within Tat can affect the biological function of this viral protein. This study proposes to utilize deep learning techniques to infer the effects of amino acid changes on Tat functions.
To predict biological functions consistent with annotations of the Gene Ontology (GO) Consortium, a convolutional neural network (CNN) was developed and trained on more than 500,000 GO-annotated sequences from the UniProt Knowledgebase. Analysis was performed on various lengths of Tat protein sequences, considering the prevalence of naturally truncated Tat protein sequences in HIV-1-infected patients from the Drexel Medicine CNS AIDS Research and Eradication Study (CARES) cohort. Results of processing truncated Tat subtype B sequences with lengths between 20 and 60 amino acids revealed a decreased sequence similarity with proteins found in the nucleus and an increased sequence similarity with proteins associated with pathogenesis, extracellular regions, and cellular toxicity.
To examine which regions of Tat are responsible for its known biological functions, a windowing strategy was implemented. Using 30-mer windows, there was an increase in sequence similarity to proteins associated with ribosomal function and translation for the Tat polypeptide region between residues 40 and 80. The CNN achieved an area under the receiver operating characteristic curve (AUC) of 0.985 for GO predictions and 0.820 for neurocognitive impairment (NCI). AUC has been used as an evaluation metric in several peer-reviewed papers, but it is not an ideal evaluation method. Although the AUC scores for NCI prediction are fairly high, Pearson correlation shows little to no correlation between the predicted scores and the actual, calculated scores.
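
The 30-mer windowing step can be illustrated with a minimal Python sketch. This is not the thesis code: the function name `sliding_windows`, the step size of one residue, and the placeholder sequence are all assumptions made for illustration, and the abstract does not state how the windows were stepped or scored.

```python
from typing import Iterator, Tuple


def sliding_windows(sequence: str, size: int = 30, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, window) for every full-length window.

    `size` defaults to 30 to match the 30-mer windows in the abstract;
    `step` is an assumed parameter, not taken from the source.
    """
    for start in range(0, len(sequence) - size + 1, step):
        yield start + 1, sequence[start:start + size]


# Placeholder 100-residue string, NOT a real Tat sequence.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 5

windows = list(sliding_windows(toy_sequence))

# Windows lying entirely within residues 40-80 (the region named in the abstract).
region_windows = [(pos, w) for pos, w in windows if pos >= 40 and pos + len(w) - 1 <= 80]
```

Each window would then be passed to the trained classifier on its own, so that a change in predicted function can be attributed to a region of the protein rather than to the full-length sequence.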