
Novel Bioinformatics Methods for Integrative Detection of Structural Variants From LR Sequencing

Monday, November 27, 2023

10:00 AM-12:00 PM

BIOMED PhD Research Proposal

Title:
Novel Bioinformatics Methods for Integrative Detection of Structural Variants From Long-Read (LR) Sequencing

Speaker:
Jonathan Elliot Perdomo, PhD Candidate
School of Biomedical Engineering, Science and Health Systems
Drexel University

Advisors:
Kai Wang, PhD
Raymond G. Perelman Center for Cellular and Molecular Therapeutics
Children's Hospital of Philadelphia (CHOP)
Professor of Pathology and Laboratory Medicine
Perelman School of Medicine
University of Pennsylvania
 
Ming Xiao, PhD
Professor
School of Biomedical Engineering, Science and Health Systems
Drexel University

Details:
Structural variants (SVs), defined as genomic alterations larger than 50 bp, form the largest source of human genome variation. Identifying SVs associated with clinical phenotypes empowers clinical diagnoses and allows researchers to investigate potential molecular mechanisms. Emerging long-read sequencing platforms provide the resolution required to resolve larger and more complex SVs. Nevertheless, variable error rates in these technologies can result in a high false-positive rate and low robustness for SV detection.

The rich repertoire of available technologies, including short-read sequencing and optical mapping, can be leveraged to address these limitations. Here we introduce ContextSV, a novel SV calling method that uses a hybrid approach to improve accuracy and robustness: long-read data are used to identify SV candidates, short reads yield high-accuracy sites for resolving breakpoints in complex SVs, and optical maps provide long-range scaffolds for high-quality read assembly prior to running SV detection algorithms. To improve accuracy, we train a binary classification model to score candidate SVs based on coverage and genomic context, which are key SV validation features. Scores are then used to filter out low-likelihood SVs.

Finally, we plan to incorporate support for pangenome graph reference formats in ContextSV. A pangenome represents common haplotypes in the human population better than a single linear reference genome does, and would thus form a more comprehensive reference for SV identification. Large collaborative efforts, including the Human Pangenome Reference Consortium (HPRC), aim to release a pangenome representing a large, diverse set of human genome sequences, so it is increasingly important for future SV callers to support graph references. In summary, ContextSV enables the capture of large, complex SVs with high accuracy and robustness by leveraging information across multiple technologies and using a machine learning model to compute confidence scores, while providing support for future pangenome developments.
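
For readers unfamiliar with score-based filtering, the idea of scoring candidate SVs and discarding low-likelihood calls can be illustrated with a minimal sketch. This is not ContextSV's implementation: the `CandidateSV` fields, the two features, the weights, and the 0.5 threshold are all hypothetical placeholders, and a simple logistic function stands in for whatever binary classifier the proposal actually trains.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateSV:
    """A candidate call; every field here is a hypothetical illustration."""
    sv_id: str
    length_bp: int
    coverage_ratio: float  # read depth inside the SV divided by flanking depth
    in_repeat: bool        # stand-in for a "genomic context" feature


# Hypothetical weights; a real model would learn these from labeled SVs.
WEIGHTS = {"bias": -1.0, "coverage_dev": 4.0, "in_repeat": -1.5}


def score(sv: CandidateSV) -> float:
    """Return a confidence in (0, 1) from a logistic model over two features."""
    # Deviation of coverage from the expected ratio of 1.0 supports a copy-number change.
    coverage_dev = abs(1.0 - sv.coverage_ratio)
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["coverage_dev"] * coverage_dev
        + WEIGHTS["in_repeat"] * float(sv.in_repeat)
    )
    return 1.0 / (1.0 + math.exp(-z))


def filter_candidates(candidates, threshold=0.5, min_length_bp=50):
    """Keep candidates larger than min_length_bp whose score meets the threshold."""
    return [
        sv for sv in candidates
        if sv.length_bp > min_length_bp and score(sv) >= threshold
    ]
```

In this toy setup, a deletion candidate with half the flanking coverage outside a repeat would be retained, while a candidate with near-normal coverage inside a repeat would be filtered out.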

Contact Information

Natalia Broz
njb33@drexel.edu


Location

Remote

Audience

  • Undergraduate Students
  • Graduate Students
  • Faculty
  • Staff