Complex Structural Variant Characterization for Construction of Population-based Reference Genomes
Tuesday, September 27, 2022
9:00 AM-11:00 AM
BIOMED PhD Research Proposal
Title:
Complex Structural Variant Characterization for Construction of Population-based Reference Genomes
Speaker:
Jessica Wong, PhD Candidate
School of Biomedical Engineering, Science and Health Systems
Drexel University
Advisor:
Ming Xiao, PhD
Professor
School of Biomedical Engineering, Science and Health Systems
Drexel University
Details:
The reference genome provides the foundation for understanding the genetic basis of health and disease. However, it has approximately 1000 reported gaps and fails to capture full genetic diversity and ancestral haplotypes. Complex regions within the human genome are long, repetitive regions with frequent recombination that consist of segmental duplications (SDs) or other structural variations (SVs). These regions are still challenging to assemble, and SVs remain difficult to interpret with respect to their functional impacts on the human genome. Since the completion of the reference in 2003, there have been advancements in technologies and algorithms used by the 1000 Genomes Project. However, these sequencing technologies are costly and time-consuming, and they require great effort to assemble an entire genome. Studies have shown that optical mapping, in comparison to sequencing technologies, can improve the capacity to accurately detect the approximate sizes and locations of SVs, even in repetitive and complex regions, at lower cost. Optical mapping may also be used in conjunction with sequencing methods to provide a comprehensive population study of complex regions that could not previously be detected and categorized. The goal of this thesis is to develop an integrated informatics approach that combines optical mapping and sequencing technologies to reconstruct and reassemble complex SVs.
The first specific aim will focus on developing a pipeline for haplotype assembly, including the generation and validation of haplotypes. This aim will automate finding and validating all haplotypes from multiple genomes within a specified complex region. The second specific aim will focus on developing another pipeline to identify the variability in all haplotypes between multiple genomes in order to construct a population haplotype catalog. These pipelines will be applied to 400 phenotypically normal samples from diverse populations. Lastly, the third specific aim will investigate trios of healthy parents and their children with genetic disorders. Current diagnostic approaches rely on short-read sequencing, which yields a diagnosis in less than 40% of cases. This aim will identify and confirm SVs linked to clinical disorders, and determine the precise breakpoint locations of these structures, using a Cas9-assisted targeted long-read sequencing methodology.
In summary, the approach developed in this thesis will enable comprehensive large-scale genome comparison and identification of SVs, thus allowing the construction of population-specific reference genomes in a haplotype-resolved manner. These new references can be used to identify complex regions across the genome and thereby increase the accuracy of downstream analyses in future studies.
Contact Information
Natalia Broz
njb33@drexel.edu