Publications
Background: De novo genome assembly is essential to modern genomics studies. Because it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, in which nodes are sequences of equal length k, also known as k-mers. Edges in this graph connect nodes that overlap by k − 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The choice of k is influenced by multiple factors, and optimizing it involves a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k values should be used, so that lower values provide good connectivity in less-covered regions and higher values increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes.
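To make the graph definition concrete, here is a minimal Python sketch of a node-centric de Bruijn graph: nodes are the distinct k-mers in the reads, an edge joins two k-mers when the last k − 1 bases of one equal the first k − 1 bases of the other, and nodes along non-branching walks are merged into unitigs. It assumes error-free reads and ignores reverse complements, both of which real assemblers such as ABySS must handle; the function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k (k-mer) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Map each distinct k-mer to the sorted k-mers it overlaps by k - 1 bases."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    by_prefix = defaultdict(set)  # (k-1)-prefix -> k-mers starting with it
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    # u -> v when the (k-1)-suffix of u equals the (k-1)-prefix of v
    return {u: sorted(by_prefix.get(u[1:], ())) for u in nodes}


def unitigs(edges):
    """Merge nodes along unambiguous (non-branching) walks into contigs.

    Isolated cycles in which every node is internal are not reported.
    """
    preds = defaultdict(list)
    for u, succs in edges.items():
        for v in succs:
            preds[v].append(u)

    def is_internal(v):
        # v continues a walk: one predecessor, which itself has one successor
        return len(preds[v]) == 1 and len(edges[preds[v][0]]) == 1

    contigs = []
    for start in sorted(edges):
        if is_internal(start):
            continue  # this node is absorbed into the walk of its predecessor
        contig, node = start, start
        while len(edges[node]) == 1 and is_internal(edges[node][0]):
            node = edges[node][0]
            contig += node[-1]  # each step adds one new base
        contigs.append(contig)
    return contigs


graph = build_de_bruijn(["ACGTTGCA", "ACGTTCCA"], k=4)
print(unitigs(graph))  # ['ACGTT', 'GTTCCA', 'GTTGCA']
```

The node CGTT has two successors, so the walk ends there and the two reads yield three unitigs instead of one contig. Repeats at least k bases long create such branches, which is why larger k values can increase contiguity when coverage allows.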
Results: Here we present RResolver, a scalable algorithm that takes a short-read de Bruijn graph assembly with a starting k as input and uses a k value closer to the read length to resolve repeats. RResolver builds a Bloom filter of sequencing reads, uses it to evaluate support for assembly graph paths at branching points, and removes paths with insufficient support. RResolver runs efficiently, taking only 26 min on average for an ABySS human assembly with 48 threads and 60 GiB of memory. Across all experiments, compared to a baseline assembly, RResolver improves scaffold contiguity (NGA50) by up to 15% and reduces misassemblies by up to 12%.
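The abstract states only that reads are stored in a Bloom filter and that path support is evaluated at branching points. The Python sketch below illustrates that idea under explicit assumptions: read substrings of a larger length `big_k` (standing in for the k closer to the read length) are inserted into the filter, each candidate path through a branch is given as its spelled-out sequence, and support is the number of its `big_k`-mers found in the filter. `BloomFilter`, `build_read_filter`, `path_support`, `filter_paths` and `min_support` are hypothetical names, not the ABySS code.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array with num_hashes positions per item,
    derived by double hashing from one SHA-256 digest."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # May return false positives, never false negatives
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_read_filter(reads, big_k, num_bits=1 << 20, num_hashes=3):
    """Insert every big_k-length substring of every read into a Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for i in range(len(read) - big_k + 1):
            bf.add(read[i:i + big_k])
    return bf


def path_support(path_seq, bf, big_k):
    """Count the big_k-mers of a candidate path sequence present in the filter."""
    return sum(path_seq[i:i + big_k] in bf for i in range(len(path_seq) - big_k + 1))


def filter_paths(paths, bf, big_k, min_support):
    """Keep only candidate paths with at least min_support supported big_k-mers."""
    return [p for p in paths if path_support(p, bf, big_k) >= min_support]


reads = ["AAACCCGGGTTT"]
bf = build_read_filter(reads, big_k=6)
candidates = ["AAACCCGGGTTT", "AAACCCGGGAAA"]  # two walks out of one branch
print(filter_paths(candidates, bf, big_k=6, min_support=7))
```

A Bloom filter trades a small false-positive rate for memory that scales with the number of inserted items rather than their length, which fits the scalability goal stated above.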
Conclusions: RResolver adds a missing component to scalable de Bruijn graph genome assembly. By improving the outcome of the initial, fundamental graph traversal, it allows all downstream ABySS algorithms to work with a more accurate and less complex representation of the genome. The RResolver code is integrated into ABySS and is available at https://github.com/bcgsc/abyss/tree/master/RResolver .
Background: To support the implementation of high-throughput pipelines suitable for SARS-CoV-2 sequencing and analysis in a clinical laboratory, we developed an automated sample preparation and analysis workflow.
Methods: We used the established ARTIC protocol with ∼400 bp amplicons sequenced on Oxford Nanopore's MinION. Sequences were analyzed using Nextclade, assigning both a clade and quality score to each sample.
Results: A total of 2,179 samples on twenty-five 96-well plates were sequenced. Plates of purified RNA were processed within 12 hours, sequencing required up to 24 hours, and analysis of each pooled plate required one hour. The use of samples with known Ct values enabled normalization, acted as a QC check, and revealed a strong correlation between sample Ct values and successful analysis, with 85% of samples with Ct < 30 achieving a "Good" Nextclade score. Less abundant samples responded to enrichment: the fraction of Ct > 30 samples achieving a "Good" classification rose by 60% after the addition of a post-ARTIC PCR normalization step. Serial dilutions of three variant-of-concern samples, from Ct ∼16 to Ct ∼50, demonstrated successful sequencing up to Ct 37. The sample set contained a median of 24 mutations per sample and a total of 1,281 unique mutations, with reduced sequence read coverage in some regions of some samples. A total of ten separate strains were observed in the sample set, including three variants of concern prevalent in British Columbia in the spring of 2021.
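To put the dilution range in concrete terms, a Ct difference can be converted into a fold difference in starting template. This Python sketch assumes the textbook model in which each PCR cycle multiplies template by (1 + efficiency), with efficiency = 1.0 meaning perfect doubling; real assay efficiencies are usually somewhat lower, and the function names are hypothetical.

```python
import math


def fold_difference(ct_low, ct_high, efficiency=1.0):
    """Fold difference in starting template implied by two Ct values,
    assuming a constant per-cycle amplification factor of (1 + efficiency)."""
    return (1.0 + efficiency) ** (ct_high - ct_low)


def cycles_per_tenfold(efficiency=1.0):
    """Number of cycles by which Ct shifts for a tenfold dilution."""
    return math.log(10) / math.log(1.0 + efficiency)


print(fold_difference(16, 37))  # 2097152.0, i.e. 2**21
print(round(cycles_per_tenfold(), 2))  # 3.32
```

Under this idealized model, going from Ct ∼16 to Ct 37 corresponds to a roughly two-million-fold dilution, or about 6.3 tenfold steps of ∼3.3 cycles each.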
Conclusions: We demonstrated a robust automated sequencing pipeline that takes advantage of input Ct values to improve reliability.
Emu (Dromaius novaehollandiae) farming has been gaining wide interest for fat production. Oil rendered from the fat of this large flightless bird is valued for its anti-inflammatory and antioxidant properties in therapeutics and cosmetics. We analyzed seasonal and sex-dependent differentially expressed (DE) genes involved in fat metabolism in emus. Samples were taken from back and abdominal fat tissues of a single set of four male and four female emus in April, June, and November for RNA sequencing. We found 100 DE genes (47 seasonal in males, 34 seasonal in females, and 19 between sexes). Seasonally DE genes whose gene ontology terms differed significantly between the sexes suggested that integrin beta chain-2 (ITGB2) influences fat changes, in concordance with earlier studies. Six seasonally DE genes functioned in more than two enriched pathways (two in females: angiopoietin-like 4 (ANGPTL4) and lipoprotein lipase (LPL); four in males: lumican (LUM), osteoglycin (OGN), aldolase B (ALDOB), and solute carrier family 37 member 2 (SLC37A2)). Two sexually DE genes, follicle stimulating hormone receptor (FSHR) and perilipin 2 (PLIN2), have functional studies supporting their influence on fat gain and loss. The results suggest that these nine genes influence fat metabolism and deposition in emus.
Our ability to prognosticate the clinical course of patients with cancer has historically been limited to clinical, histopathological, and radiographic features. It has long been clear, however, that these data alone do not adequately capture the heterogeneity and breadth of disease trajectories experienced by patients. The advent of efficient genomic sequencing has led to a revolution in cancer care as we try to understand patient clinico-genomic phenotypes and personalize treatment to them. Within prostate cancer, emerging evidence suggests that tumor genomics (e.g., DNA, RNA, and epigenetics) can be used to inform clinical decision making. In addition to providing discriminatory information about prognosis, tumor genomics likely also holds a key to predicting response to oncologic therapies, which could be used to further tailor treatment recommendations. Herein we review select literature on the use of tumor genomics in the management of prostate cancer, focusing on analytically validated and clinically tested genomic biomarkers used in radiotherapy and/or adjunctive therapies given with radiotherapy.
Background: Genomic alterations to the androgen receptor (AR) are common in metastatic castration-resistant prostate cancer (mCRPC). AR copy number amplifications, ligand-binding domain missense mutations, and intronic structural rearrangements can all drive resistance to approved AR pathway inhibitors, and their detection via tissue or liquid biopsy is linked to clinical outcomes. With an increasingly crowded treatment landscape, there is hope that AR genomic alterations can act as prognostic and/or predictive biomarkers to guide patient management.
Methods: In this review, we evaluate the current evidence for AR genomic alterations as clinical biomarkers in mCRPC, focusing on correlative studies that have used plasma circulating tumor DNA to characterize AR genotype.
Results: We highlight data that demonstrate the complexity of AR genotype within individual patients, and suggest that future studies should account for cancer clonal heterogeneity and variable tumor content in liquid biopsy samples. Given the potential for co-occurrence of multiple AR genomic alterations in the same or competing subclones of a patient, it is distinctly challenging to attribute blanket clinical significance to any individual alteration. This challenge is further complicated by the varied treatment exposures of contemporary patients and by the fact that AR genotype continues to evolve in the mCRPC setting across sequential lines of systemic therapy.
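Why tumor content and clonality complicate interpretation can be seen from the standard mixture calculation for the expected variant allele fraction (VAF) of a somatic mutation in plasma. This Python sketch is a generic illustration, not a method from the review; it assumes that tumor copy number at the locus is the same in every tumor cell and that each normal cell carries one copy of AR (X-linked, male patient). The function name is hypothetical.

```python
def expected_vaf(ctdna_fraction, ccf, mutant_copies, tumor_copies, normal_copies=1):
    """Expected VAF of a somatic mutation in cfDNA.

    ctdna_fraction: fraction of cfDNA genome equivalents from tumor cells
    ccf: fraction of tumor cells carrying the mutation (1.0 = clonal)
    mutant_copies: mutant copies per carrying tumor cell
    tumor_copies: total locus copies per tumor cell (assumed uniform)
    normal_copies: locus copies per normal cell (1 for AR in a male patient)
    """
    mutant = ctdna_fraction * ccf * mutant_copies
    total = ctdna_fraction * tumor_copies + (1 - ctdna_fraction) * normal_copies
    return mutant / total


# 20% ctDNA, clonal mutation, no AR amplification
print(round(expected_vaf(0.2, 1.0, 1, 1), 3))    # 0.2
# same mutation on an allele amplified to 10 copies
print(round(expected_vaf(0.2, 1.0, 10, 10), 3))  # 0.714
# mutation on one of 10 amplified copies
print(round(expected_vaf(0.2, 1.0, 1, 10), 3))   # 0.071
# subclonal mutation present in 30% of tumor cells, no amplification
print(round(expected_vaf(0.2, 0.3, 1, 1), 3))    # 0.06
```

The same AR mutation at the same ctDNA fraction can thus appear at a VAF anywhere from about 0.06 to 0.71 depending on clonality and on which allele is amplified.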
Conclusions: As treatment access and liquid biopsy technology continue to improve, we posit that real-time measures of AR biology are likely to play a key role in emerging precision oncology strategies for metastatic prostate cancer.
Activated B-cell-like diffuse large B-cell lymphomas (ABC-DLBCLs) have unfavorable outcomes and chronic activation of CARD11-BCL10-MALT1 (CBM) signal amplification complexes, which form through polymerization of BCL10 subunits; BCL10, in turn, is affected by recurrent somatic mutations in ABC-DLBCLs. Herein, we show that BCL10 mutants fall into at least two functionally distinct classes: missense mutations of the BCL10 CARD domain and truncations of its C-terminal tail. Truncating mutations abrogated a novel motif through which MALT1 inhibits BCL10 polymerization, trapping MALT1 in its activated, filament-bound state. CARD missense mutations enhanced BCL10 filament formation, forming glutamine network structures that stabilize BCL10 filaments. Mutant forms of BCL10 were less dependent on upstream CARD11 activation and thus manifested resistance to BTK inhibitors, whereas BCL10 truncating mutants, but not CARD mutants, were hypersensitive to MALT1 inhibitors. Therefore, BCL10 mutations are potential biomarkers of BTK inhibitor resistance in ABC-DLBCL, and further precision can be achieved by selecting therapy based on the specific biochemical effects of distinct mutation classes.
Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information for obtaining more accurate STR genotypes than standard short-read sequencing. Although the coverage bias at extremely GC-rich STRs is an important limitation of BLRS, it is an effective strategy for genotyping many other STR loci.
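The value of barcodes for repeat-read detection can be illustrated with a simplified Python sketch: a read made almost entirely of the repeat motif cannot be placed by alignment, but if its barcode is shared with reads that align to the STR's flanking sequence, it likely came from the same long molecule and hence from that locus. This is a hypothetical illustration, not the authors' algorithm; `is_repeat_read`, `assign_repeat_reads`, `min_fraction` and the (barcode, sequence) input format are assumptions, and it ignores sequencing errors, reverse complements and barcodes whose molecules span several loci.

```python
def is_repeat_read(seq, motif, min_fraction=0.9):
    """True if non-overlapping copies of the motif (in its best rotation)
    cover at least min_fraction of the read."""
    rotations = {motif[i:] + motif[:i] for i in range(len(motif))}
    best = max(seq.count(r) for r in rotations)  # str.count is non-overlapping
    return best * len(motif) / len(seq) >= min_fraction


def assign_repeat_reads(reads, locus_barcodes, motif, min_fraction=0.9):
    """Attribute repeat reads to a locus via barcode sharing.

    reads: iterable of (barcode, sequence) pairs for unplaced reads
    locus_barcodes: barcodes of reads aligned to the STR's flanks
    """
    return [
        seq
        for barcode, seq in reads
        if barcode in locus_barcodes and is_repeat_read(seq, motif, min_fraction)
    ]


unplaced = [
    ("BX1", "CGGCGGCGGCGG"),  # repeat read, barcode seen at the locus flanks
    ("BX2", "CGGCGGCGGCGG"),  # repeat read, barcode from another molecule
    ("BX1", "ACGTACGTACGT"),  # shared barcode, but not a repeat read
]
print(assign_repeat_reads(unplaced, {"BX1"}, motif="CGG"))
```

Only the first read is attributed to the locus. Counting the repeat reads assigned this way gives a rough signal of expansion size that standard short reads, lacking barcodes, cannot provide.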
Sequencing of cell-free DNA (cfDNA) in cancer patients' plasma offers a minimally invasive means to detect tumor cell genomic alterations and aid real-time clinical decision-making. The reliability of copy number detection decreases at lower cfDNA tumor fractions, limiting utility at earlier stages of the disease. To test a novel strategy for detecting allelic imbalance, we developed a prostate cancer bespoke assay, PCF_SELECT, that includes an innovative sequencing panel covering ∼25,000 SNPs with high minor allele frequency and tailored analytical solutions that enable allele-informed evaluation. First, we assessed it on plasma samples from 50 advanced prostate cancer patients. We then confirmed improved detection of genomic alterations in samples with <10% tumor fraction compared with an independent assay. Finally, we applied PCF_SELECT to serial plasma samples intensively collected from three patients previously characterized as harboring alterations involving DNA repair genes and consequently offered PARP inhibition. We identified more extensive pan-genome allelic imbalance than previously recognized in prostate cancer. We confirmed high-sensitivity detection of BRCA2 allelic imbalance as tumor fractions decreased in response to treatment, and identified complex ATM genomic states that may be incongruent with protein losses. Overall, we present a framework for sensitive detection of allele-specific copy number changes in cfDNA.
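Why allelic imbalance becomes hard to detect at low tumor fraction follows from the expected B-allele fraction (BAF) at a germline heterozygous SNP. This Python sketch uses the standard mixture model, not PCF_SELECT's own analytics: tumor cells carry `major_cn` and `minor_cn` copies of the two alleles, normal cells carry one copy of each, and `purity` is the tumor fraction of cfDNA. The function name is hypothetical.

```python
def expected_minor_baf(purity, major_cn, minor_cn):
    """Expected fraction of reads carrying the tumor's minor allele at a
    germline heterozygous SNP in a tumor/normal cfDNA mixture."""
    num = purity * minor_cn + (1 - purity) * 1
    den = purity * (major_cn + minor_cn) + (1 - purity) * 2
    return num / den


# Single-copy loss (major 1, minor 0) at decreasing tumor fractions
for p in (0.5, 0.2, 0.1, 0.05):
    print(p, round(expected_minor_baf(p, 1, 0), 3))
# BAF ≈ 0.333, 0.444, 0.474, 0.487; balanced copies (1, 1) always give 0.5
```

At 10% tumor fraction a single-copy loss shifts BAF only from 0.50 to about 0.47, a deviation that is small relative to sampling noise at any single SNP, so evidence must be pooled across many informative SNPs.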
Myeloid ecotropic virus insertion site 1 (MEIS1) is essential for normal hematopoiesis and is a critical factor in the pathogenesis of a large subset of acute myeloid leukemia (AML). Despite the clinical relevance of MEIS1, its regulation is largely unknown. To understand the transcriptional regulatory mechanisms contributing to human MEIS1 expression, we created a knock-in green fluorescent protein (GFP) reporter system at the endogenous MEIS1 locus in a human AML cell line. Using this model, we delineated a critical enhancer region of the MEIS1 locus and dissected it for transcription factor (TF) binding by combining in silico prediction with oligo pull-down, mass spectrometry and knockout analysis. This led to the identification of FLI1, an E-twenty-six (ETS) transcription factor, as an important regulator of MEIS1 transcription. We further show direct binding of FLI1 to the MEIS1 locus in human AML cell lines as well as enrichment of histone acetylation in MEIS1-high healthy and leukemic cells. We also observe a positive correlation between high FLI1 transcript levels and worse overall survival in AML patients. Our study expands the role of ETS factors in AML, and our model constitutes a feasible tool for a more detailed understanding of transcriptional regulatory elements and their interactome.
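In silico prediction of TF binding sites is commonly done by scoring sequence windows against a position weight matrix (PWM); the abstract does not say which tool or motif was used. The Python sketch below shows the technique with a hypothetical four-position count matrix loosely shaped like the ETS core GGA(A/T); it is not a curated motif. It scans the forward strand only and assumes uppercase A/C/G/T input; `COUNTS`, `log_odds_matrix` and `scan` are illustrative names.

```python
import math

# Hypothetical counts per position (not from JASPAR or any curated database)
COUNTS = [
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 12, "C": 1, "G": 1, "T": 6},
]


def log_odds_matrix(counts, background=0.25):
    """Convert per-position base counts to log2(observed / background) scores."""
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({b: math.log2((n / total) / background) for b, n in col.items()})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


pwm = log_odds_matrix(COUNTS)
print(scan("CCGGAACCTTGGATTT", pwm, threshold=5.0))  # [(2, 6.56), (10, 5.56)]
```

In practice such predictions only nominate candidate sites; the abstract's pull-down, mass spectrometry and knockout experiments are what tested which factor actually binds.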
As guidelines, therapies and literature on cancer variants expand, the lack of consensus variant interpretations impedes clinical applications. CIViC is a public-domain, crowd-sourced and adaptable knowledgebase of evidence for the clinical interpretation of variants in cancer, designed to reduce barriers to knowledge sharing and alleviate the variant-interpretation bottleneck.