Systematic Identification of Non-coding Somatic Single Nucleotide Variants Associated with Altered Transcription and DNA Methylation in Adult and Pediatric Cancers
Whole-genome sequencing combined with transcriptomics can reveal impactful non-coding single nucleotide variants (SNVs) in cancer. Here, we developed an integrative analytical approach that, as a first step, identifies genes altered in expression or DNA methylation in association with nearby somatic SNVs, in contrast to alternative approaches that first identify mutational hotspots. Using genomic datasets from the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium and the Children's Brain Tumor Tissue Consortium (CBTTC), we identified hundreds of genes and associated CpG islands for which the nearby presence of a non-coding somatic SNV was recurrently associated with altered expression or DNA methylation, respectively. Genomic regions upstream or downstream of genes, gene introns, and gene untranslated regions were all involved. The significant SNV-expression associations identified in the PCAWG adult cancer cohort differed from those identified in the CBTTC pediatric brain tumor cohort. The SNV-expression associations involved a wide range of cancer types and histologies, as well as potential gain or loss of transcription factor binding sites. Notable genes with SNV-associated increased expression include TERT, COPS3, POLE2 and HDAC2 (involving multiple cancer types); MYC, BCL2, PIM1 and IGLL5 (involving lymphomas); and CYHR1 (involving pediatric low-grade gliomas). Non-coding somatic SNVs appear to play a major role in shaping the cancer transcriptome, and this role is not limited to mutational hotspots.
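The gene-first idea described above (start from a gene, then ask whether samples carrying a nearby non-coding SNV differ in expression) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `snv_expression_association`, the toy expression values, and the choice of a simple permutation test on the difference in means are all assumptions made for illustration, and a real analysis would presumably also need to account for confounders such as cancer type.

```python
import random
from statistics import mean


def snv_expression_association(expression, has_nearby_snv,
                               n_permutations=2000, seed=0):
    """Hypothetical helper: two-sided permutation test for one gene.

    expression      -- expression value of the gene in each sample
    has_nearby_snv  -- True where the sample carries a somatic
                       non-coding SNV near the gene, else False
    Returns (difference in mean expression, permutation p-value).
    """
    mutated = [e for e, m in zip(expression, has_nearby_snv) if m]
    unmutated = [e for e, m in zip(expression, has_nearby_snv) if not m]
    if not mutated or not unmutated:
        raise ValueError("need at least one sample in each group")

    observed = mean(mutated) - mean(unmutated)

    # Shuffle the sample labels and count how often a difference at
    # least as large as the observed one arises by chance.
    rng = random.Random(seed)
    pooled = list(expression)
    k = len(mutated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (assumed, not from the study): 12 samples, 3 with a nearby SNV.
toy_expression = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 7.9, 8.2, 8.0, 4.9, 5.0, 5.1]
toy_has_snv = [False] * 6 + [True] * 3 + [False] * 3

effect, p = snv_expression_association(toy_expression, toy_has_snv)
print(f"mean difference = {effect:.2f}, permutation p = {p:.4f}")
```

In a genome-wide setting a test of this kind would be repeated per gene and per genomic region (upstream, downstream, intron, untranslated region), followed by multiple-testing correction; those steps are omitted here.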