Ancestry and frequency of genetic variants in the general population are confounders in the characterization of germline variants linked to cancer
Pediatric high-grade gliomas (pHGGs) are incurable malignant brain cancers. Clear somatic genetic drivers are difficult to identify in the majority of cases. We hypothesized that this may be due to the existence of germline variants that influence tumor etiology and/or progression and are filtered out using traditional pipelines for somatic mutation calling.
In this study, we analyzed whole-genome sequencing (WGS) datasets of matched germline and tumor tissues to identify recurrent germline variants in pHGG patients.
We identified two structural variants that were highly recurrent in a discovery cohort of 8 pHGG patients. One was a ~40 kb deletion immediately upstream of the NEGR1 locus, predicted to remove the promoter region of this gene. This copy number variant (CNV) was present in all patients in our discovery cohort (n = 8) and in 86.3% of patients in our validation cohort (n = 73). The second was a recurrent 55.7 kb deletion affecting the BTNL3 and BTNL8 loci. This BTNL3–8 deletion was observed in 62.5% of patients in our discovery cohort and in 17.8% of patients in the validation cohort. Our single-cell RNA sequencing (scRNA-seq) data showed that both deletions disrupt transcription of the affected genes. However, analysis of genomic information from multiple non-cancer cohorts showed that both the NEGR1 promoter deletion and the BTNL3–8 deletion are CNVs that occur at high frequencies in the general population. Intriguingly, the upstream NEGR1 deletion was homozygous in ~40% of individuals in the non-cancer population. This finding was immediately relevant because the affected genes have important physiological functions, and our analyses showed that NEGR1 expression levels have prognostic value for pHGG patient survival. We also found that these deletions occurred at different frequencies among different ethnic groups.
Our study highlights the need to integrate cancer genomic analyses and genomic data from large control populations. Failure to do so may lead to spurious association of genes with cancer etiology. Importantly, our results showcase the need for careful evaluation of differences in the frequency of genetic variants among different ethnic groups.
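To illustrate the kind of case-versus-control comparison this conclusion calls for, the following is a minimal sketch, not the analysis used in the study. It recovers carrier counts from the percentages reported above and applies a two-sided Fisher's exact test to a 2x2 table. The function names are hypothetical, and the control cohort counts are invented placeholders, since the control frequencies are not stated here.

```python
from math import comb


def carriers_from_percent(percent, n):
    """Recover a carrier count from a reported percentage and cohort size."""
    return round(percent / 100 * n)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among cases
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts implied by the reported validation cohort (n = 73)
negr1_cases = carriers_from_percent(86.3, 73)   # 63 carriers
btnl_cases = carriers_from_percent(17.8, 73)    # 13 carriers

# HYPOTHETICAL control cohort: placeholder numbers, not data from the study
control_carriers, control_total = 860, 1000

p_value = fisher_exact_two_sided(
    negr1_cases, 73 - negr1_cases,
    control_carriers, control_total - control_carriers,
)
print(f"NEGR1 deletion, cases vs hypothetical controls: p = {p_value:.3f}")
```

With placeholder controls that carry the deletion about as often as the cases do, the test gives no evidence of enrichment, which is the situation described above: a variant that looks highly recurrent within a cancer cohort may simply be common in the general population. In practice the comparison would also need to be stratified by ancestry, since the deletion frequencies differ among ethnic groups.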
We thank Jennifer Howe and Jeffrey MacDonald for assistance and helpful discussions.