A highly usable and fully comprehensive proteogenomic method leveraging the breadth of genetic variation for oncogenic and/or therapeutic novel protein discovery

CBTN Data Used


Scholar Award

Rita Allen Foundation

About this


Proteogenomics is a field presently concerned primarily with tailoring individualized protein sequence databases, using various sources of genetic sequencing and expression information, for optimal success in mass spectrometry-based protein discovery and quantification. Such methods of customized database generation are of particular interest to oncology-related fields, where novel peptides deriving from, say, reactivated viral retrotransposons or immunogenic neoantigens could serve, respectively, as cancer-specific oncogenic drivers or targets for precision therapy.

Our newly developed tool is, to our knowledge, the most sophisticated transcriptome assembly-based proteogenomic method to date. It uses a hybrid (de novo/annotation-guided) approach that integrates sample-specific whole-genome sequencing (WGS) variants and accounts, optimistically, for potentially massive somatic allelic heterogeneity, while minimizing any increase in the false discovery rate. The proteome predicted in this manner is then digested with trypsin in silico, generating the search space for MS-based analysis. The resulting matched peptides are annotated and remapped to their originating transcripts for easy visual analysis and validation. Our method runs entirely on Unix-based systems, is optimized for highly parallelized execution (as on a high-performance computing cluster), and emphasizes usability (facilitated by the pipeline manager Snakemake).
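To make the in silico digestion step concrete, here is a minimal Python sketch. It is not the tool's own code: it assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P), and the function names and the `missed_cleavages`, `min_len`, and `max_len` parameters are hypothetical choices made for illustration.

```python
def cleavage_fragments(protein):
    """Split a protein at assumed tryptic sites: after K or R, but not before P."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing residues after the last cleavage site form the final fragment.
        fragments.append(protein[start:])
    return fragments


def tryptic_digest(protein, missed_cleavages=0, min_len=1, max_len=None):
    """Return peptides, joining up to `missed_cleavages` adjacent fragments."""
    fragments = cleavage_fragments(protein)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages + 1, len(fragments))
        for j in range(i, last):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_len and (max_len is None or len(peptide) <= max_len):
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Toy sequence, not taken from any real dataset.
    for peptide in tryptic_digest("MKRPAKGGR", missed_cleavages=1):
        print(peptide)
```

Applied to every predicted protein, the pooled peptides would form the search space against which MS spectra are matched. Allowing missed cleavages enlarges that space, which is one reason search-space size and false discovery rate have to be balanced.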


Ask the scientists

What are the goals of this project?

A specific ongoing aim of ours concerns expressed PGBD5, whose transposase activity our lab has recently demonstrated (PGBD5 promotes site-specific oncogenic mutations in human tumors; Henssen et al., Nat Genet. 2017). Having found that, among rhabdoid tumors, PGBD5-associated 'signal sequences' appeared recurrently near structural variant breakpoints within SMARCB1, we are curious how prevalent this is in a larger dataset. More generally, we wish to characterize the proteome-wide signature of rhabdoid (and other pediatric brain) tumors to drive further hypotheses of oncogenesis (for instance, reactivation of retroviral ORFs).

Specimen Data

The Children's Brain Tumor Network contributed to this project by providing access to the Pediatric Brain Tumor Atlas.
