We previously developed rMATS, a robust statistical framework for quantifying alternative splicing and identifying differential splicing events in RNA-seq data from many biological samples (Shen et al., PNAS, 2014, 111(51): E5593–E5601). Since its release, the rMATS software has been downloaded more than 25,000 times. To facilitate alternative splicing analysis in very large-scale RNA-seq datasets (e.g., TCGA and CBTN data, with hundreds to thousands of samples per cancer type), we completely redesigned the RNA-seq data processing pipeline in rMATS, along with its associated data standards and library of functions, to capture, store, and exchange splicing information from raw RNA-seq data more efficiently. The redesigned software, rMATS-turbo, also allows parallel processing of multiple RNA-seq samples (a feature unavailable in the current rMATS), so we can use multiple nodes on a computing cluster to process thousands of samples efficiently. Collectively, rMATS-turbo provides unprecedented efficiency and precision for splicing analysis in massive RNA-seq datasets.
What are the goals of this project?
The long-term objective of this project is to use the CBTN data to study alternative splicing in pediatric brain tumor transcriptomes, with a major goal of identifying novel biomarkers and therapeutic targets for cancer. We also plan to use the CBTN data to develop new computational and statistical methods for quantifying alternative splicing events and inferring splicing regulatory networks from large-scale RNA-seq datasets.
What is the impact of this project?
This project will lead to a better understanding of cancer genomes and transcriptomes and could reveal novel cancer biomarkers and molecular targets for therapeutic development.
The Children's Brain Tumor Network contributed to this project by providing access to the Pediatric Brain Tumor Atlas.