The cBioPortal for Cancer Genomics: an open source platform for accessing and interpreting complex cancer genomics data in the era of precision medicine

Gao, Jianjiong, Ersin Ciftci, Pichai Raman, Pieter Lukasse, Istemi Bahceci, Adam Abeshouse, Hsiao-Wei Chen, Ino de Bruijn, Benjamin Gross, Zachary Heins, Ritika Kundra, Aaron Lisman, Angelica Ochoa, Robert Sheridan, Onur Sumer, Yichao Sun, Jiaojiao Wang, Manda Wilson, Hongxin Zhang, James Xu, Andy Dufilie, Priti Kumari, James Lindsay, Anthony Cros, Karthik Kalletla, Fedde Schaeffer, Sander Tan, Sjoerd van Hagen, Jorge Reis-Filho, Kees van Bochove, Ugur Dogrusoz, Trevor Pugh, Adam Resnick, Chris Sander, Ethan Cerami, and Nikolaus Schultz

Abstract

The cBioPortal for Cancer Genomics is an open-access portal (http://cbioportal.org) that enables interactive, exploratory analysis of large-scale cancer genomics data. It integrates genomic and clinical data and provides a suite of visualization and analysis options, including cohort and patient-level visualization, mutation visualization, survival analysis, enrichment analysis, and network analysis. The interface is user-friendly and responsive, and it makes genomic data easily accessible to translational scientists, biologists, and clinicians.

The cBioPortal is a fully open source platform. All code is available on GitHub (https://github.com/cBioPortal/) under the GNU Affero GPL license. The code base is maintained by multiple groups, including Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute, Children’s Hospital of Philadelphia, Princess Margaret Cancer Centre, and The Hyve, an open source bioinformatics company based in the Netherlands. More than 30 academic centers, as well as multiple pharmaceutical and biotech companies, maintain private instances of the cBioPortal. These include the recently launched cBioPortal instance at the NCI Genomic Data Commons (https://cbioportal.gdc.nci.nih.gov/) and two large cBioPortal instances hosting genomic and clinical data at MSK and DFCI, which support the MSK-IMPACT and DFCI Profile projects, two of the largest clinical sequencing efforts in the world.

Our multi-institutional software team has accelerated both the evolution of the core architecture and the development of new features, to keep pace with the rapidly advancing fields of cancer genomics and precision cancer medicine. For example, we have integrated multi-platform genomics data with extensive clinical data, including patient demographics, treatment history, and survival data. We have also developed a patient-centric view that visualizes both clinical and genomic data with annotation from the OncoKB knowledge base. In the next few years, the development team will focus on the following areas:

(1) Implementing major architectural changes to ensure future scalability and performance.

(2) Developing new features to support precision medicine, including (i) improved integration of knowledge base annotation, (ii) enhanced visualization of patient timelines, drug response, and tumor evolution, (iii) new patient similarity metrics, (iv) improved support for immunogenomics and immunotherapy, and (v) new visualization and analysis features for understanding response to therapy.

(3) Adding new analysis and target discovery features for large cohorts, including (i) support for user-defined virtual cohorts built by selecting samples from multiple studies, and (ii) comparison of the genomic or clinical characteristics of two or more selected cohorts.

(4) Expanding community outreach, user support and training, and documentation.