Cavatica is a cloud-based portal environment developed to securely store, share and analyze large volumes of pediatric brain tumor genomic data to accelerate collaboration in research. Named for the popular children’s story Charlotte’s Web, Cavatica allows researchers and investigators to access and share a network of data, pipelines, algorithms, visualizations, and hypotheses about specific types of tumors. Cavatica draws on data from a number of sources, including the Children’s Brain Tumor Network (CBTN), Pacific Neuro-oncology Consortium (PNOC), Stand Up to Cancer, TARGET and TCGA.
Cavatica is improving collaboration between data scientists, statisticians, data engineers, programmers, application developers, bioinformaticians and scientists. Following its launch in October 2016, Cavatica has grown to become the largest clinically annotated pediatric cancer database on earth.
PedcBioPortal is an open-access resource for childhood cancer genomics that enables users to visualize, analyze and download large-scale cancer genomics data sets. These data allow researchers to understand the molecular mechanisms of cancer and design therapies based on a patient's unique profile.
PedcBioPortal is a variant of the original cBioPortal (developed at Memorial Sloan Kettering), which allows investigators and researchers to rapidly explore data stored in the adult-focused TCGA database. Although these adult data are accessible through PedcBioPortal, the platform focuses on analysis of high-quality childhood cancer datasets. It works within the CBTN applications ecosystem to support research and the translation of genomics data into biological insights and improved clinical therapies.
Kids First Data Resource Portal
The Gabriella Miller Kids First Data Resource Portal provides access to more than 8,000 samples of childhood cancer and structural birth defects genomic data. The Kids First Data Resource Portal stores the CBTN's Pediatric Brain Tumor Atlas data and allows cross-analysis of different disease types to uncover the potential links between genetic diseases occurring in children.
About the CBTN Informatics Infrastructure
A foundational mission of the CBTN is to protect subject, participant and family privacy. Subjects’ privacy is assured through the use of an electronic Honest Broker (eHB), a platform which de-identifies subject information while regulating how the data is distributed. The Biorepository Toolkit Project, another open-source software development project, provides a single unified interface for data collection and analysis.
All subject identifiers are kept in a secure, encrypted database that is accessible only to the operations team and the honest broker software. This information can be retrieved only when a credentialed user at the subject’s home consortium site requests it. All information handling complies with HIPAA policies and personal identification disclosure guidelines. Once the data is de-identified, it is accessible through the Biorepository Portal (BRP), which allows sites around the globe to view and analyze subject data or request additional specimens.
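The honest-broker idea described above can be illustrated with a minimal Python sketch. This is not the eHB implementation: the class name, method names, and the "R-" identifier format are all hypothetical, and a plain in-memory dictionary stands in for the encrypted database. It only shows the two rules stated in the text: research data carries an opaque identifier, and the real identifier is released only to a credentialed user at the subject's home site.

```python
import secrets


class HonestBrokerSketch:
    """Toy stand-in for an honest broker. All names here are hypothetical."""

    def __init__(self):
        # research_id -> (home_site, subject_id). A real system would keep
        # this mapping in an encrypted database, which is not modeled here.
        self._vault = {}

    def deidentify(self, subject_id, home_site):
        """Return a random research ID that reveals nothing about the subject."""
        research_id = "R-" + secrets.token_hex(8)
        self._vault[research_id] = (home_site, subject_id)
        return research_id

    def reidentify(self, research_id, requester_site, credentialed):
        """Release the identifier only to a credentialed user at the home site."""
        home_site, subject_id = self._vault[research_id]
        if not credentialed or requester_site != home_site:
            raise PermissionError("re-identification is limited to the home site")
        return subject_id
```

In this sketch, a portal would store and display only the value returned by deidentify(), while any reidentify() call from another site, or from a user without credentials, raises PermissionError.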
The CBTN Biorepository Portal (BRP), developed by the Enterprise Informatics Group (EiG) of CHOP’s Department of Biomedical and Health Informatics (DBHi), captures and integrates data securely to protect each patient's privacy throughout the specimen management process. The Data and Specimen Inventory Tool, which is used to query and request de-identified data and specimens and to promote quality control and scientific discovery, is built on the Harvest platform. Other tools integrated into the CBTN informatics infrastructure include the Electronic Honest Broker (eHB), REDCap data capture, and the Nautilus laboratory information management system (LIMS).
CBTN Data Quality Assurance
Because data is always flowing into the CBTN, quality control must also be constant. The CBTN's data quality is maintained through an online portal that research coordinators use to track and resolve data quality issues such as entry errors, omissions, formatting errors, and inconsistencies between clinical and specimen datasets. Each night, the entire dataset is checked and the resulting tasks are made available for research coordinators to view and resolve. This system also allows the CBTN to track quality and overall metrics over time, so that accuracy is maintained as the CBTN's collection of childhood brain tumor data grows.
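As a rough illustration of the nightly check described above, the following Python sketch scans a small dataset for the three kinds of problems named in the text: omissions, formatting errors, and inconsistencies between clinical and specimen records. The field names (diagnosis, diagnosis_date, research_id), the date format, and the function name are assumptions made for this example and do not describe the CBTN's actual schema or tooling.

```python
import re

# Assumed date convention for this sketch: ISO-style YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def nightly_check(clinical, specimens):
    """Return a list of (research_id, issue) tasks for coordinators to resolve.

    clinical:  dict mapping research_id -> record dict (hypothetical fields)
    specimens: list of dicts, each with a "research_id" key
    """
    tasks = []
    for research_id, record in clinical.items():
        # Omission: a required field is empty or absent.
        if not record.get("diagnosis"):
            tasks.append((research_id, "missing diagnosis"))
        # Formatting error: the date does not match the expected pattern.
        if not DATE_RE.match(record.get("diagnosis_date", "")):
            tasks.append((research_id, "diagnosis_date not in YYYY-MM-DD format"))
    for specimen in specimens:
        # Inconsistency: a specimen refers to a subject with no clinical record.
        if specimen["research_id"] not in clinical:
            tasks.append((specimen["research_id"], "specimen has no clinical record"))
    return tasks
```

Counting the tasks returned on each run would give a simple quality metric that can be followed over time.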
CBTN Software Toolkit
Electronic Honest Broker: A secure, non-user-facing software service that works behind the scenes to connect the BRP and other laboratory, clinical and genomics software tools. This tool handles the complex process of protecting participant privacy while maintaining highly complex specimen and clinical data. More information can be found in a paper published in the August 2016 Special Issue of BMC Genomics.
REDCap: A secure data capture and management tool developed by Vanderbilt University.
Nautilus Laboratory Information Management System (LIMS): Used to capture and track clinical and translational research data, developed in partnership with Thermo Fisher Scientific.
Amazon Simple Storage Service (S3 Buckets): Used to store the CBTN's genomic data as well as other data file types.
Seven Bridges Genomics: Genomics processing and support for the Cavatica platform.