Intel and Broad Co-Operate To Enhance BioMedical Research

By CIOReview | Wednesday, April 13, 2016
691
1206
224

CAMBRIDGE, MA: Genomic data have risen into large scale genomic sequencing data and it need improvements in the scalability, storage, and processing to advance the level of research, particularly in the field of biomedical. A team of Broad Institute of MIT & Harvard and Intel developing new tools and advancing fundamental capabilities to run large genomic workflows at cloud scale.

Broad is working with Intel to extend its workflow execution engine—Cromwell’s—capabilities to support multiple input languages while execution on multiple back ends simultaneously, enabling researchers to run jobs in real-time. Broad has also denoted collaboration with the cloud suppliers to enable cloud-based access to its Genome Analysis Toolkit (GATK) that is expected to amplify access to the GATK Best Practices pipeline. The new tools speed up variant detection and biomarker discovery and enable discoveries that would not have been detected with smaller cohorts.

Declining in the cost of DNA has driven complex genomic data stored as text files are progressively difficult for researchers to jointly analyze the sequence. It needs next-generation databases that are built and optimized for genomic data. GenomicsDB stores vast amounts of patient variant data and runs on an array database system optimized for sparse data called “TileDB.” TileDB was developed by Intel and MIT researchers working at the Intel Science and Technology Center for Big Data, which is based at MIT's Artificial Intelligence and Computer Science Lab. GenomicsDB is now used in the Broad’s production pipeline running on an Intel Xeon processor-based cloud environment to perform joint genotyping.

Dr. Eric Banks, Senior Dir. of Data Sciences and Data Engineering, Broad said that the time it now requires to perform the variant discovery process, cut down from eight days to eighteen hours. However, the quantity was with 100 whole genomes. Banks adds that it would take 90 days to run variant discovery on an eight thousand sample project. With GenomicsDB, the process will take a week, which means the researchers can believe the output of their project results are executed in quick time.