Introduction
Developed by the Data Sciences Platform at the Broad Institute, the Genome Analysis Toolkit (GATK) offers a wide variety of tools with a primary focus on variant discovery and genotyping. Until now, researchers have lacked a simple way to run their GATK pipelines on AWS. Relevance Lab is pleased to offer that ability through our Genomics Cloud solution and a 1-click model.
GATK simplifies genomics research by providing Best Practices workflows and Docker containers. The workflows are written in Workflow Description Language (WDL), a user-friendly scripting language maintained by the OpenWDL community. Cromwell is an open-source workflow execution engine that supports WDL as well as the Common Workflow Language (CWL), and it can run on a variety of platforms, both local and cloud-based. RLCatalyst Research Gateway has added support for the Cromwell engine, which enables researchers to run popular workflows on AWS seamlessly. The following popular workflows are available for a quick start:
- Mitochondrial short variant discovery (SNVs + Indels)
- Somatic short variant discovery (SNVs + Indels)
- Somatic copy number variant discovery (CNVs)
- Germline short variant discovery (SNPs + Indels)
- Germline copy number variant discovery (CNVs)
- RNAseq short variant discovery (SNPs + Indels)
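To make the relationship between WDL and Cromwell more concrete, here is a minimal Python sketch of how a WDL workflow could be packaged for submission to a Cromwell server. It relies on Cromwell's REST endpoint `POST /api/workflows/v1` with the form fields `workflowSource` and `workflowInputs`. Everything else is an assumption made for illustration: the server address, the toy `HelloGatk` workflow, its `Echo` task, and the sample name are hypothetical, and this is not necessarily how RLCatalyst Research Gateway submits workflows internally.

```python
import json
import uuid
from urllib import request

# Assumption: a Cromwell server reachable at this address (8000 is Cromwell's default port).
CROMWELL_URL = "http://localhost:8000"

# Hypothetical toy workflow, used only to show the shape of a WDL file.
HELLO_WDL = """
version 1.0

workflow HelloGatk {
  input { String sample_name }
  call Echo { input: sample_name = sample_name }
  output { String message = Echo.message }
}

task Echo {
  input { String sample_name }
  command { echo "Processing ~{sample_name}" }
  output { String message = read_string(stdout()) }
  runtime { docker: "ubuntu:20.04" }
}
"""


def build_multipart(fields: dict, boundary: str) -> bytes:
    """Encode a mapping of field name -> bytes as a multipart/form-data body."""
    sep = b"--" + boundary.encode()
    parts = []
    for name, value in fields.items():
        header = (
            f'Content-Disposition: form-data; name="{name}"; filename="{name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        parts.append(sep + b"\r\n" + header + value + b"\r\n")
    parts.append(sep + b"--\r\n")
    return b"".join(parts)


def build_submission(wdl_source: str, inputs: dict,
                     base_url: str = CROMWELL_URL) -> request.Request:
    """Build (but do not send) a workflow submission request for Cromwell."""
    boundary = uuid.uuid4().hex
    body = build_multipart(
        {
            "workflowSource": wdl_source.encode(),
            "workflowInputs": json.dumps(inputs).encode(),
        },
        boundary,
    )
    return request.Request(
        f"{base_url}/api/workflows/v1",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def submit(req: request.Request) -> dict:
    """Send the request; needs a running Cromwell server, so it is not called here."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


# Workflow inputs are keyed as "<workflow name>.<input name>".
req = build_submission(HELLO_WDL, {"HelloGatk.sample_name": "NA12878"})
print(req.get_method(), req.full_url)
```

Calling `submit(req)` against a live server would hand the workflow to Cromwell for execution. Building the request separately from sending it keeps the sketch runnable without any cloud resources.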
The figure below shows the building blocks of this solution on AWS Cloud.
Steps for running GATK with WDL and Cromwell on AWS Cloud
The figure below shows the Cromwell Advanced option, which can be selected to provision and run any pipeline.
The following picture shows the architecture of Cromwell on AWS.
Summary
The GATK community is constantly striving to make genomics research in the cloud simpler. Until now, however, support for AWS Cloud was missing, and it was a key ask from multiple online research communities. Relevance Lab, in partnership with AWS, has addressed this need with its Genomics Cloud solution to make scientific research frictionless.