Complex Genomics Analysis Pipelines Made Simple with Nextflow & Research Gateway, Integrated with Cost Tracking and Security

March 10, 2022

Introduction

As a researcher, do you want to start running complex genomics pipelines on large data sets within minutes, without spending hours setting up the environment or worrying about data availability and storage, the security of your cloud infrastructure, and, most of all, unknown expenses? RL Catalyst makes your life simpler. In this blog, we cover how easy it is to run publicly available genomics pipelines from nf-co.re using Nextflow in your AWS Cloud environment.

A number of open-source tools are available to researchers, which encourages re-use. However, before adopting cloud at scale for internal use, research institutions and genomics companies look for the right balance across three key dimensions:

Three Key Dimensions

Cost and Budget Governance

A strong focus on cost tracking of cloud resources to analyze, control, and optimize budget spend.

Easy Collaboration on Research Data & Tools

Principal Investigators and researchers need to focus on data management, governance, and privacy along with analysis and collaboration in real-time without worrying about Cloud complexity.

Security and Compliance

Research requires a strong focus on security and compliance covering Identity management, data privacy, audit trails, encryption, and access management.

To ensure that these requirements do not distract researchers from science with infrastructure complexity, Research Gateway automates cost and budget tracking with safeguards and provides a simple self-service model for collaboration. In this blog, we demonstrate how researchers can easily use a vast set of publicly available tools, pipelines, and data on this platform with tight budget controls. Here is a quick video showing how easily researchers can get started.

nf-co.re is a community effort to collect a curated set of analysis pipelines built using Nextflow. A key aspect of these pipelines is that they adhere to strict guidelines that ensure they can be reused extensively.

Advantages of these pipelines

Cloud-Ready

Pipelines are tested on AWS after every release. You can even browse results live on the website and use outputs for your own benchmarking.

Portable and reproducible

Pipelines follow best practices to ensure maximum portability and reproducibility. The large community makes the pipelines exceptionally well tested and easy to run.

Packaged software

Pipeline dependencies are automatically downloaded and handled using Docker, Singularity, Conda, or others. No need for any software installations.

Stable releases

nf-core pipelines use GitHub releases to tag stable versions of the code and software, making pipeline runs totally reproducible.

CI testing

Every time a change is made to the pipeline code, nf-core pipelines use continuous integration testing to ensure that nothing has broken.

Documentation

Extensive documentation covering installation, usage, and description of output files ensures that you won't be left in the dark.
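
To make the "Packaged software" and "Stable releases" points above concrete, here is a minimal sketch of how a pinned, containerized pipeline run could be assembled from the command line. It only builds the argument list and does not launch anything. The helper name build_nextflow_command, the example pipeline nf-core/rnaseq, the revision 3.6, and the output directory "results" are illustrative assumptions, not values taken from Research Gateway; substitute a pipeline and release tag that exist for your use case.

```python
import shlex
from typing import List, Optional, Sequence


def build_nextflow_command(
    pipeline: str,
    revision: Optional[str] = None,
    profiles: Sequence[str] = ("test", "docker"),
    outdir: str = "results",
) -> List[str]:
    """Assemble an argument list for `nextflow run` (hypothetical helper).

    The command is only built here, never executed.
    """
    cmd = ["nextflow", "run", pipeline]
    if revision:
        # -r pins a Git tag, branch, or commit, so reruns use the same code.
        cmd += ["-r", revision]
    if profiles:
        # -profile accepts a comma-separated list; "docker" asks Nextflow
        # to run pipeline steps in containers, "test" is the small test
        # configuration that nf-core pipelines ship with.
        cmd += ["-profile", ",".join(profiles)]
    # --outdir (double dash) is a pipeline parameter, not a Nextflow option;
    # it is assumed here to be supported by the chosen pipeline.
    cmd += ["--outdir", outdir]
    return cmd


if __name__ == "__main__":
    # Example values are assumptions for illustration only.
    cmd = build_nextflow_command("nf-core/rnaseq", revision="3.6")
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting issues, should you choose to execute the command yourself.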

A sample of commonly used pipelines is supported out of the box in Research Gateway; these can be run with a few clicks to perform important genomic analyses. While public repositories are easily accessible, Research Gateway also allows private repositories and custom pipelines to be run with ease.

*The above samples can be launched in less than 5 minutes and cost less than $5 to run with test data, with productivity gains of 80%.

The figure below shows the building blocks of this solution on AWS Cloud.

Steps for running an nf-core pipeline with Nextflow on AWS Cloud

The figure below shows the Nextflow Architecture on AWS.

Summary

The nf-co.re community is constantly striving to make genomics research in the cloud simpler. While these pipelines are easily available, running them on AWS Cloud with proper cost tracking, collaboration, data management, and an integrated workbench was a missing piece that Research Gateway now provides. Relevance Lab, in partnership with AWS, has addressed this need with its Genomics Cloud solution to make scientific research frictionless.

Tags
RLCatalyst
Nextflow
data management
AWS solutions