Research Data Platform: Enabling Scientific Research in the Cloud with an AWS Open-Source Solution

July 4, 2023

Introduction

Research computing is a growing need, and the AWS cloud enables researchers to process big data with scalable computing in a secure and flexible manner. While cloud computing is a powerful platform, it also brings complexity: new tools, new nomenclature, and multiple options that distract researchers from their research. Relevance Lab is partnering with the AWS Public Sector group and several leading US universities to create a frictionless "Research Data Platform" (RDP) built on open-source solutions.

Service Workbench from AWS is a powerful open-source solution for enabling research in the cloud. Customers around the globe already use it for the following common use cases:

  • Enable researchers to use the AWS Cloud through self-service capabilities and a common catalog of tools such as Amazon EC2, Amazon SageMaker, Amazon S3, and Studies data.
  • Use common data analysis tools like RStudio in a secure and scalable manner.
  • Set up a "Trusted Research Environment" in the cloud with additional controls that enforce ingress/egress data restrictions for compliance.

While Service Workbench provides a good foundation for research, early adopters reported some challenges, mainly related to the following:

  • Complex setup requiring deep cloud know-how.
  • An admin-centric user experience that is not very researcher-friendly.
  • Scalability challenges when adopting large-scale research setups.
  • Hard to customize.
  • No enterprise support models to guide customers through a Plan-Build-Run lifecycle.

In collaboration with AWS and its early adopters, Relevance Lab has extended this open-source foundation with a modern, researcher-friendly user experience called the "Research Data Platform".

Key Functionalities of Research Data Platform

The primary goal is to drive frictionless research in the cloud with the following key features:

  • Built as an open-source solution and made available to institutions interested in collaborating on a common Data Science Platform for research.
  • A "project-centric" model that enables researchers to collaborate, in a self-service manner, around common data, tools, and research goals.
  • Modern architecture with support for containers, enabling researchers to bring their own tools (web-based software, desktop-based tools, and terminal-based solutions), all accessed seamlessly from the Research Data Platform.
  • Enable researchers to launch applications and choose configurations, for both regular and GPU workloads, without needing to know cloud infrastructure details.
  • Integration with project-centric research datasets, with an easy browser-based interface to upload and download data.
  • Ability to run multiple research projects across different AWS accounts with a secure, scalable setup and guardrails.

The key functional flows for a researcher are explained in the figure below:

Here is a link to a demo of the solution.

Solution Architecture of Research Data Platform

The solution builds on Service Workbench functionality and adds a separate Research Data Platform (RDP) layer that provides a UI-driven application for researchers and admin users. The figure below captures the building blocks for this solution.

The solution consists of the following components:

  • Web server that serves the UI for the platform. The UI provides the entire researcher experience: users log in with their credentials and access the projects made available to them. Within a project, users can launch applications that the administrator has configured for them, choosing an instance configuration from those the administrator has created.
  • Research Data Platform DB. This database stores configuration information and the mapping information required to facilitate use of the underlying "Service Workbench" open-source software.
  • Research Data Platform CLI. This command line interface allows the administrator to set up and configure projects, users, datasets, launchers and configurations easily.
  • Service Workbench. This open-source software from AWS is the underlying API-driven engine that orchestrates and manages all the AWS resources on behalf of the user.
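To make the role of the RDP DB more concrete, here is a minimal sketch, assuming the RDP layer keeps its own project and launcher names and records which Service Workbench identifiers back each one. The class and field names (`SwbRef`, `RdpMapping`, `swb_project_id`, `swb_env_type_id`) are hypothetical and do not describe the actual RDP schema; an in-memory dictionary stands in for the real database.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SwbRef:
    """Hypothetical pointer to objects managed by Service Workbench."""
    swb_project_id: str   # assumed ID of the backing Service Workbench project
    swb_env_type_id: str  # assumed ID of the Service Workbench workspace type


@dataclass
class RdpMapping:
    """In-memory stand-in for the mapping table kept in the RDP DB (illustrative only)."""
    _by_launcher: Dict[Tuple[str, str], SwbRef] = field(default_factory=dict)

    def register(self, rdp_project: str, launcher: str, ref: SwbRef) -> None:
        # Admin-side step (for example, performed through the RDP CLI): record which
        # Service Workbench objects back a launcher in a given RDP project.
        self._by_launcher[(rdp_project, launcher)] = ref

    def resolve(self, rdp_project: str, launcher: str) -> SwbRef:
        # Backend-side step: translate a researcher's launcher click into the
        # identifiers that the Service Workbench API layer would need.
        try:
            return self._by_launcher[(rdp_project, launcher)]
        except KeyError:
            raise LookupError(
                f"launcher {launcher!r} is not configured for project {rdp_project!r}"
            ) from None


mapping = RdpMapping()
mapping.register("genomics-lab", "RStudio", SwbRef("swb-proj-01", "envtype-rstudio"))
print(mapping.resolve("genomics-lab", "RStudio").swb_env_type_id)  # envtype-rstudio
```

Keeping this mapping outside Service Workbench is one way a UI layer can present researcher-friendly names while still delegating all resource orchestration to the underlying engine.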

Deployment Architecture of Research Data Platform

The solution is deployed in an enterprise model in each customer's own AWS accounts, using the following architecture based on the AWS Well-Architected Framework, as explained in the figure below.

The deployment of the Research Data Platform consists of the following:

  • One "main" AWS account where RDP is deployed along with Service Workbench from AWS.
  • Within the main account, Service Workbench is deployed as an API-driven serverless solution. It stores data in Amazon DynamoDB, uses AWS Service Catalog to manage and orchestrate resources, and creates Amazon S3 buckets to hold data.
  • Within the main account, the Research Data Platform is deployed as a web server that serves the UI, along with an API backend that communicates with Service Workbench.
  • One or more project accounts are onboarded and can be used to create projects and access datasets.
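The account layout above can be modeled as a small sketch, assuming a simple guardrail that a project may only be created in a project account that has already been onboarded. The `Deployment` class, its methods, and the placeholder account IDs are hypothetical illustrations, not part of the RDP codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Deployment:
    """Hypothetical model of the main-account / project-account layout (illustrative only)."""
    main_account_id: str                                    # hosts RDP and Service Workbench
    project_accounts: Set[str] = field(default_factory=set)
    projects: Dict[str, str] = field(default_factory=dict)  # project name -> account ID

    def onboard_account(self, account_id: str) -> None:
        # AWS account IDs are 12-digit strings.
        if not (len(account_id) == 12 and account_id.isdigit()):
            raise ValueError(f"not a valid AWS account ID: {account_id!r}")
        self.project_accounts.add(account_id)

    def create_project(self, name: str, account_id: str) -> None:
        # Assumed guardrail: projects live only in onboarded project accounts.
        if account_id not in self.project_accounts:
            raise PermissionError(f"account {account_id} has not been onboarded")
        self.projects[name] = account_id


deployment = Deployment(main_account_id="111111111111")
deployment.onboard_account("222222222222")
deployment.create_project("climate-models", "222222222222")
print(deployment.projects)  # {'climate-models': '222222222222'}
```

Separating the main account from project accounts lets each research project's resources and costs be isolated while a single RDP installation serves all of them.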

Sample Screens for Research Data Platform

The key functionality of the solution is illustrated in the sample screens below.

Home Page: This is the first page that the user visits. From this page the user can choose to log in to the Research Data Platform.

Projects Page: The projects page displays a card view of all the projects that the logged-in user is assigned to. Projects are set up by the administrator.

Each application that is useful to a researcher is set up as a launcher. Each launcher appears as a card on the project's Workbench page, and the researcher can start a session by clicking the launcher card.

Files tab: This screen allows the researcher to browse the files in the datasets assigned to the project. Every project also has a default storage area called project storage, which can be browsed from this screen as well.
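The Files tab behavior described above can be sketched as a simple visibility rule, assuming storage is laid out as object-key prefixes: a project sees its own project storage plus the datasets assigned to it, and nothing else. The `Project` class, the `browse` function, and the `project-storage/` and `datasets/` prefixes are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    name: str
    datasets: List[str] = field(default_factory=list)  # dataset names assigned by the admin

    def storage_roots(self) -> List[str]:
        # Every project gets a default "project storage" area, plus any assigned datasets.
        roots = [f"project-storage/{self.name}/"]
        roots += [f"datasets/{d}/" for d in sorted(self.datasets)]
        return roots


def browse(project: Project, keys: List[str]) -> List[str]:
    """Return only the object keys that the Files tab would show for this project."""
    roots = project.storage_roots()
    return [k for k in keys if any(k.startswith(r) for r in roots)]


lab = Project("neuro-imaging", datasets=["mri-scans"])
keys = [
    "project-storage/neuro-imaging/notes.txt",
    "datasets/mri-scans/subject01.nii",
    "datasets/other-study/private.csv",  # not assigned to this project, so hidden
]
print(browse(lab, keys))
```

The trailing slash on each root prevents a dataset such as `mri-scans-2` from being exposed just because its name starts with `mri-scans`.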

Launch Dialog: The user can select a configuration that is suitable for their research.

Project Details: The user can connect to Active sessions from the Workbench tab.

Sessions: An instance of a launcher is called a session. A user can connect to a session via the browser to access the application they need for conducting their research and analysis.
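The relationship between launchers, configurations, and sessions described in these screens can be sketched as a small data model, assuming that a session is created from a launcher plus one of the configurations the administrator allowed for it, and that the Workbench tab lists active sessions. All class and function names here are hypothetical, and the EC2 instance types are illustrative choices rather than RDP defaults.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass(frozen=True)
class Configuration:
    name: str           # label shown in the launch dialog, e.g. "Small" or "GPU"
    instance_type: str  # illustrative EC2 instance type behind the label
    gpu: bool = False


@dataclass(frozen=True)
class Launcher:
    application: str                              # e.g. "RStudio"
    configurations: Tuple[Configuration, ...]     # choices the administrator allows


@dataclass
class Session:
    session_id: int
    application: str
    configuration: Configuration
    state: str = "ACTIVE"


_ids = count(1)


def launch(launcher: Launcher, config_name: str) -> Session:
    """Start a session from a launcher using one of its admin-defined configurations."""
    for config in launcher.configurations:
        if config.name == config_name:
            return Session(next(_ids), launcher.application, config)
    allowed = [c.name for c in launcher.configurations]
    raise ValueError(f"{config_name!r} is not one of {allowed}")


def active_sessions(sessions: List[Session]) -> List[Session]:
    # What the Workbench tab would list for the user to connect to.
    return [s for s in sessions if s.state == "ACTIVE"]


rstudio = Launcher("RStudio", (
    Configuration("Small", "t3.medium"),
    Configuration("GPU", "g4dn.xlarge", gpu=True),
))
s1 = launch(rstudio, "GPU")
s2 = launch(rstudio, "Small")
s2.state = "STOPPED"
print([(s.session_id, s.configuration.instance_type) for s in active_sessions([s1, s2])])
# [(1, 'g4dn.xlarge')]
```

Restricting `launch` to the launcher's own configurations mirrors the idea that researchers pick from admin-curated options instead of raw infrastructure settings.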

How Can New Customers Get Started?

  • Contact us for a quick discussion and demonstration of the standard solution
  • We will assess the standard features against known gaps for adopting the solution
  • Engage on a Plan-Build-Run model covering deployment, enablement, and operational readiness, so you can start doing research in the AWS cloud with simple and secure best practices
  • Customers with standard needs can get started with a new setup in 8-10 weeks
  • Relevance Lab will also provide on-going support and managed services

Conclusion

The Research Data Platform offers a comprehensive and researcher-friendly solution. It empowers researchers to process big data, perform data analysis, and conduct research efficiently in a secure and scalable manner. By bridging the gap between researchers and the AWS cloud, the RDP fosters innovation and advances scientific discovery in diverse domains.
