Relevance Lab and AWS collaborate to release new EC2-RStudio-Server solution for Service Workbench

June 24, 2021

Introduction

Provide researchers access to secure RStudio instances in the AWS cloud by using Amazon-issued certificates in AWS Certificate Manager (ACM) and an Application Load Balancer (ALB)

Cloud computing offers the research community access to vast amounts of computational power, storage, specialized data tools, and public data sets, collectively referred to as Research IT, with the added benefit of paying only for what is used. However, researchers are not necessarily experts in using the AWS Management Console to provision these services correctly. Software solutions such as Service Workbench on AWS (SWB) fill this gap by delivering scientific research computing resources in a secure and easily accessible manner.

RStudio is a popular application in the scientific research community, is supported by Service Workbench, and is part of many researchers' day-to-day work. However, installing RStudio securely on AWS and running it cost-effectively is a non-trivial task, especially for researchers. With SWB, the goal is to make this process simple, secure, and cost-effective so that researchers can focus on “Science” rather than “Servers” and become more productive.

Relevance Lab (RL), in partnership with AWS, set out to make the experience of using RStudio with Service Workbench on AWS simple and secure.

Technical Solution Goals

  1. A researcher should be able to launch an RStudio instance in the AWS cloud from within the Service Workbench portal.
  2. Each RStudio instance comes preinstalled with the latest version of RStudio and a variety of other software packages that support scientific research computing.
  3. The researcher opens RStudio from within Service Workbench through a unique URL generated by SWB. The URL is encoded with an authentication token, so the researcher can access the RStudio instance without remembering any passwords. The URL is served over SSL so that all communication is encrypted in transit.
  4. Maintaining the certificates used for SSL communication should be cost-effective and should not require excessive administrative effort.
  5. The solution should isolate researcher-specific instances using IP allow lists controlled by the end user.
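The source does not describe how SWB encodes the authentication token in the launch URL (goal 3). Purely as an illustration, the Python sketch below shows one common approach, an HMAC-signed URL that binds the user, the instance, and an expiry time. The functions `make_signed_url` and `verify_signed_url`, the query parameter names, and the hard-coded key are all hypothetical and are not SWB's actual mechanism or API.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key for illustration only; a real deployment would keep
# it in a secrets store and never hard-code it.
SECRET_KEY = b"example-secret-do-not-use"


def _sign(message: bytes, key: bytes) -> str:
    """URL-safe base64 HMAC-SHA256 signature of `message`, padding stripped."""
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_signed_url(base_url: str, user_id: str, instance_id: str,
                    ttl_seconds: int = 300, now: Optional[float] = None,
                    key: bytes = SECRET_KEY) -> str:
    """Build a launch URL whose token binds the user, the instance and an expiry."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    # Assumes IDs contain no "|" so the signed payload is unambiguous.
    payload = f"{user_id}|{instance_id}|{expires}".encode("utf-8")
    query = urlencode({"user": user_id, "instance": instance_id,
                       "expires": expires, "sig": _sign(payload, key)})
    return f"{base_url}?{query}"


def verify_signed_url(url: str, now: Optional[float] = None,
                      key: bytes = SECRET_KEY) -> bool:
    """Accept the URL only if its signature is valid and it has not expired."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    try:
        user, instance = params["user"], params["instance"]
        expires, sig = int(params["expires"]), params["sig"]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    payload = f"{user}|{instance}|{expires}".encode("utf-8")
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(_sign(payload, key), sig):
        return False
    current = time.time() if now is None else now
    return current < expires
```

For example, a URL built with `now=1000` and the default 300-second lifetime verifies at `now=1100` but is rejected at `now=1400`, or if any query parameter such as `user` is altered. A short lifetime limits the damage if a launch URL leaks.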

Comparison of Old and New Design Principles to Make the Researcher Experience Frictionless


The following section compares the old design with the new architecture, which was built to make the entire researcher experience frictionless. Researcher feedback indicated that the older design involved considerable setup complexity and required recurring lifecycle work to manage security certificates, which slowed researchers down. The new solution makes this lifecycle simple and frictionless, and adds features such as provisioning the shared load balancer only when it is needed to keep ongoing costs optimized.

The diagram below explains the interplay between different design components.

Secure and Scalable Solution Architecture

With these design goals in mind, the team implemented a secure and scalable architecture that gives groups of researchers sharing products like RStudio secure HTTPS access without the overhead of managing individual certificates. The same pattern can be reused for future researcher products with similar needs without additional implementation effort, which increases productivity and lowers costs.

The Relevance Lab team designed a solution centered on an EC2 Linux instance with RStudio and relevant packages pre-installed and delivered as an Amazon Machine Image (AMI).

  1. When the instance is provisioned, it is brought up without a public IP address.
  2. All traffic to this instance is delivered via an Application Load Balancer (ALB). The ALB is shared across multiple RStudio instances within the same account to spread the cost over a larger number of users.
  3. The ALB serves traffic over SSL using an Amazon-issued certificate managed by AWS Certificate Manager.
  4. ALB costs are further reduced by provisioning the ALB on demand when the first RStudio instance is provisioned and de-provisioning it when the last RStudio instance is de-provisioned.
  5. Traffic between the ALB and each RStudio instance is also encrypted with an SSL certificate that is self-signed but unique to each instance.
  6. ALB listener rules enforce the IP allow list configured by the user.
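Items 4 and 6 above can be modelled in a few lines. The Python sketch below illustrates them under stated assumptions and is not Relevance Lab's implementation: `SharedAlbManager` is a hypothetical in-memory model of the reference-counted ALB lifecycle, with real provisioning calls replaced by a flag, and `listener_rule_conditions` builds rule conditions in the shape accepted by the Elastic Load Balancing v2 `CreateRule` API. Routing each instance by host name is an assumption; the source does not say whether rules match on host header or path. `is_allowed` is a hypothetical local equivalent of the CIDR check performed by the ALB's `source-ip` condition.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedAlbManager:
    """Hypothetical in-memory model of the shared, on-demand ALB lifecycle.

    Real provisioning (for example through CloudFormation or the ELBv2 API)
    is replaced by the `alb_exists` flag.
    """
    alb_exists: bool = False
    # instance id -> CIDR allow list chosen by the researcher
    instances: Dict[str, List[str]] = field(default_factory=dict)

    def add_instance(self, instance_id: str, allowed_cidrs: List[str]) -> None:
        if not self.instances:
            self.alb_exists = True  # first RStudio instance: provision the ALB
        self.instances[instance_id] = list(allowed_cidrs)

    def remove_instance(self, instance_id: str) -> None:
        self.instances.pop(instance_id, None)
        if not self.instances:
            self.alb_exists = False  # last RStudio instance gone: remove the ALB


def listener_rule_conditions(host: str, allowed_cidrs: List[str]) -> List[dict]:
    """Rule conditions in the ELBv2 CreateRule shape; all must match.

    A request is forwarded to this instance's target group only if it targets
    the instance's (hypothetical) host name and comes from an allowed CIDR.
    """
    return [
        {"Field": "host-header", "HostHeaderConfig": {"Values": [host]}},
        {"Field": "source-ip", "SourceIpConfig": {"Values": list(allowed_cidrs)}},
    ]


def is_allowed(source_ip: str, allowed_cidrs: List[str]) -> bool:
    """Local equivalent of the source-ip check: is the client in any allowed CIDR?"""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False)
               for cidr in allowed_cidrs)
```

Because the ALB exists only while the instance count is non-zero, its cost is incurred only while at least one researcher is using RStudio. Requests that match no instance rule fall through to the listener's default action, which could, for instance, be a fixed-response action returning HTTP 403.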

Conclusion

Both the SWB team and the Relevance Lab RLCatalyst Research Gateway team are committed to making scientific research frictionless for researchers. Working toward this shared goal, the initiative speeds up collaboration and will help deliver new open-source solutions that build on Service Workbench on AWS and on partner-provided solutions such as this RStudio with ALB offering from Relevance Lab. Upcoming additions from the collaboration will cover genomic pipeline orchestration with Nextflow, HPC with AWS ParallelCluster, and secure research workspaces with Amazon AppStream 2.0, so stay tuned.
