Overview
The purpose of this page is to provide high-level guidance on what should be contained in a Read Replica training module. The goal of preparing this material should be to present it to JnJ digitally and with in-person training. After it has been digested by their team, we should refine and package it in a way that can be shared with others by way of our knowledge transfer tools. In addition to the content shared here, we also need to share the full data model in a highly consumable fashion (i.e., the HTML frame-set generated by ClinSpark builds).
Read Replicas
Show the logical architecture of a ClinSpark deployment. Highlight the replicas and one reserved for use by customers. Note encryption.
Use Cases
ClinSpark has many reports, dashboards and export options available for customers. These have been built to answer a wide array of the most common shared use cases for ClinSpark customers.
With the Read Replica, your users can create custom reports, queries, or visual analyses. Examples of customer use cases for this data include:
- Custom Reports
- Ad Hoc Queries
- Advanced Business Intelligence Dashboards
Latency
Describe how AWS RDS populates a replica and what the expected lag time is between master and replica.
Connecting to your Replica
PrivateLink Connection. We would set this up for you during onboarding.
<diagram showing user computer, ssh tunnel
Share details on what an SSH tunnel is, and how customers can use it to connect business intelligence tools.
ClinSpark customers have access to a live, read-only copy of their production ClinSpark database, called a Read Replica. This document describes what a Read Replica is from a technical perspective. It describes key aspects of the underlying Data Model so that technical users can better understand how the data is stored. It also explores use cases for the Read Replica, with some examples.
Read Replicas
All ClinSpark data is stored in a MySQL-compatible relational database in the AWS Cloud. The Master database is the production database for the ClinSpark instance. This is where all data is written and updated during use of the ClinSpark application. In addition to other live backup mechanisms for operational use, a special read-only copy of this production data can be made available to customers. This database is a dedicated copy of the production database, provided solely for customer use. It is not used by running ClinSpark instances in any way. It is read-only, meaning that it does not accept writes, so no usage of this database can affect the Master database.
Here is a visual depiction of the Read Replica in the AWS environment:
Note that, like all ClinSpark application assets, the replica lives within private subnets of the Foundry Health Production Virtual Private Cloud (VPC). It is not exposed to the internet. Its data is encrypted at rest, and in transit via SSH.
Latency: How old is the data in the Replica?
As updates are applied to the Master, the Replica receives them in near real time. There is a delay, but due to the architecture of AWS RDS Aurora, it is typically no greater than 20 ms. In other words, about 20 milliseconds after a change is made to the production database, the customer's Read Replica has that change available.
Connecting to your Replica
There are a variety of ways to access the Read Replica. For customers with established relationships with AWS and special connections between AWS and their site's data centers, a variety of approaches exist for providing more direct connections to the replica.
However, for most customers, the most common way to access the Read Replica is via SSH. For these customers, we provision a special bastion host which has no purpose other than to provide this customer with access to this replica. This host is accessible only via SSH, and only using the customer-owned private SSH key. This host has no access to the environment other than to access this replica.
During onboarding, two DNS names and one set of credentials will be provided to you. One is the DNS name of your dedicated bastion host, used to access the replica via SSH. The other is the DNS name of the read replica itself within the AWS VPC. For this example, let's pretend these DNS names are as follows:
Item | Example Value |
---|---|
Customer Bastion Host | customer-replica-bastion.clinspark.com |
Customer Read Replica (private DNS) | customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com |
Read Replica Username | <username> |
Read Replica Password | <password> |
To connect to this replica via the command line, the steps are as follows:
- Create an SSH key you will use to access the DB through the bastion. If you already have one, that's fine. Here are instructions for doing this and also adding the key to the ssh-agent.
- Provide Foundry Health with the public SSH keys of users who need access to this database. We will place this on the bastion, allowing these users to tunnel into the replica. Open a service ticket with the public key attached and we'll get this installed quickly.
- Verify you have access by connecting to the bastion and replica as follows from the command line:
           __|  __|_  )
           _|  (     /   Amazon Linux AMI
          ___|\___|___|

    [ec2-user@ip-172-31-17-113 ~]$ mysql -u <username> -h customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com -p<password>
    Warning: Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 262
    Server version: 5.6.22 MySQL Community Server (GPL)

    Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql>
The above shows a terminal session where the user has first added the private SSH key to the ssh-agent as described in the previous link. The user connects to the bastion and, from there, uses the mysql client to interact with the replica directly.
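The steps that lead up to the bastion prompt in the session above can be sketched as follows. This is a minimal sketch, assuming OpenSSH on the workstation, the `ec2-user` login that appears in the prompt, and a hypothetical key file name (`clinspark_replica`). The script only prints the commands so they can be reviewed and then run by hand.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands rather than running them.
# KEY_FILE is a hypothetical name; BASTION_USER is assumed from the prompt shown above.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/clinspark_replica}"
BASTION_USER="${BASTION_USER:-ec2-user}"
BASTION_HOST="${BASTION_HOST:-customer-replica-bastion.clinspark.com}"

print_login_steps() {
  # 1. Create a key pair (skip if you already have one).
  #    The matching .pub file is what gets attached to the service ticket.
  echo "ssh-keygen -t ed25519 -f $KEY_FILE"
  # 2. Load the private key into the running ssh-agent.
  echo "ssh-add $KEY_FILE"
  # 3. Log in to the bastion; the mysql client is then run from that shell.
  echo "ssh $BASTION_USER@$BASTION_HOST"
}

print_login_steps
```

If your organization requires RSA keys, `ssh-keygen -t rsa -b 4096` is a common alternative to the `ed25519` key type used here.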
You'll probably want to establish an SSH tunnel between your local workstation and the replica through the bastion. This is a common usage pattern; see the documentation for your SSH client of choice for instructions on how to configure it.
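With OpenSSH, the tunnel described above might look like the following. This is a sketch under stated assumptions: the bastion login is `ec2-user` (taken from the prompt shown earlier), the replica listens on the MySQL default port 3306, and local port 3307 is free. None of these values are confirmed by the onboarding details, so adjust them to what you were given. The script prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the tunnel and client commands for review.
BASTION_USER="${BASTION_USER:-ec2-user}"   # assumed from the prompt shown earlier
BASTION_HOST="customer-replica-bastion.clinspark.com"
REPLICA_HOST="customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com"
LOCAL_PORT="${LOCAL_PORT:-3307}"           # assumption: any free local port
REPLICA_PORT="${REPLICA_PORT:-3306}"       # assumption: MySQL default port

tunnel_cmd() {
  # -N: run no remote command, forward ports only.
  # -L: forward LOCAL_PORT on this workstation to the replica, via the bastion.
  echo "ssh -N -L $LOCAL_PORT:$REPLICA_HOST:$REPLICA_PORT $BASTION_USER@$BASTION_HOST"
}

client_cmd() {
  # With the tunnel up, point a MySQL client or BI tool at 127.0.0.1:LOCAL_PORT.
  echo "mysql -h 127.0.0.1 -P $LOCAL_PORT -u <username> -p"
}

tunnel_cmd
client_cmd
```

Run the first printed command in one terminal and leave it open; the second connects through it. Using `-p` without a value makes the client prompt for the password, which avoids the command-line warning seen in the session above.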
Info |
---|
The above approach is appropriate if only one or two trusted users will access the bastion. It is secure as long as the private keys are secured. Customers with more users will want to find more scalable ways to access the replica data. One option is a persistent SSH tunnel hosted on the customer site and shared by other users at that site. AWS also has a variety of options for this. See AWS VPNs and Direct Connect, which are two mechanisms to securely connect your sites to your own AWS account. From there a variety of options exist, including AWS PrivateLink to create secure connections between AWS accounts. It is the customer's responsibility to select and configure any options such as the above. Foundry Health will assist by suggesting approaches and making required configurations on our end. But the majority of the setup is in customer-owned infrastructure, and for this the customer is solely responsible. |
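One way the persistent, shared tunnel mentioned in the note above could be approached with OpenSSH is sketched below. The assumptions are the same as before (`ec2-user` login, replica port 3306, local port 3307), plus a bind address of `0.0.0.0` so that other machines on the site network can reach the forwarded port. These are illustrative values only, not a Foundry Health recommendation. The script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: prints a keep-alive tunnel command for a shared host on the customer site.
BASTION_USER="${BASTION_USER:-ec2-user}"   # assumed from the prompt shown earlier
BASTION_HOST="customer-replica-bastion.clinspark.com"
REPLICA_HOST="customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com"
BIND_ADDR="${BIND_ADDR:-0.0.0.0}"          # assumption: listen on all interfaces
LOCAL_PORT="${LOCAL_PORT:-3307}"           # assumption: any free local port
REPLICA_PORT="${REPLICA_PORT:-3306}"       # assumption: MySQL default port

persistent_tunnel_cmd() {
  # ServerAliveInterval/ServerAliveCountMax: detect a dead connection and exit.
  # ExitOnForwardFailure: exit immediately if the port forward cannot be set up.
  echo "ssh -N -o ServerAliveInterval=60 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -L $BIND_ADDR:$LOCAL_PORT:$REPLICA_HOST:$REPLICA_PORT $BASTION_USER@$BASTION_HOST"
}

persistent_tunnel_cmd
```

ssh does not reconnect on its own, so the command would normally be run under a supervisor (for example a systemd unit or autossh) that restarts it when it exits. Because the forwarded port is reachable from other machines, access to it should be limited with the site's own firewall rules.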
Clinical Data Interchange Standards Consortium Standard (CDISC)
...
CDISC ODM defines a pattern for modeling both the definition of a clinical data element and the data captured for it. In this pattern, one element describes the definition and another records the data for a particular clinical collection of that element type. ClinSpark adheres to this pattern, though as noted in the table below, the suffix "Def" is removed from the definition elements. Here are the examples:
Data Definition Element | Data Capture Element | Notes |
---|---|---|
Item (ItemDef in CDISC) | ItemData | Represents the definition of a piece of data collected |
ItemGroup (ItemGroupDef in CDISC) | ItemGroupData | Aggregates item data |
Form (FormDef in CDISC) | FormData | Basically a container for item groups |
StudyEvent (StudyEventDef in CDISC) | StudyEventData | Clinical data for a study event (visit) for a given subject |
With this in mind, below is a view of how these elements relate to each other.
...