Overview

ClinSpark customers have available to them a live read-only copy of their production ClinSpark database called a Read Replica (RR). This document will describe what a Read Replica is from a technical perspective. It will provide descriptions of key aspects of the underlying Data Model so that technical users may better understand how the data is stored. And it will explore various use cases for leveraging the Read Replica as well as some examples.

Users that can access the RR can see everything including data items blinded in the User Interface, with the exception of Javascript Edit Checks and Methods (which are currently encrypted).

It is possible to implement different security for special database accounts, but that would require engineering work and cost and be non-standard.

Read Replicas

All ClinSpark data is stored within a MySQL compatible relational database within the AWS Cloud. The Master database is the production database for the ClinSpark instance. This is where all data is written to and updated during the use of the ClinSpark application. In addition to other live backup mechanisms for operational use, a special read-only copy of this production data can be made available to customers. This database is a dedicated copy of the production database, solely for the purpose of customer usage. It is not used by running ClinSpark instances in any way. It is read-only, meaning that it does not accept writes, and it is not possible for any usage of this database to impact the Master database in any way.

ClinSpark Read Replicas are AWS RDS Aurora Cross-Region Read Replicas. You can find comprehensive documentation from AWS online on these replicas.

Here is a visual depiction of the Read Replica in the ClinSpark AWS production environment:

Secure Access to the Replica

Read Replicas are hosted in subnets which only allow traffic from explicitly whitelisted static public IPs. Customers must request that their organizations static public IP addresses be whitelisted in order to enable access to their replica. This can request can be made through a Service Desk ticket, and questions brought to our engineering team on this topic as needed.

If customers are unsure of their current public IP, they should reach out to their site network/IT administrator to clarify. Alternately, customers can review this help article to learn more about IP addresses and use a ‘Find your IP address’ feature to check their current public IP address.

Static & Dynamic IPs

A dynamic IP address is one that changes regularly. Most home-based internet service providers will offer their customers dynamic public IP addresses, as it is often the most cost-effective and flexible for the ISP to allocate. However this poses a security risk from the perspective replica access for home-based users, as whoever is later assigned that dynamic IP will have access to the database.

We therefore do not whitelist individual user home IPs due to their dynamic nature. We only support static IPs belonging to a site organization. Questions about this can be raised to our engineering team via service desk as needed.

Private IP Ranges

Please note that with access requests, private network IP ranges cannot be whitelisted. Private IP addresses may be assigned to computers that access internal company resources via VPN, or, physically connect to networks on site where a private IP is assigned.

Customers can verify if their assigned computer IP address falls within private address space by checking their active IP address prefix. Any IPs on the 10.8.XXX.XXX, 192.168.XXX.XXX or 172.16.XXX.XXX address ranges are private IPs and cannot be whitelisted.

Read Only User

A read-only user will be provided along with connection credentials. This user has full query access to the ClinSpark production database.

Connectivity Option: SSH Jump Host

To date nearly all customers have been pleased with the security offered by narrowly IP-whitelisted firewall protections for their Read Replica plus SSL. Some customers however would prefer to have the Read Replica further restricted and use a dedicated SSH jump host to access the replica. This is an option which is more complicated to setup on both the IQVIA and customer side. On the customer side typically an infrastructure SSH tunnel is established between the customer network and the ssh server dedicated to this customer’s replica access. Tunneling is then used to access the replica.

Please request this via a ticket if this is of interest to you. There will likely be extra charges. Note also that we provide no support for your use of this tunnel, which is relatively complicated vs the whitelisting approach.

Database Schema

The database schema for the Read Replica is the same as the schema for ClinSpark itself. This schema evolves somewhat with each ClinSpark release.

You can find the schema DDL for various ClinSpark releases elsewhere on this document (search for “Schema DDL”).

Read Replica Use Cases

There are a wide variety of usage patterns for customer Read Replicas. This is your data, so use it as your business needs require. The following are a few common patterns presented as examples.

Integration to your Existing Data Warehouse

Customers who have existing data warehouses or datamarts may choose to integrate ClinSpark data into these repositories. Typically this involves the customer setting up an ELT job to periodically pull data from the replica into their warehouse. This is not something that Foundry Health supports directly. However it is an expected use case.

Ad Hoc SQL Queries

Nearly all SQL tools connect to MySQL databases. So whatever SQL tool you typically use should work with very little configuration. Use the tool that is most familiar to you. If you do not have a tool preference, one option may be the free MySQL Workbench, which is a fairly full featured tool. Using a SQL tool like this is useful for creating queries to answer questions on the fly. It can also be used to generate simple reports, perform data modeling (the above diagrams were created with this tool), etc.

The connection instructions above show how to connect MySQL Workbench to your replica. And there is extensive help online and in the app for using the tool from there.

Business Intelligence Tools

The ClinSpark Read Replica can be used with any Business Intelligence (BI) tool which operates on relational data. BI tools are very popular these days, and there are a wide variety of vendors.

Here is an example of one way that the Read Replica can be connected to a customer-hosted BI tool called Tableau:

Tableau

As shown above, it is easy to connect to Read Replica data using Tableau using a SSH tunnel to a local workstation or gateway. The standard Tableau Desktop connection wizard will guide you through the steps to connect from there. Please contact a Tableau representative for more details. They provide consulting services and training.

Here is a presentation on using Tableau with the Read Replica from the 2017 ClinSpark User Group conference.

Tibco Spotfire

Tibco Spotfire can access the Read Replica in the same manner as Tableau.

AWS Quicksight

Customers can also use AWS Quicksite. This is a far cheaper BI tool than Tableau or Spotfire, but it may be adequate depending on your use cases.

Crystal Reports or Similar

Crystal Reports or other similar products all can operate on MySQL databases. As such they can connect to and use the Read Replica. It is possible to create customer-specific reports using these tools.

Requesting a Read Replica

Any customer may request a Read Replica. This request should be made via a JIRA service desk ticket. In the ticket please specify the IP address which must be able to access this replica. Typically this is the IP address or range of the site where analysis will be done. The replicate will only be accessible from IP addresses specified in the ticket.

Please note that a fee is associated with the provision and support for Read Replicas.