© 2024 IQVIA - All Rights Reserved
Read Replica
Summary
ClinSpark customers have available to them a live read-only copy of their production ClinSpark database called a Read Replica (RR). This document will describe what a Read Replica is from a technical perspective. It will provide descriptions of key aspects of the underlying Data Model so that technical users may better understand how the data is stored. And it will explore various use cases for leveraging the Read Replica as well as some examples.
Users that can access the RR can see everything including data items blinded in the User Interface, with the exception of Javascript Edit Checks and Methods (which are currently encrypted).
It is possible to implement different security for special database accounts, but that would require engineering work and cost and be non-standard.
Requesting a Read Replica
Any customer may request a Read Replica. This request should be made via a JIRA service desk ticket. In the ticket please specify the IP address which must be able to access this replica. Typically this is the IP address or range of the site where analysis will be done. The replicate will only be accessible from IP addresses specified in the ticket.
Please note that a fee may be associated with the provision and support for Read Replicas.
Read Replica Overview
All ClinSpark data is stored within a MySQL compatible relational database within the AWS Cloud. The Master database is the production database for the ClinSpark instance. This is where all data is written to and updated during the use of the ClinSpark application. In addition to other live backup mechanisms for operational use, a special read-only copy of this production data can be made available to customers. This database is a dedicated copy of the production database, solely for the purpose of customer usage. It is not used by running ClinSpark instances in any way. It is read-only, meaning that it does not accept writes, and it is not possible for any usage of this database to impact the Master database in any way.
ClinSpark Read Replicas are AWS RDS Aurora Cross-Region Read Replicas. You can find comprehensive documentation from AWS online on these replicas.
Here is a visual depiction of the Read Replica in the ClinSpark AWS production environment:
Secure Access to the Replica
Read Replicas are hosted in subnets which only allow traffic from explicitly whitelisted static public IPs. Customers must request that their organizations static public IP addresses be whitelisted in order to enable access to their replica. This can request can be made through a Service Desk ticket, and questions brought to our engineering team on this topic as needed.
If customers are unsure of their current public IP, they should reach out to their site network/IT administrator to clarify. Alternately, customers can review this help article to learn more about IP addresses and use a ‘Find your IP address’ feature to check their current public IP address.
Static & Dynamic IPs
A dynamic IP address is one that changes regularly. Most home-based internet service providers will offer their customers dynamic public IP addresses, as it is often the most cost-effective and flexible for the ISP to allocate. However this poses a security risk from the perspective replica access for home-based users, as whoever is later assigned that dynamic IP will have access to the database.
We therefore do not whitelist individual user home IPs due to their dynamic nature. We only support static IPs belonging to a site organization. Questions about this can be raised to our engineering team via service desk as needed.
Private IP Ranges
Please note that with access requests, private network IP ranges cannot be whitelisted. Private IP addresses may be assigned to computers that access internal company resources via VPN, or, physically connect to networks on site where a private IP is assigned.
Customers can verify if their assigned computer IP address falls within private address space by checking their active IP address prefix. Any IPs on the 10.8.XXX.XXX, 192.168.XXX.XXX or 172.16.XXX.XXX address ranges are private IPs and cannot be whitelisted.
Read Only User
A read-only user (e.g. ‘RO_USER') account for the Read Replica will be provided along with connection credentials.
The supplied password will be long (> 40 characters), complex (mixture of upper case letters, lower case letters, digits and symbols) and supplied to a named customer user via email and using an encrypted mechanism, and for a time-limited manner (typically 14 days). The credentials will need to be re-issued if not collected within this period. The password will not be shared via our service desk. We will retain a copy of the password in a secure vault restricted to IQVIA ClinSpark infrastructure engineers.
RO_USER has access to all databases on the read replica cluster.
It is the responsibility of the designated customer representative to keep the supplied credentials safe, to apply them appropriately and to only share them to authorised users.
Bear in mind that this account is a ‘service account’ (not limited to a named user) and does not audit or control specific user access. The customer is responsible for maintaining the security of the account and ensuring that appropriate individuals have access in a controlled manner. This may involve configuring an internal system that proxies the access and maintains appropriate authorisation and access logs. Example approaches would include using a Business Intelligence platform that grants end-user access; integrating the data into a local data lake; or using a simple MySQL proxy solution.
Connectivity Option: SSH Jump Host
To date nearly all customers have been pleased with the security offered by narrowly IP-whitelisted firewall protections for their Read Replica plus SSL. Some customers however would prefer to have the Read Replica further restricted and use a dedicated SSH jump host to access the replica. This is an option which is more complicated to setup on both the IQVIA and customer side. On the customer side typically an infrastructure SSH tunnel is established between the customer network and the ssh server dedicated to this customer’s replica access. Tunneling is then used to access the replica.
Please request this via a ticket if this is of interest to you. There will likely be extra charges. Note also that we provide no support for your use of this tunnel, which is relatively complicated vs the whitelisting approach.
Database Schema
The database schema for the Read Replica is the same as the schema for ClinSpark itself. This schema may evolve somewhat with each ClinSpark release.
More information about the ClinSpark Data Model can be found here: Read Replica Data Model
Read Replica Use Cases
There are a wide variety of usage patterns for customer Read Replicas. This is customer data, so it is expected to be used as business needs require. The following are a few common patterns presented as examples.
Integration to Existing Data Warehouse
Customers who have existing data warehouses or datamarts may choose to integrate ClinSpark data into these repositories. Typically this involves the customer setting up an ELT job to periodically pull data from the replica into their warehouse. This is not something that IQVIA supports directly. However it is an expected use case.
Ad Hoc SQL Queries
Nearly all SQL tools connect to MySQL databases. So whatever SQL tool you typically use should work with very little configuration. Use the tool that is most familiar to you. If you do not have a tool preference, one option may be the free MySQL Workbench, which is a fairly full featured tool. Using a SQL tool like this is useful for creating queries to answer questions on the fly. It can also be used to generate simple reports, perform data modeling (the above diagrams were created with this tool), etc.
The connection instructions above show how to connect MySQL Workbench to your replica. And there is extensive help online and in the app for using the tool from there.
Business Intelligence Tools
The ClinSpark Read Replica can be used with any Business Intelligence (BI) tool which operates on relational data. BI tools are very popular these days, and there are a wide variety of vendors.
We list some examples customer-hosted BI tools that the Read Replica can be connected to below.
Tableau
As shown above, it is easy to connect to Read Replica data using Tableau using a SSH tunnel to a local workstation or gateway. The standard Tableau Desktop connection wizard will guide you through the steps to connect from there. Please contact a Tableau representative for more details. They provide consulting services and training.
Here is a presentation on using Tableau with the Read Replica from the 2017 ClinSpark User Group conference.
Tibco Spotfire
Tibco Spotfire can access the Read Replica in the same manner as Tableau.
AWS Quicksight
Customers can also use AWS Quicksite. This is a far cheaper BI tool than Tableau or Spotfire, but it may be adequate depending on your use cases.
Crystal Reports or Similar
Crystal Reports or other similar products all can operate on MySQL databases. As such they can connect to and use the Read Replica. It is possible to create customer-specific reports using these tools.
Exported and Printed Copies Are Uncontrolled