
Overview

ClinSpark customers have access to a live, read-only copy of their production ClinSpark database called a Read Replica.  This document describes what a Read Replica is from a technical perspective.  It describes key aspects of the underlying data model so that technical users can better understand how the data is stored.  It also explores various use cases for leveraging the Read Replica, along with some examples.

...

Like all ClinSpark application assets, the Read Replica lives within private subnets of a production Virtual Private Cloud (VPC).  It is not exposed to the internet.  Its data is encrypted at rest and in transit (via SSH).

Latency


How old is the data in the Replica?

As updates are applied to Master, the Replica receives them in near real time.  There is a delay, but due to the architecture of AWS RDS Aurora, this delay is typically no greater than 20 milliseconds.  In other words, within roughly 20 ms of a change to the production database, the customer's Read Replica has that update available.

...

There are a variety of ways to access the Read Replica.  For the steps needed to request access to the replica via a service desk ticket, visit this page: Connecting to your Read Replica.  This article covers connecting in depth, assuming that access has been granted.

For customers with established relationships with AWS and special connections between AWS and their site's data centers, a variety of approaches exist for providing more direct connections to the replica.  

However, for most customers the most common way to access the Read Replica is via SSH.  For these customers, we provision a special bastion host whose only purpose is to give that customer access to this replica.  This host is accessible only via SSH, and only using a customer-owned private SSH key.  The host has no access to the environment other than to this replica.

...

Customer Bastion Host (public DNS)

customer-replica-bastion.clinspark.com

Customer Read Replica (private DNS)

customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com

Read Replica Username

<username>

Read Replica Password

<password>

...

  1. Create an SSH key you will use to access the database through the bastion.  If you already have one, you can use it.  Here are instructions for doing this and also for adding the key to the ssh-agent.

  2. Provide IQVIA with the public SSH keys of the users who need access to this database.  We will place these keys on the bastion, allowing these users to tunnel into the replica.  Open a service ticket with the public key attached and we'll get it installed quickly.

  3. Verify you have access by connecting to the bastion and replica as follows from the command line:

...

The above shows a terminal session where the user has first added the private SSH key to the ssh-agent as described in the previous link.  The user connects to the bastion, and from there uses the MySQL client to interact with the replica directly from the bastion.
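The steps above can be summarized as the commands a user would run, in order. The following is a minimal Python sketch that only builds those command lines (it does not execute them), assuming OpenSSH and the MySQL command-line client are installed. The key path, the bastion login name `ec2-user`, and the helper name `verification_commands` are hypothetical; the host names are the example values from the connection table.

```python
import shlex
from pathlib import Path

# Example values from the connection table; substitute your own.
BASTION_HOST = "customer-replica-bastion.clinspark.com"
REPLICA_HOST = "customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com"


def verification_commands(key_path: str, bastion_user: str, db_user: str) -> list[list[str]]:
    """Return the commands for steps 1-3 as argv lists, in the order they are run.

    The last command is meant to be run on the bastion, not on your workstation.
    """
    return [
        # Step 1: generate a key pair; the matching .pub file is what you send to IQVIA.
        ["ssh-keygen", "-t", "ed25519", "-f", key_path],
        # Add the private key to the running ssh-agent.
        ["ssh-add", key_path],
        # Step 3a: log in to the bastion using that key (login name is hypothetical).
        ["ssh", "-i", key_path, f"{bastion_user}@{BASTION_HOST}"],
        # Step 3b (on the bastion): open the MySQL client; -p prompts for the password.
        ["mysql", "-h", REPLICA_HOST, "-u", db_user, "-p"],
    ]


if __name__ == "__main__":
    key = str(Path.home() / ".ssh" / "clinspark_replica")
    for argv in verification_commands(key, "ec2-user", "<username>"):
        print(shlex.join(argv))
```

Passing `-p` without a value makes the MySQL client prompt for the password, which keeps it out of shell history.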

...


The above approach is appropriate if only one or two trusted users will access the bastion.  It is secure as long as the private keys are secured.  Customers with more users will want more scalable ways to access the replica data.  One option is a persistent SSH tunnel hosted on the customer site and shared by other users at the site.  AWS also offers a variety of options: see AWS VPN and AWS Direct Connect, two mechanisms for securely connecting your sites to your own AWS account.  From there, a variety of options exist, including AWS PrivateLink to create secure connections between AWS accounts.
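A persistent tunnel of the kind described above is usually an `ssh` process using local port forwarding (`-L`) that is restarted if it drops. The following is a minimal Python sketch under stated assumptions: MySQL's default port 3306 on both ends, a hypothetical bastion login name (`ec2-user`) and key path, and a deliberately naive restart loop. A real deployment would more likely use a service manager such as systemd, and must weigh who can reach the tunnel host before binding to anything other than `127.0.0.1`.

```python
import shlex
import subprocess
import time


def tunnel_command(
    bastion: str,
    replica: str,
    key_path: str,
    bind_address: str = "127.0.0.1",
    local_port: int = 3306,
    remote_port: int = 3306,
) -> list[str]:
    """OpenSSH argv forwarding bind_address:local_port to replica:remote_port via the bastion."""
    return [
        "ssh",
        "-N",  # no remote command; the session exists only to carry the forward
        "-i", key_path,
        # Clients connect to bind_address:local_port; the bastion relays to the replica.
        # Binding beyond 127.0.0.1 exposes the replica to anyone who can reach this host.
        "-L", f"{bind_address}:{local_port}:{replica}:{remote_port}",
        "-o", "ServerAliveInterval=60",  # keepalives let ssh notice and exit on a dead link
        "-o", "ExitOnForwardFailure=yes",  # exit instead of running without the forward
        bastion,
    ]


def run_forever(argv: list[str], retry_seconds: int = 10) -> None:
    """Naive supervisor: restart ssh whenever it exits."""
    while True:
        subprocess.run(argv)  # blocks for as long as the tunnel is up
        time.sleep(retry_seconds)


if __name__ == "__main__":
    argv = tunnel_command(
        "ec2-user@customer-replica-bastion.clinspark.com",
        "customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com",
        "/home/tunnel/.ssh/clinspark_replica",
    )
    print(shlex.join(argv))
```

Other users at the site would then point their MySQL clients or BI tools at the tunnel host and local port instead of at the bastion.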

It is the customer's responsibility to select and configure any options such as the above.  IQVIA will assist by suggesting approaches and making the required configurations on our end.  However, the majority of the setup is in customer-owned infrastructure, for which the customer is solely responsible.

...

CDISC is an open, multidisciplinary, non-profit organization committed to the development of industry standards to support the electronic acquisition, exchange, submission, and archiving of clinical trials data and metadata for medical and biopharmaceutical product development. Details about CDISC and ClinSpark are covered here: CDISC

Operational Data Model (CDISC ODM)

Here is how the Clinical Data Interchange Standards Consortium describes the ODM in the introduction of the specification:

The Operational Data Model (ODM) is a vendor neutral, platform independent format for interchange and archive of clinical study data. The model includes the clinical data along with its associated metadata, administrative data, reference data and audit information. All of the information that needs to be shared among different software systems during the setup, operation, analysis, submission or for long term retention as part of an archive is included in the model.

Clinical data management systems vary significantly in the information they store and the rules they enforce. The ODM model has been designed to represent a wide range of study information so as to be compatible with most existing clinical data management systems. Systems that do not have all of the features represented by the ODM model may still be ODM compatible as long as they comply with the conformity rules provided in the section on System Conformity.

The ODM has been designed to be compliant with guidance and regulations published by the FDA for computer systems used in clinical studies. This document is intended to be both the formal specification of the ODM and a user guide for those involved in transferring or archiving of clinical data using the model.

To the maximum extent possible, the ClinSpark data model is based on the CDISC Operational Data Model (ODM) standard.  In fact, a significant portion of the database schema for ClinSpark was generated directly from the ODM XML schema.  ODM scope is limited to the core entities of clinical study data.  The ClinSpark data model includes this scope but goes far beyond it.  As such, the CDISC ODM and its related documentation are excellent sources of information for understanding the subset of the ClinSpark data model that overlaps with ODM.

...

Additionally, the way that ClinSpark handles the concept of Accession comes directly from the CDISC LAB standard, section 3.4.8.  Here is the guidance from this section:

...

The entity diagrams and the relationships between them are intended to help readers understand what can be found in tables and how tables can be joined using SQL.  They are not meant to be comprehensive; use them in conjunction with the other schema documentation provided.

Study Design

...


Note that this information may change over time, but will provide a solid foundation for training purposes. The data model in ClinSpark may change each functional release, to support new capabilities or changes to existing features.

Study Design

The following subject areas are involved with study design.  

Note that ClinSpark has made significant extensions to the ODM in a number of areas.  Two examples are device connectivity and importation from the volunteer record; ODM has no provision for either of these features.  To support them, extensions have been made to data within the ItemGroup and ItemRef domains.  These allow marking fields whose values will be populated by direct capture from medical devices, or from a volunteer record within the database.  Other such extensions exist throughout the ClinSpark data model.  As such, domains that originate in the CDISC standard often contain a superset of the data described by CDISC, including additions created by IQVIA.

Org, Volunteer, Study, Sites, StudySites and Subjects

...

An activity plan is a schedule of events for a given cohort.  Activity plans do not appear in CDISC.  ODM has the notion of a FormRef; however, this design doesn't fit well with early phase trials, where forms are commonly repeated for a given study event (e.g., many PKs on a given day).  As such, the FormRef is implicitly available by way of the Scheduled Activities that are part of a Segment / Activity Plan.
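To make the "FormRef via Scheduled Activities" idea concrete, the sketch below joins activity plan to segment to scheduled activity to form and counts how often each form is scheduled. It uses an in-memory SQLite database with hypothetical, heavily simplified columns (`activity_plan_id`, `segment_id`, `form_id`, `offset_seconds`); the real replica columns may differ, so treat the query shape, not the names, as the point.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the replica tables. Real ClinSpark
# column names may differ; check the schema documentation before reusing.
SCHEMA = """
CREATE TABLE activity_plan (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE segment (id INTEGER PRIMARY KEY, activity_plan_id INTEGER, offset_seconds INTEGER);
CREATE TABLE form (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scheduled_activity (
    id INTEGER PRIMARY KEY, segment_id INTEGER, form_id INTEGER, offset_seconds INTEGER
);
"""

# Count how often each form is scheduled in one plan. A repeated form simply
# appears as several scheduled activities, which a single FormRef cannot express.
FORMS_PER_PLAN = """
SELECT f.name, COUNT(*) AS times_scheduled
FROM activity_plan ap
JOIN segment s ON s.activity_plan_id = ap.id
JOIN scheduled_activity sa ON sa.segment_id = s.id
JOIN form f ON f.id = sa.form_id
WHERE ap.id = ?
GROUP BY f.name
ORDER BY f.name
"""


def forms_per_plan_demo() -> list[tuple[str, int]]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO activity_plan VALUES (1, 'Cohort A plan')")
    con.execute("INSERT INTO segment VALUES (10, 1, 0)")
    con.executemany("INSERT INTO form VALUES (?, ?)", [(100, "PK Sample"), (101, "Vital Signs")])
    # Five hourly PK draws plus one vital signs check, all in the same segment.
    rows = [(1000 + i, 10, 100, i * 3600) for i in range(5)] + [(2000, 10, 101, 0)]
    con.executemany("INSERT INTO scheduled_activity VALUES (?, ?, ?, ?)", rows)
    return con.execute(FORMS_PER_PLAN, (1,)).fetchall()


print(forms_per_plan_demo())  # [('PK Sample', 5), ('Vital Signs', 1)]
```

The SELECT is plain SQL, so the same shape would run in the MySQL client against the replica once the real column names are substituted.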

...

Key

Table

From CDISC?

Notes

1

Study

Yes

This element collects static structural information about an individual study.  A study is related to a given clinical trial protocol.

2

Activity Plan

No

A schedule of events for a given cohort. Plans can be assigned to multiple cohorts. A timed plan must have a reference time in order to properly provide UI feedback as segments and scheduled activities are set.

Untimed Activity Plan:

  • Reference time must be null

  • Single segment; the segment must be root and must have offset seconds set to zero

Timed Activity Plan:

  • Must have a reference time

  • Can have 1-n segments; always sort by offset seconds

  • Reference segment must have offset seconds of zero

Activity Plan fills the role of the FormRef in the ODM.

3

Segment

Partially

Holds a group of scheduled activities in an activity plan.  The segment's offset seconds value is essentially the time of the reference event; all scheduled activities are relative to it.

Modeled loosely on CDISC SDM:

"Segments are often seen as the basic building blocks of study design. A segment usually specifies a combination of planned observations and interventions, which may or may not involve treatment, during a period of time."

4

Scheduled Activity

No

Wraps a form, but adds metadata including timing.

5

Form

Yes

A form is basically a container for item groups.

6

Study Event

Yes

A study event represents a given 'visit'.  In phase 1 trials this commonly refers simply to a 'day'.  When scheduling forms for a given schedule, the builder must associate the study event.  Note: some study events are typically reserved for special purposes, e.g., unscheduled and common (AE, CM) events.
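The untimed/timed Activity Plan rules in the table above lend themselves to a small validator. The following is a minimal Python sketch, not ClinSpark's implementation. The `is_reference` flag (treating the root segment as the reference segment) is a hypothetical modeling choice. The `nominal_time` helper encodes an assumed reading of the Segment notes: plan reference time plus segment offset plus the activity's own offset.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Segment:
    offset_seconds: int
    # Hypothetical flag: marks the root (reference) segment of the plan.
    is_reference: bool = False


@dataclass
class ActivityPlan:
    reference_time: Optional[datetime]  # None means an untimed plan
    segments: list[Segment] = field(default_factory=list)


def validate(plan: ActivityPlan) -> list[str]:
    """Check the untimed/timed rules; an empty list means no violations."""
    errors: list[str] = []
    if plan.reference_time is None:
        # Untimed: a single segment, which must be root with offset zero.
        if len(plan.segments) != 1:
            errors.append("untimed plan must have exactly one segment")
        elif not (plan.segments[0].is_reference and plan.segments[0].offset_seconds == 0):
            errors.append("untimed plan segment must be root with offset 0")
    else:
        # Timed: 1-n segments, and the reference segment must sit at offset zero.
        if not plan.segments:
            errors.append("timed plan must have at least one segment")
        elif not any(s.is_reference and s.offset_seconds == 0 for s in plan.segments):
            errors.append("timed plan reference segment must have offset 0")
    return errors


def ordered_segments(plan: ActivityPlan) -> list[Segment]:
    # Segments are always presented sorted by offset seconds.
    return sorted(plan.segments, key=lambda s: s.offset_seconds)


def nominal_time(plan: ActivityPlan, segment: Segment, activity_offset_seconds: int) -> datetime:
    """Assumed timing: plan reference time + segment offset + activity offset."""
    if plan.reference_time is None:
        raise ValueError("untimed plans have no nominal clock time")
    return plan.reference_time + timedelta(
        seconds=segment.offset_seconds + activity_offset_seconds
    )
```

For example, in a timed plan starting 2024-01-01 08:00, an activity 1800 seconds into a segment offset by 86400 seconds falls at 2024-01-02 08:30.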

...

Key

Table

From CDISC?

Notes

1

epoch

No

An epoch is typically specified in a study protocol and typically signifies milestone-type events within the trial, e.g., screening, treatment, follow-up.

2

cohort

No

A cohort is a way to break up epochs into different groupings.  Protocols will occasionally indicate that epochs should be broken up (perhaps into different trial arms to test different dose levels), or the grouping can be purely organizational.

3

cohort_assignment

No

Binds a Volunteer as a Subject within a given cohort, along with an Activity Plan.  Different schedules can be assigned, and a subject can be scheduled later.

4

activity_plan

No

A schedule of events for a given cohort.

6

subject

Yes

Someone participating in clinical research within the context of a given study.
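The cohort tables above can be joined to list each subject's epoch, cohort, and assigned schedule. The following is a minimal sketch using in-memory SQLite with hypothetical columns (`epoch_id`, `cohort_id`, `subject_id`, `activity_plan_id`, `label`). It assumes that a subject who has not been scheduled yet has a NULL activity plan on `cohort_assignment`, which is why the plan is joined with `LEFT JOIN` rather than an inner join.

```python
import sqlite3

# Hypothetical, simplified tables; real ClinSpark column names may differ.
SCHEMA = """
CREATE TABLE epoch (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cohort (id INTEGER PRIMARY KEY, epoch_id INTEGER, name TEXT);
CREATE TABLE activity_plan (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE cohort_assignment (
    id INTEGER PRIMARY KEY, cohort_id INTEGER, subject_id INTEGER,
    activity_plan_id INTEGER  -- assumed NULL until the subject is scheduled
);
"""

# LEFT JOIN keeps assigned-but-unscheduled subjects in the result.
SUBJECT_SCHEDULES = """
SELECT s.label, e.name AS epoch, c.name AS cohort, ap.name AS plan
FROM cohort_assignment ca
JOIN subject s ON s.id = ca.subject_id
JOIN cohort c ON c.id = ca.cohort_id
JOIN epoch e ON e.id = c.epoch_id
LEFT JOIN activity_plan ap ON ap.id = ca.activity_plan_id
ORDER BY s.label
"""


def subject_schedule_demo() -> list[tuple]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO epoch VALUES (1, 'Treatment')")
    con.execute("INSERT INTO cohort VALUES (1, 1, 'Cohort 1 (10 mg)')")
    con.execute("INSERT INTO activity_plan VALUES (1, 'Standard schedule')")
    con.executemany("INSERT INTO subject VALUES (?, ?)", [(1, "S-001"), (2, "S-002")])
    # S-002 is assigned to the cohort but not yet scheduled.
    con.executemany(
        "INSERT INTO cohort_assignment VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 1), (2, 1, 2, None)],
    )
    return con.execute(SUBJECT_SCHEDULES).fetchall()


for row in subject_schedule_demo():
    print(row)
```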

...

CDISC ODM defines a pattern for modeling the definition of a clinical data element and the data captured.  This pattern has one element describe the definition, and another element record the data for a particular clinical collection of this element type.  ClinSpark adheres to this pattern, though as noted in the table below, the suffix "Def" is removed from the definition elements.  Here are the examples:

Data Definition Element

Data Capture Element

Notes

Item (ItemDef in CDISC)

ItemData

Represents the definition of a piece of data collected

ItemGroup (ItemGroupDef in CDISC)

ItemGroupData

Aggregates item data

Form (FormDef in CDISC)

FormData

Basically a container for item groups

StudyEvent (StudyEventDef in CDISC)

StudyEventData

Clinical data for a study event (visit) for a given subject

With this in mind, below is a view of how these elements relate to each other.

...

Note that form_data can be linked to forms either through scheduled activities or as unscheduled forms.

ItemGroupData has an item_group_repeat_key which is used to track repeats.
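Because of the definition/data pattern, a captured value (item_data) is normally read together with its definition (item), and item_group_repeat_key separates repeated item groups on the same form. The following is a minimal in-memory SQLite sketch with hypothetical columns (`item_group_data_id`, `item_id`, `form_data_id`, `value`). It returns the values on one form grouped by repeat, for example a blood pressure group captured twice.

```python
import sqlite3

# Hypothetical, simplified tables; real ClinSpark column names may differ.
SCHEMA = """
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_group (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_group_data (
    id INTEGER PRIMARY KEY, item_group_id INTEGER, form_data_id INTEGER,
    item_group_repeat_key INTEGER
);
CREATE TABLE item_data (
    id INTEGER PRIMARY KEY, item_group_data_id INTEGER, item_id INTEGER, value TEXT
);
"""

# Data side (item_data, item_group_data) joined to the definition side (item).
VALUES_BY_REPEAT = """
SELECT igd.item_group_repeat_key, i.name, d.value
FROM item_data d
JOIN item i ON i.id = d.item_id
JOIN item_group_data igd ON igd.id = d.item_group_data_id
WHERE igd.form_data_id = ?
ORDER BY igd.item_group_repeat_key, i.name
"""


def repeat_values_demo() -> list[tuple[int, str, str]]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO item VALUES (?, ?)", [(1, "DIABP"), (2, "SYSBP")])
    con.execute("INSERT INTO item_group VALUES (1, 'Blood Pressure')")
    # The same item group captured twice on form_data 50 (repeat keys 1 and 2).
    con.executemany("INSERT INTO item_group_data VALUES (?, ?, ?, ?)", [(1, 1, 50, 1), (2, 1, 50, 2)])
    con.executemany(
        "INSERT INTO item_data VALUES (?, ?, ?, ?)",
        [(1, 1, 2, "120"), (2, 1, 1, "80"), (3, 2, 2, "118"), (4, 2, 1, "79")],
    )
    return con.execute(VALUES_BY_REPEAT, (50,)).fetchall()


print(repeat_values_demo())
# [(1, 'DIABP', '80'), (1, 'SYSBP', '120'), (2, 'DIABP', '79'), (2, 'SYSBP', '118')]
```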

Key

Table

From CDISC?

Notes

1

form

Yes

Basically a container for item groups.

2

form_data

Yes

Form data represents data collected for a given subject. Instead of storing the scheduled time on this domain, we leverage the relationship to the encapsulated scheduledActivity domain and thus its relationship to the segment. We purposely don't set formRepeatKey in the domain. It is calculated later on when building ODM.

3

item_group

Yes

A collection of related items in a given form

4

item_group_ref

Yes

A given item group within a form definition.

5

item_group_data

Yes

Aggregates item data

6

item

Yes

Represents the definition of a piece of data collected.

7

item_ref

Yes

A given item within an item group definition.

8

item_data

Yes

A piece of collected data

9

study_event

Yes

A study event represents a given 'visit'.  In phase 1 trials this commonly refers simply to a 'day'.  When scheduling forms for a given schedule, the builder must associate the study event.  Note: some study events are typically reserved for special purposes, e.g., unscheduled and common (AE, CM) events.

10

study_event_data

Yes

Clinical data for a study event (visit) for a given subject
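Since formRepeatKey is not stored on form_data but calculated when building ODM, anyone reproducing it from the replica must derive it. The following is one plausible derivation, not ClinSpark's actual algorithm: number the occurrences of the same form within a subject's study event, ordered by scheduled time. The pre-joined `FormDataRow` shape (with scheduled time already resolved through the scheduled activity and segment) and the helper name are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FormDataRow:
    # Hypothetical pre-joined view; scheduled_time would come from the
    # scheduled activity and its segment, not from form_data itself.
    form_data_id: int
    subject_id: int
    study_event_id: int
    form_id: int
    scheduled_time: datetime


def assign_form_repeat_keys(rows: Iterable[FormDataRow]) -> dict[int, int]:
    """Map form_data_id to a 1-based repeat key within (subject, study event, form)."""
    groups: dict[tuple[int, int, int], list[FormDataRow]] = defaultdict(list)
    for row in rows:
        groups[(row.subject_id, row.study_event_id, row.form_id)].append(row)
    keys: dict[int, int] = {}
    for members in groups.values():
        # Earliest scheduled occurrence gets key 1; ties broken by id for stability.
        members.sort(key=lambda r: (r.scheduled_time, r.form_data_id))
        for n, row in enumerate(members, start=1):
            keys[row.form_data_id] = n
    return keys
```

Three PK forms for one subject on one day, scheduled at 09:00, 08:00, and 10:00, would receive keys 2, 1, and 3 respectively, while a single vital signs form on the same day gets key 1.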

...

Key

Table

From CDISC?

Notes

1

volunteer

No

Someone who indicates that they are interested in participating in clinical research for the given org.

2

volunteer_medical_condition

No

Associates a given condition to a given volunteer

3

volunteer_note

No

A simple note that can be attached to the volunteer record

4

volunteer_correspondence

No

Represents a call or text to / from a volunteer by way of Twilio

5

volunteer_substance_use

No

We purposely don't track SUOCCUR; it allows us to indicate that the volunteer is not using the substance.

6

recruitment_appointment

No

Allows for a given volunteer to be assigned to a given time slot

7

study_site

No

An association between a study and a site

8

subject

Yes

Someone participating in clinical research within the context of a given study.

Users

...

Key

Table

From CDISC?

Notes

1

application_user

No

A user in the system

2

study

Yes

In this context, these are study restrictions.  If any studies appear here, the user is whitelisted to work only with data from those studies.

3

application_user_role

No

Roles which the user has

4

application_user_role_secure_actions

Yes

Secure Actions are permissions which the role entitles the user to.
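A common reporting question is "what is this user allowed to do?", which is answered by walking user to roles to secure actions. The following is a minimal in-memory SQLite sketch with hypothetical columns (`application_user_id`, `role_name`, `application_user_role_id`, `secure_action`) and made-up action names. It also includes a study-restriction check that assumes a user with no restriction rows is not limited to particular studies. Verify that assumption against the application before relying on it.

```python
import sqlite3

# Hypothetical, simplified tables; real ClinSpark column names may differ.
SCHEMA = """
CREATE TABLE application_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE application_user_role (
    id INTEGER PRIMARY KEY, application_user_id INTEGER, role_name TEXT
);
CREATE TABLE application_user_role_secure_actions (
    application_user_role_id INTEGER, secure_action TEXT
);
"""

# DISTINCT collapses actions granted by more than one role.
SECURE_ACTIONS_FOR_USER = """
SELECT DISTINCT sa.secure_action
FROM application_user u
JOIN application_user_role r ON r.application_user_id = u.id
JOIN application_user_role_secure_actions sa ON sa.application_user_role_id = r.id
WHERE u.username = ?
ORDER BY sa.secure_action
"""


def may_access_study(restricted_study_ids: set[int], study_id: int) -> bool:
    # Assumption: no restriction rows means the user is not limited to specific studies.
    return not restricted_study_ids or study_id in restricted_study_ids


def secure_actions_demo() -> list[str]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO application_user VALUES (1, 'jdoe')")
    con.executemany(
        "INSERT INTO application_user_role VALUES (?, ?, ?)",
        [(1, 1, "Data Manager"), (2, 1, "Monitor")],
    )
    con.executemany(
        "INSERT INTO application_user_role_secure_actions VALUES (?, ?)",
        [(1, "VIEW_FORM_DATA"), (1, "EDIT_FORM_DATA"), (2, "VIEW_FORM_DATA")],
    )
    return [row[0] for row in con.execute(SECURE_ACTIONS_FOR_USER, ("jdoe",))]


print(secure_actions_demo())  # ['EDIT_FORM_DATA', 'VIEW_FORM_DATA']
```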

...

Customers who have existing data warehouses or data marts may choose to integrate ClinSpark data into these repositories.  This can be done using the SSH channel provided.
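A typical warehouse feed extracts only rows changed since the last run (a "watermark"). The following is a minimal sketch under loud assumptions. It supposes a per-row last-modified timestamp column, here called `last_updated`, which is hypothetical: confirm whether and where the replica tables carry such a column. In-memory SQLite stands in for the replica, where a real job would connect through the SSH tunnel with a MySQL driver. Deletes are not handled.

```python
import sqlite3


def extract_changes(con, table: str, watermark: str):
    """Return rows changed at or after `watermark`, plus the next watermark.

    Using >= re-reads rows that share the boundary timestamp, so rows written
    with that timestamp after the previous run are not missed; the loader must
    therefore upsert. `table` is interpolated into the SQL: trusted config only.
    """
    rows = con.execute(
        f"SELECT id, value, last_updated FROM {table} "
        "WHERE last_updated >= ? ORDER BY last_updated",
        (watermark,),
    ).fetchall()
    next_watermark = rows[-1][2] if rows else watermark
    return rows, next_watermark


def load(warehouse: dict, rows) -> None:
    for row_id, value, updated in rows:
        warehouse[row_id] = (value, updated)  # idempotent upsert keyed by id


def incremental_demo():
    # In-memory SQLite stands in for the replica; `last_updated` is hypothetical.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE item_data (id INTEGER PRIMARY KEY, value TEXT, last_updated TEXT)")
    src.executemany(
        "INSERT INTO item_data VALUES (?, ?, ?)",
        [
            (1, "80", "2024-01-01T08:00:00"),
            (2, "120", "2024-01-01T09:00:00"),
            (3, "79", "2024-01-01T09:00:00"),
        ],
    )
    warehouse: dict = {}
    rows, wm = extract_changes(src, "item_data", "")  # first run: everything
    load(warehouse, rows)
    # A correction arrives on the source after the first run.
    src.execute(
        "UPDATE item_data SET value = '81', last_updated = '2024-01-01T10:00:00' WHERE id = 1"
    )
    rows, wm = extract_changes(src, "item_data", wm)  # second run: changes only
    load(warehouse, rows)
    return warehouse, wm
```

ISO-8601 timestamp strings sort chronologically, which is why plain string comparison works for the watermark in this sketch.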

...

Tibco Spotfire can access the Read Replica in the same manner as Tableau. 

AWS Quicksight

AWS Quicksight is a cheaper BI tool than Tableau or Spotfire, and may be adequate depending on the use case.  It may be possible to connect a customer AWS Quicksight account to a Replica exposed via SSH.  We have limited experience connecting a Quicksight account to IQVIA-owned RDS replicas, and this is a topic the ClinSpark engineering team is still investigating.  If customers are interested in using AWS Quicksight, please reach out via service desk ticket.

Crystal Reports or Similar

...