...
All ClinSpark data is stored in a MySQL-compatible relational database in the AWS Cloud. The Master database is the production database for the ClinSpark instance; all data is written to and updated in it during use of the ClinSpark application. In addition to other live backup mechanisms for operational use, a special read-only copy of this production data can be made available to customers. This database is a dedicated copy of the production database, provided solely for customer use. It is not used by running ClinSpark instances in any way. Because it is read-only, it does not accept writes, and no usage of this database can impact the Master database.
ClinSpark Read Replicas are standard AWS RDS Aurora Read Replicas. You can find comprehensive documentation from AWS online on these replicas.
Here is a visual depiction of the Read Replica in the ClinSpark AWS production environment:
Note that, like all ClinSpark application assets, it lives within private subnets of the Foundry Health Production Virtual Private Cloud (VPC). It is not exposed to the internet. Its data is encrypted at rest, and in transit via SSH.
...
During onboarding, two DNS names and one set of credentials will be provided to you. One is the DNS name of your dedicated bastion host, used to access the replica via SSH. The other is the DNS name of the read replica itself within the AWS VPC. For this example, let's pretend these DNS names are as follows:
Item | Example Value |
---|---|
Customer Bastion Host (public DNS) | customer-replica-bastion.clinspark.com |
Customer Read Replica (private DNS) | customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com |
Read Replica Username | <username> |
Read Replica Password | <password> |
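As a rough illustration of how the two DNS names fit together, the sketch below assembles an OpenSSH local port forward through the bastion to the replica. The bastion username, the private key path, and the local port 3307 are hypothetical placeholders, and the replica port 3306 is assumed because it is the MySQL default; use the values supplied during onboarding.

```python
import shlex


def build_tunnel_command(bastion_host, replica_host, ssh_user, key_path,
                         local_port=3307, replica_port=3306):
    """Return the argv list for an SSH local port forward through the bastion."""
    forward = f"{local_port}:{replica_host}:{replica_port}"
    return [
        "ssh",
        "-N",             # do not run a remote command, only forward ports
        "-i", key_path,   # private key used to authenticate to the bastion
        "-L", forward,    # local_port -> replica_host:replica_port via the bastion
        f"{ssh_user}@{bastion_host}",
    ]


command = build_tunnel_command(
    bastion_host="customer-replica-bastion.clinspark.com",
    replica_host="customer-replica.crs8xf7ezw7g.us-east-1.rds.amazonaws.com",
    ssh_user="bastion-user",                    # hypothetical user name
    key_path="/path/to/bastion_private_key",    # hypothetical key location
)

# Print the command so it can be reviewed before it is run in a terminal.
print(shlex.join(command))
```

While such a tunnel is open, a MySQL client on the same workstation would connect to 127.0.0.1 on the chosen local port using the Read Replica username and password.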
...
Info |
---|
The above approach is appropriate if only one or two trusted users will access the bastion. It is secure as long as the private keys are secured. Customers with more users will want to find more scalable ways to access the replica data. One option is a persistent SSH tunnel at the customer site that is shared by other users at that site. AWS also has a variety of options: see AWS VPN and Direct Connect, two mechanisms for securely connecting your sites to your own AWS account. From there a variety of options exist, including AWS PrivateLink, which creates secure connections between AWS accounts. It is the customer's responsibility to select and configure any of these options. Foundry Health will assist by suggesting approaches and making the required configurations on our end, but the majority of the setup is in customer-owned infrastructure, for which the customer is solely responsible. |
Clinical Data Interchange Standards Consortium Standard
...
<TODO: Round this part out>
...
Operational Data Model (CDISC ODM)
...
Links to CDISC documentation. Identify subject areas driven by CDISC. Introduce core entities to highlight how the standard represents clinical study design and study data. Introduction to these tables is given in the context of CDISC, and cleanly segues into the data model below. Highlights implicit point that the CDISC documentation itself is a valuable guide to the CS data model.
ClinSpark Data Model
To the extent possible and practical, the CS data model is tightly aligned to CDISC ODM 1.3.2. There are several benefits to this:
- Standards-based simplifies interoperability
- Simplified training and transferable knowledge
- High quality data foundation vs proprietary makes data more future-proof
The following section highlights a series of key subject areas within the ClinSpark data model. Where elements are defined within the CDISC ODM standard, the descriptions from this documentation are used.
These entity diagrams and the relationships between them should help readers understand what can be found in tables and also how tables can be joined using SQL.
Study Design
The following subject areas are involved with study design.
Org, Volunteer, Study, Sites, StudySites and Subjects
...
Form Definition
...
A Form (FormDef in CDISC) describes a type of form that can occur in a study.
A form is basically a container for item groups. This class explicitly excludes the required 'repeating' (Yes|No) attribute from the domain due to the fact that phase 1 studies are different and that it's likely that most of the forms will repeat as they are related to a study event. At ODM build time, we'll check for those forms that repeat and ensure that we create the repeating attribute properly.
...
An ItemGroup (ItemGroupDef in CDISC) describes a type of item group that can occur within a study.
It is basically a collection of related items in a given form.
...
An Item (ItemDef in CDISC) describes a type of item that can occur within a study. Item properties include name, datatype, measurement units, range or codelist restrictions, and several other properties.
It basically represents the definition of a piece of data collected.
...
Activity Plans
An activity plan is a schedule of events for a given cohort. Activity Plans do not appear in CDISC; there is no similar construct in the ODM to capture this concept. Activity Plan, Segment and Scheduled Activity are ClinSpark-specific.
...
- Reference time must be null
- Single segment; segment must be root and must have offset seconds set to zero
...
- Must have a reference time
- Can have 1-n segments; always sort by offset seconds
- Reference segment must have offset seconds of zero
...
Holds a group of scheduled activities in an activity plan. The segment's offset seconds is essentially the time of the reference event; all scheduled activities are relative to this.
Modeled loosely on the CDISC SDM:
"Segments are often seen as the basic building blocks of study design. A segment usually specifies a combination of planned observations and interventions, which may or may not involve treatment, during a period of time."
...
Epochs and Cohorts
...
Binds a Volunteer, as a Subject, to a given cohort along with an Activity Plan. We allow assigning different schedules, and we allow scheduling a subject later.
...
Clinical Data
Parallel data and definition objects in CDISC
CDISC ODM defines a pattern for modeling the definition of a clinical data element and the data captured. In this pattern, one element describes the definition, and another element records the data for a particular clinical collection of this element type. ClinSpark adheres to this pattern, though as noted in the table below, the suffix "Def" is removed from the definition elements. Here are some examples:
...
With this in mind, below is a view of how these elements relate to each other.
...
Study Data
...
Lab Data
Lab data can be reached via ItemData
...
Volunteers
Volunteers are not a part of CDISC.
...
Users
...
Read Replica Usage
Integration to your Existing Data Warehouse
Ad Hoc SQL Queries
Nearly all SQL tools connect to MySQL databases, so whatever SQL tool you typically use should work with very little configuration. Use the tool that is most familiar to you. If you do not have a tool preference, one option is the free MySQL Workbench, which is a fairly full-featured tool. A SQL tool like this is useful for creating queries to answer questions on the fly. It can also be used to generate simple reports, perform data modeling (the above diagrams were created with this tool), etc.
The connection instructions above show how to connect MySQL Workbench to your replica. There is extensive help online and in the app for using the tool from there.
Business Intelligence Tools
The ClinSpark Read Replica can be used with any Business Intelligence tool which operates on relational data. Examples include:
Tableau
Tibco Spotfire
AWS Quicksight
...
Here is how the Clinical Data Interchange Standards Consortium describes the ODM in the introduction of the specification:
The Operational Data Model (ODM) is a vendor neutral, platform independent format for interchange and archive of clinical study data. The model includes the clinical data along with its associated metadata, administrative data, reference data and audit information. All of the information that needs to be shared among different software systems during the setup, operation, analysis, submission or for long term retention as part of an archive is included in the model.
Clinical data management systems vary significantly in the information they store and the rules they enforce. The ODM model has been designed to represent a wide range of study information so as to be compatible with most existing clinical data management systems. Systems that do not have all of the features represented by the ODM model may still be ODM compatible as long as they comply with the conformity rules provided in the section on System Conformity.
The ODM has been designed to be compliant with guidance and regulations published by the FDA for computer systems used in clinical studies. This document is intended to be both the formal specification of the ODM and a user guide for those involved in transferring or archiving of clinical data using the model.
To the maximum extent possible, the ClinSpark data model is based on the CDISC Operational Data Model (ODM) standard. In fact, a significant portion of the database schema for ClinSpark was generated directly from the ODM XML schema. ODM scope is limited to the core entities of clinical study data. The ClinSpark data model includes this but goes far beyond this scope. As such, the CDISC ODM and its related documentation are excellent sources of information for understanding the subset of the ClinSpark data model which overlaps. For concerns outside of CDISC ODM, such as volunteers, lab data and many others, the ODM documentation will not be helpful.
The CDISC ODM documentation can be freely downloaded from here. We recommend that anyone intending to work with and understand the ClinSpark data model spend some time getting familiar with the ODM, as it provides valuable insights into both design and intended usage.
ClinSpark Data Model
To the extent possible and practical, the ClinSpark data model is based on the CDISC ODM 1.3.2 data model. A few of the benefits of this are:
- Standards-based simplifies interoperability. ClinSpark natively produces and accepts CDISC data, simplifying integration with other products and vendors supporting this standard.
- Simplified training and transferable knowledge. CDISC ODM is fairly widely used within the industry. This makes our data model relatively easy for new users familiar with ODM to comprehend.
- High quality data foundation vs proprietary makes data more future-proof.
The following section highlights a series of key subject areas within the ClinSpark data model. This section covers both CDISC and non-CDISC entities, since this is the nature of the ClinSpark data model.
The entity diagrams and the relationships between them are intended to help readers understand what can be found in tables and how tables can be joined using SQL. This is not meant to be comprehensive; it should be used in conjunction with the other schema documentation provided.
Study Design
The following subject areas are involved with study design.
Org, Volunteer, Study, Sites, StudySites and Subjects
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | org | No | An org represents an entity performing clinical research (CRO / CRU). Orgs can have multiple sites that execute studies. |
2 | study | Yes | This element collects static structural information about an individual study. A study is related to a given clinical trial protocol. |
3 | site | No | A site is a physical place belonging to an organization. An organization having multiple physical clinical sites will have multiple site rows. |
4 | study_site | No | An association between a physical site and a study. A study site is different than a physical location. Often, pharma sponsors will specify sites with arbitrary codes and those codes must pass through during data export time. In addition, this domain encapsulates recruitment efforts for a given study / site. |
5 | volunteer | No | A volunteer is someone who indicates that they are interested in participating in clinical research for the given org. |
6 | subject | No | Someone participating in clinical research within the context of a given study. Creates glue between the volunteer and the participation. |
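To make the joins between these study design tables more concrete, here is a minimal sketch that runs against an in-memory SQLite database rather than a real replica. Only the table names come from the list above; every column name (`id`, `name`, `org_id`, `study_id`, `site_id`, `volunteer_id`, `initials`) and all sample rows are assumptions made for illustration, so check the schema documentation for the real keys.

```python
import sqlite3

# In-memory stand-in for the replica; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org        (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE site       (id INTEGER PRIMARY KEY, org_id INTEGER, name TEXT);
CREATE TABLE study      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE study_site (id INTEGER PRIMARY KEY, study_id INTEGER, site_id INTEGER);
CREATE TABLE volunteer  (id INTEGER PRIMARY KEY, org_id INTEGER, initials TEXT);
CREATE TABLE subject    (id INTEGER PRIMARY KEY, volunteer_id INTEGER, study_id INTEGER);

INSERT INTO org VALUES (1, 'Example CRU');
INSERT INTO site VALUES (1, 1, 'North Clinic'), (2, 1, 'South Clinic');
INSERT INTO study VALUES (1, 'STUDY-A'), (2, 'STUDY-B');
INSERT INTO study_site VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
INSERT INTO volunteer VALUES (1, 1, 'AB'), (2, 1, 'CD'), (3, 1, 'EF');
INSERT INTO subject VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
""")

# Which sites (and owning org) execute each study: study -> study_site -> site -> org.
site_rows = conn.execute("""
    SELECT st.name, o.name, si.name
    FROM study st
    JOIN study_site ss ON ss.study_id = st.id
    JOIN site si       ON si.id = ss.site_id
    JOIN org o         ON o.id = si.org_id
    ORDER BY st.name, si.name
""").fetchall()

# How many subjects participate in each study.
subject_counts = conn.execute("""
    SELECT st.name, COUNT(su.id)
    FROM study st
    LEFT JOIN subject su ON su.study_id = st.id
    GROUP BY st.id, st.name
    ORDER BY st.name
""").fetchall()

print(site_rows)
print(subject_counts)
```

The same volunteer appears as a subject in both sample studies, which mirrors the description of a subject as the glue between a volunteer and a study.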
Form Definition
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | StudyMetaData | Yes | StudyMetaData (MetadataVersion in CDISC) defines the types of study events, forms, item groups, and items that form the study data. This is basically an aggregation of all CRF design elements for a study. |
2 | Form | Yes | A Form (FormDef in CDISC) describes a type of form that can occur in a study. A form is basically a container for item groups. This class explicitly excludes the required 'repeating' (Yes/No) attribute from the domain because phase 1 studies are different and most of the forms are likely to repeat, as they are related to a study event. At ODM build time, we check for forms that repeat and ensure that the repeating attribute is created properly. |
3 | ItemGroup | Yes | An ItemGroup (ItemGroupDef in CDISC) describes a type of item group that can occur within a study. It is basically a collection of related items in a given form. |
4 | Item | Yes | An Item (ItemDef in CDISC) describes a type of item that can occur within a study. Item properties include name, datatype, measurement units, range or codelist restrictions, and several other properties. It basically represents the definition of a piece of data collected. |
5 | ItemGroupRef | Yes | A reference to a given ItemGroup. This reference can hold data about the association. |
6 | ItemRef | Yes | A reference to a given Item. This reference can hold data about the association. |
Activity Plans
An activity plan is a schedule of events for a given cohort. Activity Plans do not appear in CDISC; there is no similar construct in the ODM to capture this concept. Activity Plan, Segment and Scheduled Activity are ClinSpark-specific.
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | Study | Yes | This element collects static structural information about an individual study. A study is related to a given clinical trial protocol. |
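2 | Activity Plan | No | A schedule of events for a given cohort. Plans can be assigned to multiple cohorts. A timed plan must have a reference time in order to properly provide UI feedback as segments and scheduled activities are set. Untimed Activity Plan: reference time must be null; single segment, which must be root and must have offset seconds set to zero. Timed Activity Plan: must have a reference time; can have 1-n segments, always sorted by offset seconds; reference segment must have offset seconds of zero. |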
3 | Segment | Partially | Holds a group of scheduled activities in an activity plan. The segment's offset seconds is essentially the time of the reference event; all scheduled activities are relative to this. Modeled loosely on the CDISC SDM: "Segments are often seen as the basic building blocks of study design. A segment usually specifies a combination of planned observations and interventions, which may or may not involve treatment, during a period of time." |
4 | Scheduled Activity | No | Wraps a form, but adds metadata including timing. |
5 | Form | Yes | A form is basically a container for item groups. |
6 | Study Event | Yes | A study event represents a given 'visit'. In phase 1 trials this will commonly simply refer to a 'day'. When scheduling forms for a given schedule, the builder must associate the study event. Note: there are common study events that are typically reserved for special events: unscheduled, common (AE, CM), etc |
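The rules this document gives for untimed and timed activity plans can be sketched as simple checks. This is only an illustration: the class and field names are hypothetical, a `None` reference time stands in for null, and the "reference segment" of a timed plan is assumed here to be any segment whose offset seconds is zero.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    name: str
    offset_seconds: int
    is_root: bool = False


@dataclass
class ActivityPlan:
    name: str
    reference_time: Optional[str]  # None models a null reference time
    segments: List[Segment] = field(default_factory=list)


def ordered_segments(plan: ActivityPlan) -> List[Segment]:
    """Return segments sorted by offset seconds."""
    return sorted(plan.segments, key=lambda s: s.offset_seconds)


def check_untimed(plan: ActivityPlan) -> List[str]:
    """Collect violations of the untimed plan rules."""
    problems = []
    if plan.reference_time is not None:
        problems.append("reference time must be null")
    if len(plan.segments) != 1:
        problems.append("must have exactly one segment")
    else:
        only = plan.segments[0]
        if not only.is_root:
            problems.append("segment must be root")
        if only.offset_seconds != 0:
            problems.append("segment offset seconds must be zero")
    return problems


def check_timed(plan: ActivityPlan) -> List[str]:
    """Collect violations of the timed plan rules."""
    problems = []
    if plan.reference_time is None:
        problems.append("must have a reference time")
    if not plan.segments:
        problems.append("must have at least one segment")
    elif not any(s.offset_seconds == 0 for s in plan.segments):
        # Assumption: the reference segment is the one at offset zero.
        problems.append("reference segment must have offset seconds of zero")
    return problems


untimed = ActivityPlan("Screening", None, [Segment("Root", 0, is_root=True)])
timed = ActivityPlan("Dosing", "08:00",
                     [Segment("Day 2", 86400), Segment("Day 1", 0)])

print(check_untimed(untimed))                       # no violations expected
print(check_timed(timed))                           # no violations expected
print([s.name for s in ordered_segments(timed)])    # segments in offset order
```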
Epochs and Cohorts
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | epoch | No | An epoch is typically specified in a study protocol and signifies a milestone-type event within the trial, e.g. screening, treatment, follow-up. |
2 | cohort | No | A cohort is a way to break up epochs into different groupings. Protocols will occasionally indicate that epochs should be broken up (perhaps into different trial arms to test different dose levels), or cohorts can be used purely for organizational purposes. |
3 | cohort_assignment | No | Binds a Volunteer, as a Subject, to a given cohort along with an Activity Plan. We allow assigning different schedules, and we allow scheduling a subject later. |
4 | activity_plan | No | A schedule of events for a given cohort. |
6 | subject | No | Someone participating in clinical research within the context of a given study. |
Clinical Data
Parallel data and definition objects in CDISC
CDISC ODM defines a pattern for modeling the definition of a clinical data element and the data captured. In this pattern, one element describes the definition, and another element records the data for a particular clinical collection of this element type. ClinSpark adheres to this pattern, though as noted in the table below, the suffix "Def" is removed from the definition elements. Here are some examples:
Data Definition Element | Data Capture Element | Notes |
---|---|---|
Item (ItemDef in CDISC) | ItemData | Represent the definition of a piece of data collected |
ItemGroup (ItemGroupDef in CDISC) | ItemGroupData | Aggregates item data |
Form (FormDef in CDISC) | FormData | Basically a container for item groups |
StudyEvent (StudyEventDef in CDISC) | StudyEventData | Clinical data for a study event (visit) for a given subject |
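The definition/data pairing in the table above can be pictured with two small classes: one definition object is shared by many captured values. The field names below (`oid`, `name`, `data_type`, `subject_key`, `value`) and the sample values are illustrative assumptions, not the ClinSpark column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Definition element (ItemDef in CDISC): describes what is collected."""
    oid: str
    name: str
    data_type: str


@dataclass(frozen=True)
class ItemData:
    """Data capture element: one collected value for one Item definition."""
    item: Item
    subject_key: str
    value: str


def typed_value(data: ItemData):
    """Interpret the captured text using the datatype held by the definition."""
    if data.item.data_type == "integer":
        return int(data.value)
    return data.value


systolic = Item(oid="IT.SYSBP", name="Systolic Blood Pressure", data_type="integer")

# Two captures that share a single definition.
captured = [
    ItemData(systolic, "SUBJ-001", "118"),
    ItemData(systolic, "SUBJ-002", "131"),
]

values = [typed_value(d) for d in captured]
print(values)  # [118, 131]
```

The same split applies one level up: an ItemGroup, Form or StudyEvent definition is paired with its ItemGroupData, FormData or StudyEventData.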
With this in mind, below is a view of how these elements relate to each other.
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | form | Yes | Basically a container for item groups. |
2 | form_data | Yes | Form data represents data collected for a given subject. Instead of storing the scheduled time on this domain, we leverage the relationship to the encapsulated scheduledActivity domain and thus its relationship to the segment. We purposely don't set formRepeatKey in the domain. It is calculated later on when building ODM. |
3 | item_group | Yes | A collection of related items in a given form |
4 | item_group_ref | Yes | A given item group within a form definition. |
5 | item_group_data | Yes | Aggregates item data |
6 | item | Yes | Represents the definition of a piece of data collected. |
7 | item_ref | Yes | A given item within an item group definition. |
8 | item_data | Yes | A piece of collected data |
9 | study_event | Yes | A study event represents a given 'visit'. In phase 1 trials this will commonly simply refer to a 'day'. When scheduling forms for a given schedule, the builder must associate the study event. Note: there are common study events that are typically reserved for special events: unscheduled, common (AE, CM), etc |
10 | study_event_data | Yes | Clinical data for a study event (visit) for a given subject |
Study Data
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | item_data | Yes | A piece of collected data |
2 | item_group_data | Yes | Aggregates item data |
3 | form_data | Yes | Form data represents data collected for a given subject. Instead of storing the scheduled time on this domain, we leverage the relationship to the encapsulated scheduledActivity domain and thus its relationship to the segment. |
4 | study_event_data | Yes | Clinical data for a study event (visit) for a given subject |
5 | study_event | Yes | A study event represents a given 'visit'. In phase 1 trials this will commonly simply refer to a 'day'. When scheduling forms for a given schedule, the builder must associate the study event. Note: there are common study events that are typically reserved for special events: unscheduled, common (AE, CM), etc |
6 | subject | No | Someone participating in clinical research within the context of a given study. |
Lab Data
Lab data can be reached via ItemData.
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | item_data | Yes | A piece of collected data. Note that lab orders and all results are associated to a given Item Data. You will need to join through Item Data when working with lab data. |
2 | base_specimen | Yes | Modeled off of CDISC Lab. A specimen is collected from a subject and assigned to a given item data instance. There can be multiple batteries (test groups) associated to a given specimen. Combines Accession Level and Base Specimen from spec. |
3 | base_battery | No | A panel related to a specimen - typically this is just a 1:1. |
4 | base_test_result | Partially | Combines CDISC Lab BaseTest and BaseResult. These are the results from the lab. |
5 | lab_order | No | When specimens are collected, this domain represents that an order is generated. It causes a manifest file to be created (PDF) and potentially a file order to be dumped on to the file system and made available to web services. |
6 | lab_interface | No | Encapsulates how to send and receive orders and results from a particular safety lab. Sites may have multiple labs, and if so each of these will have their own lab interface instance. |
7 | study_lab_panel | No | Something that can be ordered from item level |
8 | specimen_container | No | When setting up samples or labs, users can optionally choose a container. |
9 | lab_repeat | No | A domain that allows for the management of lab repeat workflows |
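Because lab results hang off item data, a lab query typically starts at `item_data` and walks down through specimen and battery to the test results. The sketch below shows that join path on an in-memory SQLite database; the foreign key and value columns (`item_data_id`, `base_specimen_id`, `base_battery_id`, `test_name`, `result_value`, `units`) and the sample rows are hypothetical and stand in for whatever the real schema uses.

```python
import sqlite3

# In-memory stand-in for the replica; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_data        (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE base_specimen    (id INTEGER PRIMARY KEY, item_data_id INTEGER);
CREATE TABLE base_battery     (id INTEGER PRIMARY KEY, base_specimen_id INTEGER, name TEXT);
CREATE TABLE base_test_result (id INTEGER PRIMARY KEY, base_battery_id INTEGER,
                               test_name TEXT, result_value TEXT, units TEXT);

INSERT INTO item_data VALUES (10, 'Collected'), (11, 'Collected');
INSERT INTO base_specimen VALUES (100, 10);
INSERT INTO base_battery VALUES (1000, 100, 'Chemistry');
INSERT INTO base_test_result VALUES
    (1, 1000, 'Glucose', '5.1', 'mmol/L'),
    (2, 1000, 'Sodium',  '140', 'mmol/L');
""")

# item_data -> base_specimen -> base_battery -> base_test_result
lab_rows = conn.execute("""
    SELECT idt.id, bb.name, btr.test_name, btr.result_value, btr.units
    FROM item_data idt
    JOIN base_specimen bs     ON bs.item_data_id = idt.id
    JOIN base_battery bb      ON bb.base_specimen_id = bs.id
    JOIN base_test_result btr ON btr.base_battery_id = bb.id
    ORDER BY idt.id, btr.test_name
""").fetchall()

for row in lab_rows:
    print(row)
```

Item data row 11 has no specimen in the sample data, so the inner joins leave it out of the result.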
Volunteers
Volunteers are not a part of CDISC.
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | volunteer | No | Someone who indicates that they are interested in participating in clinical research for the given org. |
2 | volunteer_medical_condition | No | Associates a given condition to a given volunteer |
3 | volunteer_note | No | A simple note that can be attached to the volunteer record |
4 | volunteer_correspondence | No | Represents a call or text to / from a volunteer by way of Twilio |
5 | volunteer_substance_use | No | We purposely don't track SUOCCUR; it allows us to indicate that the volunteer is not using the substance. |
6 | recruitment_appointment | No | Allows for a given volunteer to be assigned to a given time slot |
7 | study_site | No | An association between a study and a site |
8 | subject | No | Someone participating in clinical research within the context of a given study. |
Users
# | Table | From CDISC? | Notes |
---|---|---|---|
1 | application_user | No | A user in the system |
2 | study | Yes | In this context, these are study restrictions. When studies appear here, the user is restricted to working only with data from these studies. |
3 | application_user_role | No | Roles which the user has |
4 | application_user_role_secure_actions | Yes | Secure Actions are permissions which the role entitles the user to. |
Read Replica Usage
There are a wide variety of usage patterns for customer Read Replicas. This is your data, so use it as your business needs require. The following are a few common patterns presented as examples.
Integration to your Existing Data Warehouse
Customers who have existing data warehouses or datamarts may choose to integrate ClinSpark data into these repositories. This can be done using the SSH channel provided.
Ad Hoc SQL Queries
Nearly all SQL tools connect to MySQL databases, so whatever SQL tool you typically use should work with very little configuration. Use the tool that is most familiar to you. If you do not have a tool preference, one option is the free MySQL Workbench, which is a fairly full-featured tool. A SQL tool like this is useful for creating queries to answer questions on the fly. It can also be used to generate simple reports, perform data modeling (the above diagrams were created with this tool), etc.
The connection instructions above show how to connect MySQL Workbench to your replica. There is extensive help online and in the app for using the tool from there.
Business Intelligence Tools
The ClinSpark Read Replica can be used with any Business Intelligence (BI) tool which operates on relational data. BI tools are very popular these days, and there are a wide variety of vendors.
Here is an example of one way that the Read Replica can be connected to a customer-hosted BI tool called Tableau:
Tableau
As shown above, it is easy to connect Tableau to Read Replica data through an SSH tunnel to a local workstation or gateway. The standard Tableau Desktop connection wizard will guide you through the steps from there. Please contact a Tableau representative for more details; they provide consulting services and training.
Tibco Spotfire
Tibco Spotfire can access the Read Replica in the same manner as Tableau.
AWS Quicksight
Customers can also use AWS Quicksight. It is a far cheaper BI tool than Tableau or Spotfire, and it may be adequate depending on your use cases. Note that our demonstrations to date have used a Foundry Health Quicksight account connected to Foundry Health-owned RDS replicas. That approach will not work with customer-owned Quicksight accounts, and a method for connecting a customer Quicksight account to a replica exposed via SSH has not yet been established.
Crystal Reports or Similar
Crystal Reports or other similar products all can operate on MySQL databases. As such they can connect to and use the Read Replica. It is possible to create customer-specific reports using these tools.
SQL Examples
Pick a few reports and show how SQL can produce similar results
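As one illustration of the kind of query meant here, the sketch below produces a simple data listing (subject, study event, form, item, value) by walking from `item_data` up through the clinical data tables described earlier. It runs on an in-memory SQLite database so that it is self-contained; the SQL itself is plain join syntax that MySQL also accepts. All column names and sample rows are assumptions for illustration and are not taken from the ClinSpark schema.

```python
import sqlite3

# In-memory stand-in for the replica; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject          (id INTEGER PRIMARY KEY, screening_number TEXT);
CREATE TABLE study_event      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE study_event_data (id INTEGER PRIMARY KEY, subject_id INTEGER, study_event_id INTEGER);
CREATE TABLE form             (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE form_data        (id INTEGER PRIMARY KEY, study_event_data_id INTEGER, form_id INTEGER);
CREATE TABLE item_group_data  (id INTEGER PRIMARY KEY, form_data_id INTEGER);
CREATE TABLE item             (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_data        (id INTEGER PRIMARY KEY, item_group_data_id INTEGER,
                               item_id INTEGER, value TEXT);

INSERT INTO subject VALUES (1, 'S001');
INSERT INTO study_event VALUES (1, 'Day 1');
INSERT INTO study_event_data VALUES (1, 1, 1);
INSERT INTO form VALUES (1, 'Vital Signs');
INSERT INTO form_data VALUES (1, 1, 1);
INSERT INTO item_group_data VALUES (1, 1);
INSERT INTO item VALUES (1, 'Systolic BP'), (2, 'Diastolic BP');
INSERT INTO item_data VALUES (1, 1, 1, '118'), (2, 1, 2, '76');
""")

# Each data table joins to its parent data table and to its definition table.
LISTING_SQL = """
    SELECT su.screening_number, se.name, f.name, i.name, idt.value
    FROM item_data idt
    JOIN item i                ON i.id = idt.item_id
    JOIN item_group_data igd   ON igd.id = idt.item_group_data_id
    JOIN form_data fd          ON fd.id = igd.form_data_id
    JOIN form f                ON f.id = fd.form_id
    JOIN study_event_data sed  ON sed.id = fd.study_event_data_id
    JOIN study_event se        ON se.id = sed.study_event_id
    JOIN subject su            ON su.id = sed.subject_id
    ORDER BY su.screening_number, se.name, i.name
"""

rows = conn.execute(LISTING_SQL).fetchall()
for row in rows:
    print(row)
```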
...