© 2024 IQVIA - All Rights Reserved
Performance, Load and Scaling Characteristics
- 1 Preface
- 2 Introduction
- 3 Related
- 4 Experiment
- 4.1 Introduction
- 4.2 Executive Summary
- 4.3 Approach
- 4.3.1 Study Overview
- 4.3.2 Data Basics
- 4.3.3 Testing Automation and Execution
- 4.3.4 Pre-Test Environment Warm Up
- 4.4 Results
- 4.4.1 Five (5) Studies
- 4.4.2 50 Studies
- 4.4.3 100 Studies
- 4.4.4 250 Studies
- 4.4.5 500 Studies
- 4.4.6 1000 Studies
- 4.4.7 1500 Studies
- 4.4.8 2000 Studies
- 4.5 Test Components
- 4.5.1 Web Server
- 4.5.2 Database Server
- 4.5.3 Test Harness Client
- 4.5.4 JMeter Script
- 4.6 Analysis table definitions
- 4.7 Miscellaneous
- 5 Postscript
Preface
ClinSpark is designed to be performant, with the goal of being…
always fast, regardless of user load
We expect to see average response times of around 250 ms in properly sized production environments.
Introduction
The testing exercise reported here was conducted in 2016.
We don’t typically engage in performance testing exercises with customers. If this is something you need, please talk to your project manager and they can arrange to have such an exercise scoped and costed.
Related
ClinSpark Application Service Level Agreements | Status Page URL
ClinSpark Application Service Level Agreements | RPO Recovery Point Objective
ClinSpark Application Service Level Agreements | RTO Recovery Time Objective
Experiment
Introduction
High performance has always been a key design goal for ClinSpark. This foundational concern has driven every aspect of the system's design. It permeates the application code, the database design and the deployment infrastructure. Customers tend to be drawn to ClinSpark because usage of the system has the potential to transform business and clinical operations. But, precisely because it touches so many aspects of the clinical ecosystem, performance degradation over time could be disastrous for customers.
Included here is the testing approach, with related results and analysis, from an exercise we performed to address specifically the question of whether ClinSpark performance will degrade over time. System demonstrations have repeatedly shown that ClinSpark is fast with default datasets. This effort answers the question of what happens to performance when large multi-site customers use ClinSpark continuously for years, with the accumulated data from past studies available in a single database.
A key part of our testing effort entailed creating a series of progressively larger datasets. As described in detail below, each dataset contains a volume of data representing a particular number of studies, ranging from 5 to 2000. A test script was created that simulates a user interacting with a large coverage area of ClinSpark in an automated, repeatable fashion. A series of tests was then conducted in which this user activity was applied to separate instances of ClinSpark, each initialized with one of the sample databases. Each test recorded minute details about system response times.
A key goal for this test was to identify application functionality that would not perform acceptably as the size of the database grows over time. The primary aim was not to apply a massive volume of concurrent requests, but rather to apply a relatively constant request rate against a massively increasing volume of data. Also note that while this exercise isn't truly a load test, the request volume applied was still roughly 20 requests per second. This can be observed in detail in the zip attachments; look for the charts on throughput.
Raw data outputs from JMeter are available for the experiment described here and have been included as downloadable ZIP files within each test run results section. Download the ZIP, unpack it, and open the index.html file.
Executive Summary
This exercise objectively demonstrates that ClinSpark maintains a very low response time even with large volumes of accumulated study data in the database.
Average Response Times Stay Small
The aggregate average response time for ClinSpark requests increases linearly with increasing historical study volume. However, even with 2000 studies in the ClinSpark database, the average response time did not exceed 289 milliseconds.
ClinSpark pages may issue multiple requests concurrently using AJAX, in addition to the loading of the primary page. So these results do not correlate one-to-one with page view times. However, they do mean that the typical user with a typical site network connection should have the majority of pages ready for use with sub-second response times.
Response Time Distributions
The averages, of course, only show a blend of all transaction types. Which transactions are fast and which are slower matters.
The below chart is taken from the Response Time Distribution data collected for each execution of the test. This chart groups response counts into time ranges and comes from the run with 1000 studies. It shows that the majority of the transactions had responses rendered in less than a second, but a few outliers took as long as 10 seconds. Note the color of the bars, which indicates transaction type.
This next chart on response distribution shows a per-transaction view of the same data. This makes it easy to spot the outliers, the slower transactions in the far right of the curve.
The small numbers of transactions that exceed a 2-second threshold are not common high-frequency user operations. The most common user operations fall within the sub-second grouping.
Vertically Scaling the Database
As detailed below, the database used for this series of tests is a memory-optimized "db.r3.2xlarge" instance. This database has 8 virtual CPUs and 61 GB of RAM. This database proved more than adequate even with the test where over 1 billion rows of test data were present. Our preferred deployment provider, Amazon Web Services, currently has database servers four times as powerful, with 32 virtual CPUs and 244 GB of RAM, which at any point in the future can be swapped in should an environment require additional performance. This is a key point, because the database is the one part of the architecture that can only be vertically scaled.
Approach
A base study was designed to include a wide variety of data types that are commonly found in typical clinical trials. While the protocol is not overly realistic, it establishes a solid foundation on which testing can occur. Once the protocol was built into the system, a manual execution of the study was performed. With the base study results, a program was then developed that allowed for copying Y subjects into X studies. During the copy program's execution, database snapshots were taken at different study milestones to allow for future testing: 5, 50, 100, 250, etc. Finally, an automated test script was developed that allowed for capturing objective server metrics for analysis.
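For illustration, the following is a conceptual sketch of the copy-and-snapshot loop described above. The helper functions are hypothetical placeholders and do not represent the actual copy program or any ClinSpark API.

```python
# Conceptual sketch of the data-multiplication approach described above.
# The helper functions are hypothetical placeholders, not the actual copy program.

MILESTONES = [5, 50, 100, 250, 500, 1000, 1500, 2000]  # study counts captured as snapshots
SUBJECTS_PER_STUDY = 50

def copy_study_with_subjects(base_study_id, subjects):
    """Placeholder: clone the completed base study, including its subject data."""
    print(f"copied study {base_study_id} with {subjects} subjects")

def take_db_snapshot(label):
    """Placeholder: snapshot the database so a later test run can start from it."""
    print(f"snapshot taken: {label}")

def build_datasets(base_study_id="BASE-001"):
    studies = 0
    for target in MILESTONES:
        while studies < target:  # keep copying until the milestone is reached
            copy_study_with_subjects(base_study_id, SUBJECTS_PER_STUDY)
            studies += 1
        take_db_snapshot(f"{target}-studies")

build_datasets()
```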
Study Overview
50 subjects per study
10 study events per protocol: Days 1-10
Each study day is identical:
23 PK events
Basic sample path of centrifuge, transfer to two aliquots, freeze
Each transfer tube is associated with a sample container and later associated with a configured shipment
4 ECGs: standard Mortara intervals and interpretation fields exposed in the form
4 vital signs: Diastolic, Systolic, MAP, Rate, and Temperature
1 dosing event
1 PE with three repeating groups
Standard PE form from CDISC CDASH
Hematology, Clinical Chemistry, Urinalysis
Each test to have simulated test results included
Each result to be reviewed in the system
Data Basics
With each study milestone, the copy program increases the number of volunteers, with a final goal of 500,000
Each managed volunteer will randomly have 7 medical conditions associated
Date of birth is randomly created, and will be in the range of 12-SEP-1926 to 12-SEP-1996
Height and weight are randomly created within ranges of 1.3 - 2.2 m and 34 - 181 kg respectively
This level of randomization created a wide spectrum of BMIs (illustrated in the sketch following this list)
Address, city, first name, middle name, last name, postal code, email, and phone numbers were randomly obtained using an online data generator
11 contraception types
155 medical conditions
50 volunteer regions (US states)
5 tobacco types
Mock ECG and mock Vitals 'devices'
Each ECG and vitals test leverages this mock interface, and thus the results are identical across each time-point
Each study has 75 appointment windows
900 users
Usernames are 001 to 900
Password hardcoded as [REDACTED]
Each user assigned an 'admin' role
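For illustration, the following is a minimal sketch of the volunteer randomization described above. The ranges come from this exercise, but the code itself is illustrative rather than the actual data-generation program.

```python
# Minimal sketch of randomized volunteer attributes using the ranges above.
import random
from datetime import date, timedelta

def random_volunteer():
    dob_start, dob_end = date(1926, 9, 12), date(1996, 9, 12)
    dob = dob_start + timedelta(days=random.randrange((dob_end - dob_start).days + 1))
    height_m = random.uniform(1.3, 2.2)
    weight_kg = random.uniform(34, 181)
    # Independently randomized height and weight produce a wide BMI spectrum.
    bmi = weight_kg / height_m ** 2
    return {"dob": dob, "height_m": round(height_m, 2),
            "weight_kg": round(weight_kg, 1), "bmi": round(bmi, 1)}

print(random_volunteer())
```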
Testing Automation and Execution
A JMeter (http://jmeter.apache.org) test script was developed in order to simulate users interacting with ClinSpark against varying data loads. The script allowed for specifying URL endpoints, and any corresponding data that was required to complete a given action (e.g. entering volunteer details in order to perform a search). The goal was to interact with each ClinSpark application component, all while automatically collecting critical performance metrics. Leveraging JMeter kept execution consistent in terms of the tests conducted, and it also allowed for establishing desired levels of concurrency. For the tests performed, the script was configured to run with five concurrent threads, a ramp-up time of 50 seconds, and a loop count of 15. Because the script does not include pause times in between sampler invocations, the test simulated a high level of throughput in relation to system usage. Upon completion of each test, the output from the framework was saved and analyzed. Tests were deliberately executed on a computer system and network outside of the application's deployment infrastructure.
The script was run against differing database sizes as described in the Results section that follows. For each test run, very detailed reports are generated and are available for download. High-level summaries of each test run are found with their corresponding tests in the sections that follow. APDEX, or Application Performance Index (http://www.apdex.org), is a measure of user experience based on user wait times. The configured APDEX satisfaction and tolerance thresholds have been defined as 3 and 5 seconds respectively.
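For reference, APDEX is computed as (satisfied + tolerating/2) / total samples. The sketch below shows how such a score could be derived from JMeter results saved in CSV format (with the standard 'elapsed' column in milliseconds and 'success' column), using this test's thresholds of 3 and 5 seconds; the results file name is hypothetical.

```python
# Minimal APDEX calculation from a JMeter CSV results file (illustrative only).
import csv

def apdex(results_csv, satisfied_ms=3000, tolerating_ms=5000):
    satisfied = tolerating = total = 0
    with open(results_csv, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            elapsed = int(row["elapsed"])            # response time in milliseconds
            ok = row["success"].lower() == "true"    # failed samples count as frustrated
            if ok and elapsed <= satisfied_ms:
                satisfied += 1
            elif ok and elapsed <= tolerating_ms:
                tolerating += 1
    return (satisfied + tolerating / 2) / total if total else 0.0

print(f"APDEX: {apdex('results.jtl'):.3f}")  # file name is a placeholder
```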
Pre-Test Environment Warm Up
A freshly restarted database and application environment will always be slower than one that has had a chance to warm up. Warm up typically consists of the population of database caches and application-level byte-code optimizations. As is best practice, all performance data was captured after a warm up period to allow for these automatic optimizations to take effect. Note that in production scenarios, database caches are persistent across database restarts, meaning this warm up period is needed only for load tests.
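The sketch below illustrates the idea of scripted warm-up traffic. The base URL and endpoint paths are hypothetical; in the actual exercise the warm up was performed by running the JMeter script and clicking through the application for roughly ten minutes (see Miscellaneous).

```python
# Illustrative warm-up loop: issue simple GET requests against a few pages
# for a fixed period before any measured test run. URLs are placeholders.
import time
import urllib.request

BASE_URL = "https://clinspark.example.com"        # hypothetical environment URL
WARM_UP_PATHS = ["/", "/volunteers", "/studies"]  # hypothetical endpoints
WARM_UP_SECONDS = 10 * 60

deadline = time.time() + WARM_UP_SECONDS
while time.time() < deadline:
    for path in WARM_UP_PATHS:
        try:
            urllib.request.urlopen(BASE_URL + path, timeout=30).read()
        except OSError:
            pass  # warm-up traffic only; errors are ignored
    time.sleep(1)
```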
Results
Details regarding the inputs and results data are included here for each test execution. All data here is empirical, captured before, during or after test run execution.
The Key Data section shows database row counts. Table definitions can be found in the appendix.
Select visualizations from JMeter are included directly in this document for each run. Full results from JMeter are available but not included here.
Five (5) Studies
APDEX Score: 0.996
Key Data
Table | Row Count |
volunteer | 10,001 |
volunteer_medical_condition | 70,007 |
base_test_result | 144,576 |
item_data | 288,650 |
item_data_sample_audit_record | 346,380 |
item_data_audit_record | 578,053 |
Sum database table rows | 2,800,162 |
Reports
50 Studies
APDEX Score: 0.996
Key Data
Table | Row Count |
volunteer | 50,001 |
volunteer_medical_condition | 350,007 |
base_test_result | 1,440,576 |
item_data | 2,876,150 |
item_data_sample_audit_record | 3,451,380 |
item_data_audit_record | 5,759,803 |
Sum database table rows | 26,945,065 |
Reports
100 Studies
APDEX Score: 0.995
Key Data
Table | Row Count |
volunteer | 75,001 |
volunteer_medical_condition | 525,007 |
base_test_result | 2,880,576 |
item_data | 5,751,150 |
item_data_sample_audit_record | 6,901,380 |
item_data_audit_record | 11,517,303 |
Sum database table rows | 53,442,557 |
Reports
250 Studies
APDEX Score: 0.997
Key Data
Table | Row Count |
volunteer | 125,001 |
volunteer_medical_condition | 875,007 |
base_test_result | 7,200,576 |
item_data | 14,376,150 |
item_data_sample_audit_record | 17,251,380 |
item_data_audit_record | 28,789,803 |
Sum database table rows | 132,509,822 |
Reports
500 Studies
APDEX Score: 0.996
Key Data
Table | Row Count |
volunteer | 200,001 |
volunteer_medical_condition | 1,400,007 |
base_test_result | 14,400,576 |
item_data | 28,751,150 |
item_data_sample_audit_record | 34,501,380 |
item_data_audit_record | 57,577,303 |
Sum database table rows | 264,147,372 |
Reports
1000 Studies
APDEX Score: 0.994
Key Data
Table | Row Count |
volunteer | 350,000 |
volunteer_medical_condition | 2,449,995 |
base_test_result | 28,800,576 |
item_data | 57,501,150 |
item_data_sample_audit_record | 69,001,380 |
item_data_audit_record | 115,152,303 |
Sum database table rows | 527,422,350 |
Reports
1500 Studies
APDEX Score: 0.989
Key Data
Table | Row Count |
volunteer | 400,000 |
volunteer_medical_condition | 2,799,995 |
base_test_result | 43,200,576 |
item_data | 86,251,150 |
item_data_sample_audit_record | 103,501,380 |
item_data_audit_record | 172,727,304 |
Sum database table rows | 788,997,461 |
Reports
2000 Studies
APDEX Score: 0.985
Key Data
Table | Row Count |
volunteer | 500,000 |
volunteer_medical_condition | 3,499,995 |
base_test_result | 57,600,576 |
item_data | 115,001,150 |
item_data_sample_audit_record | 138,001,380 |
item_data_audit_record | 230,302,303 |
Sum database table rows | 1,051,422,354 |
Reports
Test Components
Web Server
For the purpose of this test, only a single node was configured to host the ClinSpark application. Typical ClinSpark deployments have a minimum of two nodes and can scale horizontally as required.
Hosting provider: Amazon Web Services, region: US East
Server: instance type: m4.large, 2 vCPU, 8 GB RAM
Operating System: Linux 4.4.14-24.50.amzn1.x86_64 #1 SMP Fri Jun 24 19:56:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Server type: Apache Tomcat/7.0.69
Java: 1.7.0_101, OpenJDK Runtime Environment (amzn-2.6.6.1.67.amzn1-x86_64 u101-b00), OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
JVM Settings: -Xms1024M -Xmx1024M -XX:MaxPermSize=512M -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
Database Server
Hosting provider: Amazon Web Services, region: US East
Server: instance type: db.r3.2xlarge, 8 vCPU, 61 GB RAM
Test Harness Client
In order to have more realistic latency times, the test harness was executed on a service offering outside of the web server's network and infrastructure. Details are as follows:
Hosting provider: Digital Ocean
Operating System: Linux 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Java: 1.7.0_65, OpenJDK Runtime Environment (IcedTea 2.5.2) (7u65-2.5.2-3~14.04), OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
JMeter Script
Version: 3.0 r1743807
Analysis table definitions
volunteer - volunteers in the system that are not necessarily enrolled in any given study.
volunteer_medical_condition - a join table that combines volunteer records and dynamically associated medical conditions.
base_test_result - lab results
item_data - a data point associated with a form / item group. Examples are QT interval, PK capture time, etc.
item_data_sample_audit_record - during the processing of study samples, audit records are established as users perform each step.
item_data_audit_record - audit records for item data are generated at creation time and each time a user interacts with a form in which the item exists.
Sum database table rows - this is a sum of all of the ClinSpark table rows. It is not the sum of the rows presented in the analysis section.
Miscellaneous
Linux operating system information obtained via shell command: uname -a
Tomcat information obtained via shell command: java -cp $TOMCAT_HOME/lib/catalina.jar org.apache.catalina.util.ServerInfo
Java version obtained via shell command: java -version
JMeter version obtained via shell command: $JMETER_HOME/bin/jmeter --version
Before each test was run, a 'warm up' of the database was initiated. This was done by executing the test script and having a user click through the application. Approximate warm up time was ten minutes of test execution.
Each test assumes that no more than 50 studies are in an 'active' state.
Postscript
This testing exercise was executed in 2016, when ClinSpark was being pitched to a prospective customer that was also evaluating another eSource platform. We were aware that this competitor eSource platform, from an established multinational vendor, was poorly performant, and we were happy to go head-to-head to demonstrate ClinSpark’s designed-in performance characteristics.
Internally, our name for this competitive endeavour was ‘the bakeoff’.
Our opposition never showed up.
Exported and Printed Copies Are Uncontrolled