
Preface

ClinSpark is designed to be performant, with a simple goal:

always fast, regardless of user load

In properly sized production environments, we expect average response times of around 250 ms.

Introduction

We don’t typically engage in performance testing exercises with customers. If this is something you need, please talk to your project manager, who can arrange to have such an exercise scoped and costed.

Related


Experiment

Introduction

High performance has always been a key design goal for ClinSpark. This foundational concern has driven every aspect of the system's design. It permeates through the application code, the database design and the deployment infrastructure. Customers tend to be drawn to ClinSpark because usage of the system has the potential to transform business and clinical operations. But, precisely because it impacts so many aspects of the clinical ecosystem, performance issues over time could potentially be disastrous for customers.

We are including here a testing approach, with related results and analysis, from an exercise we performed to address specifically the question of whether ClinSpark performance will degrade over time. System demonstrations have repeatedly shown that ClinSpark is fast with default datasets. This effort answers the question of what happens to performance when large multi-site customers use ClinSpark continuously for years, with the accumulated data from past studies available in a single database.

A key part of our testing effort entailed creating a series of progressively larger datasets. As described in detail below, each dataset contains a volume of data representing a particular number of studies, ranging from 5 to 2000. A test script was created that simulates a user interacting with a broad cross-section of ClinSpark functionality in an automated, repeatable fashion. A series of tests was then conducted in which this user activity was applied to separate instances of ClinSpark, each initialized with one of the sample databases. Each test recorded minute details about system response times.

A key goal for this test was to identify application functionality that would not perform acceptably as the size of the database grows over time. The primary aim was not to apply a massive volume of requests concurrently; rather, the focus was to apply a relatively constant number of requests against a massively increasing volume of data. Also note that while this exercise isn't truly a load test, the request volume applied was still roughly 20 requests per second. This can be observed in detail in the zip attachments; look for the charts on throughput.

Raw data outputs from JMeter are available for the experiment described here but have not been included.

Executive Summary

This exercise objectively demonstrates that ClinSpark maintains a very low response time even with large volumes of accumulated study data in the database.

Average Response Times Stay Small

The aggregate average response time for ClinSpark requests increases linearly with increasing historical study volume. However, even with 2000 studies in the ClinSpark database, the average response time did not exceed 289 milliseconds.

image-20240308-181247.png

ClinSpark pages may issue multiple requests concurrently using AJAX, in addition to the loading of the primary page, so these results do not correlate one-to-one with page view times. They do mean, however, that a typical user on a typical site network connection should see the majority of pages ready for use in under a second.

Response Time Distributions

The averages, of course, only show a blend of all transaction types; it also matters which transactions are fast and which are slow.

The chart below is taken from the Response Time Distribution data collected for each execution of the test. It groups response counts into time ranges and comes from the run with 1000 studies. It shows that the majority of transactions rendered responses in less than a second, but a few outliers took as long as 10 seconds. Note the color of the bars, which indicates transaction type.

image-20240308-181510.png

This next chart on response distribution shows a per-transaction view of the same data, making it easy to spot the outliers: the slower transactions at the far right of the curve.

image-20240308-181604.png

The small number of transactions that exceed a 2-second threshold are not common, high-frequency user operations; the most common user operations fall within the sub-second grouping.
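
For readers who wish to reproduce this kind of distribution from the raw sampler output, the following is a minimal sketch, assuming a JMeter CSV results file named results.csv with an 'elapsed' column holding response times in milliseconds; the file name, column layout, and bin boundaries are assumptions for illustration, not part of the original test artifacts.

# Minimal sketch: bin response times into ranges like those in the charts above.
# Assumes a CSV results file with an 'elapsed' column in milliseconds; adjust
# the path, column name, and bin boundaries to your own output.
import csv
from collections import Counter

BINS_MS = [500, 1000, 2000, 5000, 10000]  # upper bounds of each time range

def bucket(elapsed_ms):
    for upper in BINS_MS:
        if elapsed_ms < upper:
            return f"< {upper} ms"
    return f">= {BINS_MS[-1]} ms"

counts = Counter()
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[bucket(int(row["elapsed"]))] += 1

for label in [f"< {u} ms" for u in BINS_MS] + [f">= {BINS_MS[-1]} ms"]:
    print(f"{label:>12}: {counts.get(label, 0)}")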

Vertically Scaling the Database

As detailed below, the database used for this series of tests is a memory-optimized "db.r3.2xlarge" instance with 8 virtual CPUs and 61 GB of RAM. This database proved more than adequate even in the test where over 1 billion rows of test data were present. Our preferred deployment provider, Amazon Web Services, currently offers database servers four times as powerful, with 32 virtual CPUs and 244 GB of RAM, which can be swapped in at any point in the future should an environment require additional performance. This is a key point, because the database is the one part of the architecture that can only be vertically scaled.
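
As an illustration only, this kind of vertical scaling can be performed on an AWS RDS instance with a single API call. The sketch below uses boto3 with a hypothetical instance identifier ('clinspark-prod'); in practice such a change would be planned as a maintenance activity.

# Illustrative sketch only: resize an RDS instance to a larger class.
# 'clinspark-prod' is a hypothetical identifier; db.r3.8xlarge corresponds
# to the 32 vCPU / 244 GB class referenced above.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_instance(
    DBInstanceIdentifier="clinspark-prod",
    DBInstanceClass="db.r3.8xlarge",
    ApplyImmediately=False,  # apply during the next maintenance window
)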

Approach

A base study was designed to maximize the variety of data types commonly found in typical clinical trials. While the protocol is not overly realistic, it does establish a solid foundation on which testing can occur. Once the protocol was built into the system, a manual execution of the study was performed. With the base study results in place, a program was then developed to copy Y subjects into X studies. During the copy program's execution, database snapshots were taken at different study milestones (5, 50, 100, 250, and so on) to allow for future testing. Finally, an automated test script was developed to capture objective server metrics for analysis.
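
The copy program itself is not reproduced here. The sketch below only illustrates its overall shape, with hypothetical helpers copy_study() and snapshot_database() standing in for the real implementation; the milestone list mirrors the snapshots described above.

# Shape of the data-copy exercise described above (illustrative only).
# copy_study() and snapshot_database() are hypothetical stand-ins.
MILESTONES = [5, 50, 100, 250, 500, 1000, 1500, 2000]

def copy_study(base_study_id: int, copy_number: int) -> None:
    """Hypothetical: clone the base study and its subjects' collected data."""
    ...

def snapshot_database(label: str) -> None:
    """Hypothetical: take a database snapshot for later test runs."""
    ...

studies_created = 0
for target in MILESTONES:
    while studies_created < target:
        studies_created += 1
        copy_study(base_study_id=1, copy_number=studies_created)
    snapshot_database(label=f"{target}-studies")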

Study Overview

  • 50 subjects per study

  • 10 study events per protocol: Days 1-10

  • Each study day is identical:

    • 23 PK events

      • Basic sample path of centrifuge, transfer to two aliquots, freeze

      • Each transfer tube is associated with a sample container and later associated with a configured shipment

    • 4 ECGs: standard Mortara intervals and interpretation fields exposed in the form

    • 4 vital signs: Diastolic, Systolic, MAP, Rate, and Temperature

    • 1 dosing event

    • 1 PE with three repeating groups

      • Standard PE form from CDISC CDASH

    • Hematology, Clinical Chemistry, Urinalysis

      • Each test to have simulated test results included

      • Each result to be reviewed in the system

Data Basics

  • With each study milestone, the copy program increases the number of volunteers, toward a final goal of 500,000

    • Each managed volunteer randomly has 7 medical conditions associated

    • Date of birth is randomly generated within the range 12-SEP-1926 to 12-SEP-1996

    • Height and weight are randomly generated within the ranges 1.3 - 2.2 m and 34 - 181 kg respectively

      • This level of randomization created a wide spectrum of BMIs (a sketch of this randomization approach follows this list)

    • Address, city, first name, middle name, last name, postal code, email, and phone numbers were randomly obtained using an online data generator

  • 11 contraception types

  • 155 medical conditions

  • 50 volunteer regions (US states)

  • 5 tobacco types

  • Mock ECG and mock Vitals 'devices'

  • Each ECG and vitals test leverages this mock interface, and thus the results are identical across each time-point

  • Each study has 75 appointment windows

  • 900 users

  • Usernames are 001 to 900

  • Password hardcoded as [REDACTED]

  • Each user assigned an 'admin' role
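
The following is a minimal sketch of the kind of volunteer randomization described above. The ranges mirror those stated in this section (date of birth between 12-SEP-1926 and 12-SEP-1996, height 1.3 - 2.2 m, weight 34 - 181 kg, seven medical conditions per volunteer); the function name and condition pool are illustrative assumptions, not the actual copy program.

# Illustrative sketch of the volunteer randomization described above.
# Ranges mirror those stated in the Data Basics section; everything else
# (function name, condition pool) is hypothetical.
import random
from datetime import date, timedelta

DOB_START, DOB_END = date(1926, 9, 12), date(1996, 9, 12)
MEDICAL_CONDITIONS = [f"condition-{i}" for i in range(1, 156)]  # 155 conditions

def random_volunteer() -> dict:
    dob = DOB_START + timedelta(days=random.randint(0, (DOB_END - DOB_START).days))
    height_m = round(random.uniform(1.3, 2.2), 2)
    weight_kg = round(random.uniform(34, 181), 1)
    return {
        "date_of_birth": dob.isoformat(),
        "height_m": height_m,
        "weight_kg": weight_kg,
        "bmi": round(weight_kg / height_m ** 2, 1),
        "medical_conditions": random.sample(MEDICAL_CONDITIONS, 7),
    }

print(random_volunteer())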

Testing Automation and Execution

A JMeter (http://jmeter.apache.org) test script was developed to simulate users interacting with ClinSpark against varying data loads. The script allowed for specifying URL endpoints, along with any corresponding data required to complete a given action (e.g. entering volunteer details in order to perform a search). The goal was to interact with each ClinSpark application component while automatically collecting critical performance metrics. Leveraging JMeter kept execution consistent across the tests conducted, and it also allowed for establishing desired levels of concurrency. For the tests performed, the script was configured to run with five concurrent threads, a ramp-up time of 50 seconds, and a loop count of 15. Because the script does not include pause times between sampler invocations, the test simulated a high level of throughput relative to typical system usage. Upon completion of each test, the output from the framework was saved and analyzed. Tests were explicitly executed on a computer system and network outside of the application's deployment infrastructure.

The script was run against differing database sizes as described in the Results section that follows. For each test run, very detailed reports are generated and are available for download. High-level summaries of each test run are found with their corresponding tests in the sections that follow. APDEX (http://www.apdex.org) is a measure of user experience based on user wait times. The configured APDEX satisfaction and tolerance thresholds were defined as 3 and 5 seconds respectively.
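
For reference, the APDEX score reported with each run can be reproduced from the raw sampler data with the standard formula. The sketch below assumes the response times have been extracted as a list of seconds and uses the 3-second / 5-second thresholds configured for these tests.

# Standard APDEX calculation using the thresholds configured for these tests
# (satisfied <= 3 s, tolerating <= 5 s). Response times are assumed to be
# available as a list of seconds extracted from the JMeter results.
def apdex(response_times_s, satisfied_t=3.0, tolerated_t=5.0):
    satisfied = sum(1 for t in response_times_s if t <= satisfied_t)
    tolerating = sum(1 for t in response_times_s if satisfied_t < t <= tolerated_t)
    return (satisfied + tolerating / 2) / len(response_times_s)

print(apdex([0.25, 0.3, 4.0, 6.2, 0.4]))  # example output: 0.7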

Pre-Test Environment Warm Up

A freshly restarted database and application environment will always be slower than one that has had a chance to warm up. Warm-up typically consists of the population of database caches and application-level byte-code optimizations. As is best practice, all performance data was captured after a warm-up period, allowing these automatic optimizations to initialize. Note that in production scenarios, database caches persist across database restarts, meaning this warm-up period is needed only for load tests.

Results

Details regarding the inputs and results data are included here for each test execution. All data here is empirical, captured before, during, or after test run execution.

The Key Data section shows database row counts. Table definitions can be found in the appendix.

Select visualizations from JMeter are included directly in this document for each run. Full results from JMeter are available but not included here.

Five (5) Studies

  • APDEX Score: 0.996

Key Data

Table

Row Count

volunteer

10,001

volunteer_medical_condition

70,007

base_test_result

144,576

item_data

288,650

item_data_sample_audit_record

346,380

item_data_audit_record

578,053

Sum database table rows

2,800,162

Reports

image-20240308-184435.png
image-20240308-184527.png

50 Studies

  • APDEX Score: 0.996

Key Data

Table

Row Count

volunteer

50,001

volunteer_medical_condition

350,007

base_test_result

1,440,576

item_data

2,876,150

item_data_sample_audit_record

3,451,380

item_data_audit_record

5,759,803

Sum database table rows

26,945,065

Reports

image-20240308-185141.png
image-20240308-185216.png

100 Studies

  • APDEX Score: 0.995

Key Data

Table

Row Count

volunteer

75,001

volunteer_medical_condition

525,007

base_test_result

2,880,576

item_data

5,751,150

item_data_sample_audit_record

6,901,380

item_data_audit_record

11,517,303

Sum database table rows

53,442,557

Reports

image-20240308-185507.png
image-20240308-185542.png

250 Studies

  • APDEX Score: 0.997

Key Data

Table

Row Count

volunteer

125,001

volunteer_medical_condition

875,007

base_test_result

7,200,576

item_data

14,376,150

item_data_sample_audit_record

17,251,380

item_data_audit_record

28,789,803

Sum database table rows

132,509,822

Reports

image-20240308-185928.png
image-20240308-190000.png

500 Studies

  • APDEX Score: 0.996

Key Data

Table

Row Count

volunteer

200,001

volunteer_medical_condition

1,400,007

base_test_result

14,400,576

item_data

28,751,150

item_data_sample_audit_record

34,501,380

item_data_audit_record

57,577,303

Sum database table rows

264,147,372

Reports

image-20240308-190229.png
image-20240308-190305.png

1000 Studies

  • APDEX Score: 0.994

Key Data

Table

Row Count

volunteer

350,000

volunteer_medical_condition

2,449,995

base_test_result

28,800,576

item_data

57,501,150

item_data_sample_audit_record

69,001,380

item_data_audit_record

115,152,303

Sum database table rows

527,422,350

Reports

image-20240308-190510.png
image-20240308-190549.png

1500 Studies

  • APDEX Score: 0.989

Key Data

Table

Row Count

volunteer

400,000

volunteer_medical_condition

2,799,995

base_test_result

43,200,576

item_data

86,251,150

item_data_sample_audit_record

103,501,380

item_data_audit_record

172,727,304

Sum database table rows

788,997,461

Reports

image-20240308-190742.png
image-20240308-190816.png

2000 Studies

  • APDEX Score: 0.985

Key Data

Table

Row Count

volunteer

500,000

volunteer_medical_condition

3,499,995

base_test_result

57,600,576

item_data

115,001,150

item_data_sample_audit_record

138,001,380

item_data_audit_record

230,302,303

Sum database table rows

1,051,422,354

Reports

image-20240308-191023.png
image-20240308-191051.png

Test Components

Web Server

For the purpose of this test, only a single node was configured to host the ClinSpark application. Typical ClinSpark deployments have a minimum of two nodes and can be scaled out horizontally as required.

  • Hosting provider: Amazon Web Services, region: US East

  • Server: instance type: m4.large, 2 vCPU, 8 GB RAM

  • Operating System: Linux 4.4.14-24.50.amzn1.x86_64 #1 SMP Fri Jun 24 19:56:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  • Server type: Apache Tomcat/7.0.69

  • Java: 1.7.0_101, OpenJDK Runtime Environment (amzn-2.6.6.1.67.amzn1-x86_64 u101-b00), OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

  • JVM Settings: -Xms1024M -Xmx1024M -XX:MaxPermSize=512M -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

Database Server

  • Hosting provider: Amazon Web Services, region: US East

  • Server: instance type: db.r3.2xlarge, 8 vCPU, 61 GB RAM

Test Harness Client

In order to have more realistic latency times, the test harness was executed on a service offering outside of the web server's network and infrastructure. Details are as follows:

  • Hosting provider: Digital Ocean

  • Operating System: Linux 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  • Java: 1.7.0_65, OpenJDK Runtime Environment (IcedTea 2.5.2) (7u65-2.5.2-3~14.04), OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

JMeter Script

  • Version: 3.0 r1743807

Analysis table definitions

  • volunteer - volunteers in the system that are not necessarily enrolled in any given study.

  • volunteer_medical_condition - a join table that combines volunteer records and dynamically associated medical conditions.

  • base_test_result - lab results

  • item_data - a data point associated with a form / item group. Examples are QT interval, PK capture time, etc.

  • item_data_sample_audit_record - during the processing of study samples, audit records are established as users perform each step.

  • item_data_audit_record - audit records for item data are generated at creation time and each time a user interacts with a form in which the item exists.

  • Sum database table rows - this is a sum of all of the ClinSpark table rows. It is not the sum of the rows presented in the analysis section.

Miscellaneous

  • Linux operating system information obtained via shell command: uname -a

  • Tomcat information obtained via shell command: java -cp $TOMCAT_HOME/lib/catalina.jar org.apache.catalina.util.ServerInfo

  • Java version obtained via shell command: java -version

  • JMeter version obtained via shell command: $JMETER_HOME/bin/jmeter --version

  • Before each test was run, a 'warm up' of the database was initiated. This was done by way of executing the test scripts and allowing a user to click through the application. Approximate warm up time was ten minutes of test execution.

  • Each test assumes that no more than 50 studies are in an 'active' state.

Postscript

This testing exercise was executed in 2016, when ClinSpark was being pitched to a prospective customer that was also evaluating another eSource platform. We were aware that this platform, from an established multinational vendor, was poorly performant, and we were happy to go head-to-head to demonstrate ClinSpark’s designed-in performance characteristics.

Internally, our name for this competitive endeavour was ‘the bakeoff’.

Our opposition never showed up.
