...
As these Linux images are hardened, continuously and automatically patched, unreachable without an SSH connection and protected by a firewall, no additional anti-malware measures are installed.
Backup
Info |
---|
Key takeaways: Backups are automatic, occur daily, and are retained for 35 days. In addition, ClinSpark continuously streams change logs to support the stated RPO (https://foundryhealth.atlassian.net/wiki/spaces/DOCS/pages/3708420496/ClinSpark+Application+SLA#RPO---Recovery-Point-Objective) and RTO (https://foundryhealth.atlassian.net/wiki/spaces/DOCS/pages/3708420496/ClinSpark+Application+SLA#RTO---Recovery-Time-Objective). Evidence of backups can be provided upon request through a service desk ticket, in the form of screenshots of the available snapshots in the AWS console. |
All customer data is stored in AWS RDS instances. Application servers do not store any customer data, only configuration. As such this topic is limited in scope to how RDS supports backups and recovery.
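As an illustration of the daily cadence and 35-day retention stated in the key takeaways, the sketch below checks a list of snapshot records for days with no backup. It is not the team's actual tooling. It assumes each record is a dict carrying a timezone-aware `SnapshotCreateTime`, which is the field name used in AWS snapshot listings; the function name `missing_backup_days` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 35  # retention period stated in this document


def missing_backup_days(snapshots, now, retention_days=RETENTION_DAYS):
    """Return the UTC calendar dates in the retention window with no snapshot.

    `snapshots` is assumed to be a list of dicts, each with a timezone-aware
    `SnapshotCreateTime` datetime. Today is excluded from the window because
    the daily backup may not have run yet. Dates are returned newest first.
    """
    covered = {
        s["SnapshotCreateTime"].astimezone(timezone.utc).date() for s in snapshots
    }
    today = now.astimezone(timezone.utc).date()
    window = [today - timedelta(days=offset) for offset in range(1, retention_days + 1)]
    return [day for day in window if day not in covered]
```

An empty result means every day in the retention window has at least one snapshot. The same listing could back the screenshot evidence mentioned above.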
First Line of Defense
RDS is, by design, a managed service with built-in data redundancy and backup. All customer PROD instances use RDS instances in a Multi-AZ (multiple Availability Zone) configuration. The relevant components are shown in this diagram:
...
Application server instances interact with the RDS Primary (Master) instance at all times. However, each transaction synchronously updates the underlying physical storage, which in Aurora is striped across 3 physically separate Availability Zones. This replication forms the primary line of defense for backups: there is always a complete copy of the application database ready in a separate physical location. When a failure of any sort occurs on the Primary database, the infrastructure automatically shifts all application traffic to the Standby, which is promoted to the Primary role with no loss of data.
Second Line of Defense
The second line of defense is snapshots stored in Amazon S3. Database transactions produce comprehensive log records, and these logs are streamed continuously to S3. As documented by Amazon, this stream of backup data is sufficient to restore a completely new replacement instance of the database to a point in time within 5 minutes of when a disaster occurred. The S3 storage is itself configured to be backed up across separate geographic regions.
Collectively, this means that all customer data is backed up in real time to two physically separate databases in 3 physical datacenter locations, with automated failover. In addition, all customer data is captured in separate offsite storage to within a 5-minute window.
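The 5-minute point-in-time window can be expressed as a simple check on how far the newest restorable point lags behind the current time. The sketch below assumes the caller has already read a timezone-aware `LatestRestorableTime` value, a field AWS reports for RDS instances and clusters. The function names are hypothetical, and this is not a monitoring tool described by this document.

```python
from datetime import datetime, timedelta, timezone

RPO_WINDOW = timedelta(minutes=5)  # point-in-time window stated in this document


def restore_lag(latest_restorable_time, now):
    """Return how far the newest restorable point lags behind `now`."""
    return now - latest_restorable_time


def within_rpo(latest_restorable_time, now, window=RPO_WINDOW):
    """Return True if a restore right now would lose at most `window` of data."""
    return restore_lag(latest_restorable_time, now) <= window
```

Both arguments must be timezone-aware datetimes; mixing naive and aware values raises a `TypeError` in Python, which is a useful guard here.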
Restore
In the event that the Primary database fails, the Standby instance is automatically promoted to Primary. This is handled transparently and automatically. In-flight transactions are interrupted when the Primary goes down. However, the system recovers automatically, and the restore procedure is handled without human intervention.
In the event that both the Primary and Standby databases go down, RDS will automatically provision replacements and swap them in within 15 minutes. In addition, new instances could be manually created anywhere in the AWS cloud and loaded with the snapshots stored in S3. This scenario does cause an interruption in service and requires manual intervention. It is highly unlikely, given the high platform availability of a Multi-AZ deployment. However, should it occur, IQVIA would follow the procedure for building an environment in another region or Availability Zone, restore the data using the previously referenced procedure, and restore service using the new instances.
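For illustration only, a manual point-in-time restore request might be assembled as in the sketch below. It is not the IQVIA procedure referenced above. It assumes an Aurora cluster and the boto3 RDS client method `restore_db_cluster_to_point_in_time`; the helper name `build_pitr_request` and any identifiers passed to it are hypothetical.

```python
from datetime import datetime, timezone


def build_pitr_request(source_cluster_id, target_cluster_id, restore_to=None):
    """Build keyword arguments for a point-in-time cluster restore.

    If `restore_to` is None, the newest restorable point is requested.
    Otherwise `restore_to` must be a timezone-aware datetime. The two
    options are mutually exclusive in the AWS API, so only one is set.
    """
    if restore_to is not None and restore_to.tzinfo is None:
        raise ValueError("restore_to must be timezone-aware")
    request = {
        "SourceDBClusterIdentifier": source_cluster_id,
        "DBClusterIdentifier": target_cluster_id,
    }
    if restore_to is None:
        request["UseLatestRestorableTime"] = True
    else:
        request["RestoreToTime"] = restore_to.astimezone(timezone.utc)
    return request


# Assumed usage with boto3 (not executed here; requires AWS credentials):
#   import boto3
#   rds = boto3.client("rds")
#   rds.restore_db_cluster_to_point_in_time(**build_pitr_request("old", "new"))
# For Aurora this call restores the cluster only; database instances
# must be created in the restored cluster afterwards.
```

Keeping the request construction separate from the AWS call lets the parameters be reviewed or logged before anything is provisioned.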
Due to the backup processes described above, the Engineering team does not formally test restore procedures.
Application Development and Support Staff
...