Overview
ClinSpark supports data exports into SAS XPT v5 and v8 formats compatible with FDA reporting guidelines.
In XPT files, all character data are stored in ASCII, regardless of the operating system and XPT file format. ClinSpark natively supports UTF-8 characters in all data collection interfaces; for instance, a lab data result might arrive in ClinSpark containing UTF-8 characters. To comply with the XPT standard, ClinSpark applies special conversion procedures when exporting data. This document describes these conversion processes in detail.
Automatic character conversion
ClinSpark automatically normalizes all UTF-8 characters using the NFD Unicode normalization form. Text normalization in general can convert characters with diacritical marks, change letter case, decompose ligatures, or convert half-width katakana characters to full-width characters; NFD itself performs canonical decomposition only, so it does not change case or expand compatibility characters such as ligatures. The part of this processing that matters for export is removing accents, which is language- and charset-specific. See the NFD Unicode normalization form for a detailed specification.
Our Normalizer decomposes each original character into a base character followed by one or more diacritic signs (a single character may carry several signs in some languages). For example, á, é and í all carry the same sign: U+0301, the combining acute accent. Our processing engine matches all such diacritic codes and replaces them with an empty string.
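
The decomposition and removal described above can be sketched in a few lines of Python. This is an illustration only, not ClinSpark's implementation: the function name `strip_diacritics` is hypothetical, and treating every character in Unicode category `Mn` (nonspacing mark) as a "diacritic code" is an assumption, since the exact set matched by the processing engine is not stated here.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose text with NFD, then drop the combining marks.

    Hypothetical helper: assumes "diacritic codes" means Unicode
    category Mn (nonspacing marks), which includes U+0301.
    """
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the nonspacing marks.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Show the code points that NFD produces for a single accented letter.
code_points = [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", "á")]
print(code_points)
print(strip_diacritics("áéí"))
```

Characters that have no canonical decomposition pass through this step unchanged, which is why a separate mapping step is still needed for them.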
Customer-controlled character conversion
In addition to the automatic conversion, ClinSpark allows customers to specify their own mapping rules. The following table defines the default conversion rules:
Original Character | Desired Outcome |
---|---|
“β” | “B” |
“ß” | “B” |
“µ” | “u” |
“²” | “2” |
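
As a rough sketch of how such a mapping table could be applied, the Python snippet below encodes the four default rules from the table as a dictionary and applies them with `str.translate`. The names `DEFAULT_RULES` and `apply_mapping` are hypothetical, and how ClinSpark actually stores or evaluates customer rules is not described here; the override shown at the end is an invented example of a customer-specific rule.

```python
# Default rules copied from the table above (original character -> outcome).
DEFAULT_RULES = {
    "β": "B",
    "ß": "B",
    "µ": "u",
    "²": "2",
}


def apply_mapping(text: str, rules: dict = DEFAULT_RULES) -> str:
    """Replace each mapped character with its configured outcome.

    Hypothetical helper: characters without a rule are left untouched.
    """
    # str.maketrans accepts a dict of single characters -> replacement strings.
    return text.translate(str.maketrans(rules))


print(apply_mapping("β-blocker"))
print(apply_mapping("kg/m²"))

# A customer-specific rule set (invented for illustration) replaces the defaults.
custom_rules = {**DEFAULT_RULES, "ß": "ss"}
print(apply_mapping("ß", custom_rules))
```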
Automatic character filtering
Once the automatic conversion and the customer-defined conversion have taken place, ClinSpark performs the final filtering steps:
strips off all non-ASCII characters
erases all the ASCII control characters
removes non-printable characters from Unicode
removes leading and trailing whitespaces
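
The four filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not ClinSpark's code: the function name `final_filter` is hypothetical, "ASCII control characters" is taken to mean code points 0x00 to 0x1F plus 0x7F, and "non-printable characters" is taken to mean Unicode general categories starting with `C`.

```python
import unicodedata


def final_filter(text: str) -> str:
    """Apply the four filtering steps in the order listed above."""
    # 1. Strip all non-ASCII characters (code points above 0x7F).
    text = "".join(ch for ch in text if ord(ch) < 0x80)
    # 2. Erase ASCII control characters (assumed: 0x00-0x1F and 0x7F).
    text = "".join(ch for ch in text if 0x20 <= ord(ch) < 0x7F)
    # 3. Remove non-printable characters (assumed: Unicode categories C*).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("C"))
    # 4. Remove leading and trailing whitespace.
    return text.strip()


print(repr(final_filter("  5 mg\t\u00b0C\x00 ")))
```

Note that in this sketch a tab or newline inside a value is removed rather than replaced with a space, because both are ASCII control characters.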
Conversion Audit
When a conversion takes place, ClinSpark generates an audit log. The audit is captured in a log file named according to the following rule:
"${domainFileName}-log.csv"
where ${domainFileName} is the reporting data domain defined for each item during study design. The audit file is placed alongside the XPT file. Example contents of TransferData-XPT_${date}zip:
AE.xpt AE-log.csv
Note: an audit file is generated ONLY if data conversion took place, i.e. study data were modified during the export.
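
The naming rule and the "only if modified" condition can be illustrated with a small Python sketch. The function names `audit_log_name` and `export_file_names` are hypothetical, and the comparison of original and converted values is an assumption about how a modification might be detected; the columns of the audit CSV are not described here, so the sketch covers file naming only.

```python
def audit_log_name(domain_file_name: str) -> str:
    """Build the audit file name from the rule "${domainFileName}-log.csv"."""
    return f"{domain_file_name}-log.csv"


def export_file_names(domain_file_name: str, originals: list, converted: list) -> list:
    """List the files expected for one domain in the export archive.

    Hypothetical helper: the audit file is included only when at least
    one exported value differs from its original.
    """
    files = [f"{domain_file_name}.xpt"]
    if any(before != after for before, after in zip(originals, converted)):
        files.append(audit_log_name(domain_file_name))
    return files


# A value was modified during export, so an audit file accompanies the XPT file.
print(export_file_names("AE", ["Café"], ["Cafe"]))
# Nothing was modified, so only the XPT file is produced.
print(export_file_names("AE", ["Cafe"], ["Cafe"]))
```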