© 2024 IQVIA - All Rights Reserved


Overview

ClinSpark supports data exports in the SAS XPT v5 and v8 formats, which are compatible with FDA reporting guidelines.

In XPT, all character data are stored as ASCII, regardless of the operating system and the XPT file format version. ClinSpark natively supports UTF-8 characters in all data collection interfaces; for instance, a lab data result might arrive in ClinSpark encoded with UTF-8 characters. To comply with the XPT standard, ClinSpark applies special conversion procedures when exporting data. This document describes these conversion processes in detail.

Automatic character conversion

ClinSpark automatically normalizes all UTF-8 characters using the NFD Unicode normalization form (canonical decomposition). NFD splits characters with diacritical marks into a base character followed by combining marks. It does not change letter case; decomposing ligatures and converting half-width katakana characters to full-width characters belong to the compatibility forms (NFKD/NFKC), not to NFD. One part of the processing is to remove accents, which is language and charset specific. See the NFD Unicode normalization form for a detailed specification.

Our Normalizer decomposes each original character into a base character followed by a diacritic sign (or several signs, depending on the language). For example, á, é and í all carry the same sign: U+0301, the combining acute accent. Our processing engine matches all such diacritic code points and replaces them with an empty string.
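The decomposition and accent removal described above can be sketched in Python. This is a minimal illustration, not ClinSpark's actual implementation: the function name `strip_diacritics` is hypothetical, and it assumes that "diacritic codes" means every character with a non-zero canonical combining class.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose text with NFD, then drop combining marks such as U+0301."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class,
    # which is non-zero for combining marks (assumed matching rule).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("áéí"))  # aei
```

Characters that have no canonical decomposition, such as ß, pass through this step unchanged.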

Customer-controlled character conversion

In addition to the automatic conversion, ClinSpark allows customers to specify their own mapping rules. The following table defines the default conversion rules:

Original Character    Desired Outcome
“β”                   “B”
“ß”                   “B”
“µ”                   “u”
“²”                   “2”
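As an illustration, the default rules above can be expressed as a simple character mapping. This is a sketch only: the names `DEFAULT_RULES` and `apply_rules` are hypothetical, and how customers actually configure their own rules in ClinSpark is not shown here.

```python
# Default rules from the table above (original character -> desired outcome).
DEFAULT_RULES = {
    "β": "B",
    "ß": "B",
    "µ": "u",
    "²": "2",
}

def apply_rules(text: str, rules: dict = DEFAULT_RULES) -> str:
    """Replace every mapped character; unmapped characters stay as they are."""
    return text.translate(str.maketrans(rules))

print(apply_rules("5 µg/m²"))  # 5 ug/m2
```

A customer-specific rule set could be passed as the second argument in place of the defaults.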

Automatic character filtering

Once the automatic conversion and the customer-defined conversion have taken place, ClinSpark performs the final filtering steps:

  • strips off all non-ASCII characters

  • erases all the ASCII control characters

  • removes non-printable characters from Unicode

  • removes leading and trailing whitespace
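The filtering steps above can be approximated in a few lines of Python. This sketch assumes that the net effect is to keep only printable ASCII (code points 0x20 to 0x7E) and then trim the result; the function name `filter_ascii` is hypothetical and the real implementation may differ in edge cases.

```python
def filter_ascii(text: str) -> str:
    """Keep printable ASCII only, then trim surrounding whitespace."""
    # The range " " (0x20) to "~" (0x7E) excludes non-ASCII characters,
    # ASCII control characters (0x00-0x1F, 0x7F) and non-printable Unicode.
    kept = "".join(ch for ch in text if " " <= ch <= "~")
    return kept.strip()

print(filter_ascii("  AST\t 42 U/L \u00e9\x07 "))  # AST 42 U/L
```

Note that under this assumption a tab is treated as a control character and removed rather than converted to a space.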

Conversion Audit

When a conversion takes place, ClinSpark generates an audit log. The audit is captured in a log file named according to the following rule:

"${domainFileName}-log.csv"

where ${domainFileName} is the reporting data domain defined for each item during study design. The audit file is placed next to the corresponding XPT file. Example contents of the TransferData-XPT_${date}zip:

AE.xpt
AE-log.csv

Note: an audit file is generated ONLY if data conversion took place, i.e. if study data were modified during the export.
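The naming rule and the "only if modified" condition can be sketched as follows. The function names and the CSV columns (`original`, `converted`) are assumptions made for illustration; the actual layout of the ClinSpark audit file is not described in this document.

```python
import csv
import io

def audit_log_name(domain_file_name: str) -> str:
    """Apply the naming rule "${domainFileName}-log.csv"."""
    return f"{domain_file_name}-log.csv"

def export_audit(domain: str, values: list, convert) -> dict:
    """Return {file name: CSV text}; empty when no value was modified."""
    rows = []
    for value in values:
        converted = convert(value)
        if converted != value:  # record only values the export changed
            rows.append((value, converted))
    if not rows:
        return {}  # nothing converted, so no audit file is produced
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["original", "converted"])  # hypothetical columns
    writer.writerows(rows)
    return {audit_log_name(domain): buf.getvalue()}

print(audit_log_name("AE"))  # AE-log.csv
```

Here `convert` stands for whatever conversion chain is applied during the export; any callable that maps a string to a string can be used to try the sketch.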
