Summary
Certain reports available through ClinSpark are provided in CSV (comma-separated values) format. CSV is a widely used, flexible plain-text format that relies on a delimiter between reported values. It is often preferred over XLS (Microsoft Excel) because of inherent limitations in how data can be written to Excel worksheets. This article explains these limitations and offers suggestions for working with CSV files provided by ClinSpark.
Row and File Size Limits
Excel limits the number of rows and columns per worksheet (1,048,576 rows by 16,384 columns in current .xlsx workbooks; 65,536 rows by 256 columns in the older .xls format), and certain study/device data sets exceed these limits quickly. CSV files, however, have no inherent row limit and so avoid this issue. Excel workbooks also have file size limits, which make certain reports difficult to design and implement. Formatting requirements can also make it challenging to mix different types of transactional data in one worksheet (for example, laboratory reported test data and application audit records). Customers can learn more about Excel file limits from this Microsoft support article.
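As an illustration of the limits described above, the following minimal Python sketch counts the rows and the widest row in a CSV and compares them against the worksheet limits of current .xlsx workbooks. The function name `fits_in_worksheet` and the sample column names are hypothetical and are not part of ClinSpark.

```python
import csv
import io

# Per-worksheet limits for current Excel workbooks (.xlsx).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_worksheet(lines, delimiter=","):
    """Return (row_count, widest_row, fits) for an iterable of CSV text lines."""
    row_count = 0
    widest_row = 0
    for record in csv.reader(lines, delimiter=delimiter):
        row_count += 1
        widest_row = max(widest_row, len(record))
    fits = row_count <= EXCEL_MAX_ROWS and widest_row <= EXCEL_MAX_COLS
    return row_count, widest_row, fits


# Hypothetical sample data; a real report would be opened with open(path, newline="").
sample = io.StringIO("SUBJECT,TEST,RESULT\n001,ALT,23\n002,ALT,31\n")
print(fits_in_worksheet(sample))
```

Because the reader streams one record at a time, a check like this can run on files far larger than Excel could open.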
Tips for using Excel and CSV files
Simply opening a CSV file in Excel by double-clicking it may be problematic, because Excel applies default assumptions when it handles imported data. One example is date and time formatting: Excel may display a datetime differently from how it is reported in the CSV. Another common issue is the handling of values that begin with one or more zeros: when the file is opened in Excel, the leading zeros appear to be stripped. Because of these behaviors, Excel may not always present the CSV contents as expected for review.
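The leading-zero behavior can be reproduced outside Excel. The sketch below, which uses hypothetical column names and values, shows that the CSV itself stores the value as text with its zeros intact, and that the zeros are lost only when the value is interpreted as a number.

```python
import csv
import io

# Hypothetical one-row report; SUBJECT carries leading zeros.
sample = io.StringIO("SUBJECT,VISIT_DATE\n00123,2023-01-05\n")
rows = list(csv.DictReader(sample))

# The csv module returns every field as text, so the zeros are preserved.
subject_as_text = rows[0]["SUBJECT"]

# Interpreting the same field as a number discards them, which is
# comparable to what happens when Excel guesses a numeric type.
subject_as_number = int(subject_as_text)

print(repr(subject_as_text), subject_as_number)
```

This is why importing such columns as text, rather than letting the type be inferred, keeps identifiers intact.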
We recommend that customers follow Microsoft’s suggested practice of importing the contents of CSV files into a new or existing workbook using Excel’s import tools. The following Microsoft help article describes this workflow.
https://support.microsoft.com/en-us/office/text-import-wizard-c5b02af6-fda1-4440-899f-f78bafe41857
Example Workflow
The following is an example of how to import a transfer data report that uses the Clinical Data Text (delimited) format, which is output as a CSV file. This example is unusual because, although the name CSV suggests that values are separated by commas, the delimiter here is a pipe (vertical bar).
For this report, the data is separated into multiple files. These can be imported/reviewed individually, or together in the same Excel workbook.
Using a basic text editor (not Excel!), we can inspect the contents of the files to confirm that the values are delimited by a pipe symbol. Other characters, such as quotation marks, are also used around fields.
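The same inspection can be scripted. The following is a rough heuristic rather than a definitive detector: it counts a few candidate delimiters in the first line and returns the most frequent one. The function name `guess_delimiter`, the candidate list, and the sample header are assumptions made for illustration.

```python
def guess_delimiter(first_line, candidates=(",", "|", "\t", ";")):
    """Return the candidate character that occurs most often in first_line."""
    counts = {char: first_line.count(char) for char in candidates}
    return max(counts, key=counts.get)


# Hypothetical header line from a pipe-delimited report.
header = '"STUDYID"|"SUBJID"|"VISIT"'
print(repr(guess_delimiter(header)))
```

A heuristic like this can be misled by delimiters that appear inside quoted text, so it is still worth looking at the file directly.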
If these files are opened in Excel directly from a local folder, Excel automatically strips the quotation marks from one of the fields (STUDYID) and modifies the field contents.
Additionally, Excel places the entire contents of each line into the first column (A) of the worksheet, instead of separating the values into individual cells as would typically be expected.
However, using the Import Wizard, steps can be taken to ensure that the data from the CSV is imported properly.
To start, users can review the ‘preview’ of the CSV contents and decide whether delimited is the right way to interpret the data or whether another option represents it better.
The Import Wizard allows you to define the delimiters explicitly. In this example, it is necessary to specify that the pipe ( | ) is the delimiter and that the double quotation mark ( " ) is the text qualifier.
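For readers who process these files programmatically, the same two settings map onto the `delimiter` and `quotechar` parameters of Python's standard `csv` module. The sketch below uses hypothetical field names and values; note that a pipe inside a quoted value is kept as part of the value instead of splitting the field.

```python
import csv
import io

# Hypothetical pipe-delimited content with quoted values.
sample = io.StringIO(
    'STUDYID|SUBJID|COMMENT\n'
    '"0042-A"|"0007"|"dose held | see note"\n'
)

# delimiter is the field separator; quotechar is the text qualifier.
reader = csv.reader(sample, delimiter="|", quotechar='"')
header, first_row = list(reader)

print(header)
print(first_row)
```

The text qualifier is what lets the third field contain a pipe without being treated as two separate columns.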
The last step is important when the file contains mixed data types. For example, it may be more appropriate to format datetime values as a region-specific date. Each column of the reported data should be reviewed to ensure that the proper data format is applied.
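The idea of assigning a format per column can be sketched in Python as well. The column names and the ISO-style datetime layout below are assumptions made for illustration; the actual layout should be confirmed against the report being imported.

```python
from datetime import datetime

# Assumed datetime layout; verify against the actual report before use.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def convert_row(row):
    """Apply an explicit type to each column of a row of text values."""
    return {
        "STUDYID": row["STUDYID"],  # keep identifiers as text
        "VISITDT": datetime.strptime(row["VISITDT"], DATETIME_FORMAT),
        "RESULT": float(row["RESULT"]),  # numeric result
    }


# Hypothetical row as it would come from csv.DictReader.
raw = {"STUDYID": "0042", "VISITDT": "2023-01-05T08:30:00", "RESULT": "4.50"}
converted = convert_row(raw)
print(converted["STUDYID"], converted["VISITDT"].strftime("%d %b %Y %H:%M"))
```

Parsing the datetime explicitly, and formatting it only for display, avoids depending on regional settings to interpret the stored value.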
When the import is complete, the resulting view in Excel is more appropriate than before. Not only have the CSV contents been properly reviewed, but the correct data types have also been set and applied to the workbook.