Table of Contents | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
Summary
Certain reports available through ClinSpark are provided in CSV (Comma Separated Value) format. CSV is a widely used and flexible format that relies on a delimiter between reported values. CSV is often preferred over XLS (Microsoft Excel) due to inherent limitations with how data can be reported into Excel worksheets. This article will explain these limitations and provide suggestions on how to work with CSV files provided by ClinSpark.
Row and File Size Limits
Excel has a limit to the number of rows and columns that can be used per worksheet, which is exceeded quickly with certain study/device data sets. CSV files however can hold many more rows of data than XLS, and do not have these issues. Additionally there are file size limits with Excel workbooks which are difficult to design and implement certain reports around. Formatting requirements can also cause challenges with mixing various types of data. Customers can learn more about Excel file limits from this Microsoft support article.
Tips for using Excel and CSV files
The default mechanism of opening CSV files in Excel through local file folders or a browser directly may be problematic due to some assumed default behavior when handling imported data. One example is how the formatting of dates & times are changed if there are differences in the format in CSV, and the interpreted datetime in Excel. Another common issue we see are with handling of numeric values in CSV that contain a leading sequence of numbers such a as zero (0
); where upon opening the file in Excel the zeros appear to be stripped or removed. Excel may not always properly handle the import of the CSV contents for expected review based on these behaviors.
...
Example Workflow
The following is an example on how to import a transfer data set using the Clinical Data Text (delimited) report type, which outputs series of CSV files. This is a unique example, because unlike the CSV file format suggests where the data is separated by comma, the delimiter is a pipe (vertical bar, |
).
...
For this report, the data is separated into multiple files. These can be imported/reviewed individually, or together into the same Excel workbook.
...
Using a basic text editor (not Excel; examples are Notepad which is built in to Windows, or the more powerful and free open source Notepad++), we can inspect the contents of the files to confirm that they contain delimited values using a pipe symbol. Other characters, such as quotations, are used in data fields as well.
...
The import wizard allows you to clearly define what the delimiters are. In this example, it’s necessary to specify that the pipe ( |
) is the delimiter, and the text qualifier ( “ "
).
...
The last step is important to consider when there are mixed data types in the file. For example, the handling of datetime
values may be more appropriate to change into a region specific date format. Or, leave the datetimes
unaltered from the source file and preserve as ‘Text’'Text'. Each column of the reported data should be reviewed to ensure the proper data format is applied.
...
When the import is complete, the remaining resulting view in Excel is more appropriate than simply opening it from the file browser. Not only have the CSV contents been properly reviewed during the import process, but correct data types have been set and applied to the workbook, making the data much easier to review.
...