Why Are DIAdem DPP Processed Files Larger Than Raw Files?

Updated Jun 22, 2023

Reported In

Software

  • DIAdem
  • SystemLink

Issue Details

  • I have created a Data Preparation Procedure (.DPP) file in DIAdem. When testing it, I see that my output file is larger than my input file. What causes this?
  • I am using a custom .DPP file for my SystemLink Data Preprocessor Instance. However, all of my TDMS processed files are 2-5 times larger than my raw files. Why is this happening?

Solution

The SystemLink Data Preparation feature loads raw data files into a DIAdem worker using a matching DataPlugin. The size of the resulting file depends on the following:
  • Metadata added by the DPP.
    • If Harmonize Property Identifiers has been enabled in the Replace Identifiers tab, the DPP will add property metadata. This is because the properties in the raw file are being modified to use a golden/harmonized naming format.
    • If Harmonize Property Values has been enabled in the Replace Values tab, this also adds metadata. This is because the names of channels or groups have been changed to use a golden/harmonized format.
    • If Unit Conversion has been enabled in the Convert Units tab, metadata is added to the output file because the Unit property is being modified to conform to a golden standard.
    • If Calculate Statistical Characteristics Values is enabled in the Statistics tab, the overall metadata increases because additional statistical properties are being added to the file.
  • Validation and Verification scripts.
    • If Validation and Verification is enabled in the V & V tab, the output file can be larger because this feature runs a freeform, user-defined script with no restrictions on what it adds to the data. Depending on the calculations and analysis in this script, the result file could be significantly larger.
  • TDMS Fragmentation.
    • TDMS files are designed for efficient data streaming. The format achieves this efficiency by expecting data with the same structure to be written each time the file is updated. If the structure of the written data changes between writes, the file becomes fragmented, which results in larger file sizes and slower reading speeds.
    • To avoid fragmentation:
      1. Write data chunks with the same structure on every file write.
      2. Use the DIAdem defragmentation command, TdmsFileDefrag. Consider including this command in a V & V script as part of your DPP.
  • File format conversion.
    • DIAdem tries to save each channel in the output file with the same IEEE data type that it had in the input file. If the input file or the output file is not in a binary format, the saved data type will not match the loaded data type.
    • DIAdem expands channels in memory to the DBL (float64) data type, which uses 8 bytes per value.
    • For example, if the input file stores data as I16 values (2 bytes), the output file is expected to be 4 times larger (due to saving the data as DBLs).
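The data type arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of DIAdem or SystemLink: the function names and the type table are hypothetical, and the estimate covers raw channel data only, ignoring metadata and fragmentation overhead.

```python
# Bytes per value for a few common numeric channel types.
# The labels are illustrative; only I16 (2 bytes) and DBL (8 bytes)
# are taken from the article itself.
BYTES_PER_VALUE = {"U8": 1, "I16": 2, "I32": 4, "SGL": 4, "DBL": 8}


def expansion_factor(input_type: str, output_type: str = "DBL") -> float:
    """Return how many times larger the channel data becomes when
    values stored as input_type are saved as output_type."""
    return BYTES_PER_VALUE[output_type] / BYTES_PER_VALUE[input_type]


def channel_data_bytes(num_values: int, data_type: str) -> int:
    """Estimate the raw data size of one channel, without metadata."""
    return num_values * BYTES_PER_VALUE[data_type]


if __name__ == "__main__":
    values = 1_000_000  # hypothetical channel length
    raw = channel_data_bytes(values, "I16")
    processed = channel_data_bytes(values, "DBL")
    print(f"raw: {raw} bytes, processed: {processed} bytes")
    print(f"expansion factor: {expansion_factor('I16')}")
```

A factor above 1 for most channels suggests that the size increase comes mainly from data type conversion rather than from added metadata.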

Additional Information

To optimize the size of the DPP output file, refer to Optimizing the Size of a DPP Output File in DIAdem.