Technical Insight

Data Quality Management Practical Tests and Scripted Solutions

There are several simple steps you can take to test your data's quality.

 Ideally, this should be done immediately after defining a file in Arbutus Analyzer and before beginning your analysis. Running the tests below will document the results in the Command Log. It's a best practice to use a script to execute your tests and have the script direct the results to an output file. This can become part of your workpapers to demonstrate that the tests have been performed.

First Level Testing


Control Totals - Ask your data provider to inform you how many records were in the source data and what the control or hash totals were for numeric fields:

  • Count command to document the record count
  • Total command on key numeric fields for control/hash totals

Uniqueness of Records


It's possible that a single original record can be written more than once to a file due to processing errors.

To test for record uniqueness, use the Duplicates command, selecting all the physical fields in the "Field(s) to test for Duplicates" section of the command dialog:
Duplicates ON <field list> To "Payments_2019_DupeRecs" Open Presort

DQManagement - Duplicates

Data Corruption

Data can become corrupted in many ways as it is being processed. Performing an extraction from a database, zipping up a file, and even sending it as an e-mail attachment are activities that can cause corruption. Numeric and date fields are particularly vulnerable.

Run the Verify command against all the physical fields. The output will tell you if any evidence of corruption has been detected:

  • Non-numeric characters in numeric fields
  • Invalid/blank dates
  • Invisible/non-printing characters (such as tabs)
data validity errors image

Field Level Testing

Numeric and date fields are the prime candidates for quality testing. They are indicators of the scope (time period) and materiality of the data. Key character fields should also be tested.

Numeric Fields
The Statistics command provides a wealth of information for both numeric and date fields. For numeric fields, it can rapidly display the following results:
  •  Number/Total/Average
  • Number/Total/Average of positives
  • Number of zeros
  • Number/Total/Average of negatives
  • Absolute total
  • Range
  • Standard deviation
  • Top X
Field Level Testing

It's always useful to review the Top X results to help you make your decision to test for outliers.

Date Fields

The Statistics command again provides significant results:

  • Blank dates
  • Invalid dates
  • Weekend dates
  • Range
  • Top X dates
Date Fields

Invalid dates are those which contain content that does not conform to the field's date format. Details of the invalid/blank dates' contents can be displayed by the Verify command. Weekend dates may be of interest to tests of manual journal entries, where weekend or holiday transactions are considered high-risk. The Range result will immediately confirm the span of the dates; a range greater than 365 will tell you that the records cover more than a one-year period. Reviewing the Top X results can identify the existence of outliers. Test logically-related date fields with filters to determine if there are anomalies, such as an invoice date coming after a payment date in a given record. Use a filter with the Count command to document their existence: Count If Invoice_Date > Payment_Date

 

Testing and Scripted Solutions

 

Analytics By Business Process

Learn more about various analytics tests to meet business needs.

Analytics By Business Process

Analytics Test by popularity
 

Arbutus Technical Insights

Explore more of our Technical Resources

Technical Insights Main

Request Trial 

Fill in your details and get started with a free trial on us

 

Try Arbutus