Data Analysis Commands

Arbutus contains a wide variety of commands designed specifically for data analysis and manipulation. The major commands are summarized below:

Data Output Commands

  • REPORT produces basic reports with automatic formatting
  • EXPORT writes data to most popular PC-based file formats

Data Manipulation Commands

  • EXTRACT creates a subset of a table, not unlike SELECT WHERE
  • SUMMARIZE creates subtotals by key value, not unlike GROUP BY
  • INDEX logically rearranges the data, based on one or more keys
  • SORT creates a physically rearranged copy of the data, based on one or more keys
  • RELATION allows multiple files to be connected in a “star schema” style
  • JOIN combines two dissimilar files based on common key(s)
  • MERGE combines two files with identical structures, based on common key(s)
  • SAMPLE creates a statistical subset of a table
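The Python sketch below is a rough analogy for several of the manipulation commands above, assuming a table is simply a list of records. It is not Arbutus syntax; the function names (`extract`, `index`, `sort`, `summarize`, `merge`) and the sample `invoices` table are hypothetical. It shows the difference between an index (a logical ordering of row positions that leaves the table untouched) and a sort (a physically rearranged copy), and why a sequential summarize produces exactly one subtotal per key only when its input is already in key order.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical sample table: one dict per record (not an Arbutus data format).
invoices = [
    {"vendor": "B", "amount": 120.0},
    {"vendor": "A", "amount": 50.0},
    {"vendor": "B", "amount": 30.0},
    {"vendor": "C", "amount": 75.0},
    {"vendor": "A", "amount": 25.0},
]


def extract(table, predicate):
    """Return the subset of rows that satisfy a condition (like SELECT ... WHERE)."""
    return [row for row in table if predicate(row)]


def index(table, key):
    """Logical reordering: return row positions in key order; the table is unchanged."""
    return sorted(range(len(table)), key=lambda i: table[i][key])


def sort(table, key):
    """Physical reordering: return a new, rearranged copy of the rows."""
    return sorted(table, key=itemgetter(key))


def summarize(table, key, field):
    """Return (key, count, total) for each run of adjacent rows sharing a key.

    groupby() only groups neighbouring rows, so the input must already be
    in key order to get exactly one subtotal per key value.
    """
    result = []
    for k, rows in groupby(table, key=itemgetter(key)):
        rows = list(rows)
        result.append((k, len(rows), sum(row[field] for row in rows)))
    return result


def merge(sorted_a, sorted_b, key):
    """Combine two identically structured tables, both in key order, keeping key order."""
    return list(heapq.merge(sorted_a, sorted_b, key=itemgetter(key)))


if __name__ == "__main__":
    print(index(invoices, "vendor"))  # [1, 4, 0, 2, 3]
    # [('A', 2, 75.0), ('B', 2, 150.0), ('C', 1, 75.0)]
    print(summarize(sort(invoices, "vendor"), "vendor", "amount"))
    # 5: on unsorted input every run of equal keys becomes its own subtotal
    print(len(summarize(invoices, "vendor", "amount")))
```

Running `summarize` directly on the unsorted table yields five subtotals instead of three, which is why the classic approach sorts first and why a command that does not depend on physical order is useful.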

Data Analysis Commands

  • CLASSIFY is one of the most powerful analytic commands in the Arbutus command set. It groups and totals a table on a key field, even when the table is not physically arranged in that order. It does this without sorting the table first, by maintaining the totals independently of the file. As a result, most SORT/SUMMARIZE operations can be replaced by a single CLASSIFY.
  • CROSSTABULATE extends CLASSIFY to a two-dimensional slice, without sorting
  • COUNT determines the number of rows that match specified criteria
  • TOTAL accumulates numeric column(s) for rows that match specified criteria
  • STATISTICS provides an overview of numeric column(s)
  • STRATIFY determines the distribution of values for numeric column(s)
  • AGE provides distribution information based on dates
  • BENFORD provides a specialized statistical analysis of the data, based on Benford's Law of digit frequencies
  • SEQUENCE confirms that a table is arranged in the expected order
  • GAPS identifies any gaps in sequences that are expected to be complete
  • DUPLICATES identifies duplicate values in fields that are expected to contain only unique values
  • VERIFY tests for physical data corruption in source files
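The idea behind CLASSIFY, keeping running totals apart from the data so that no sort is needed, can be illustrated with a hash map. The Python sketch below is an analogy only: the function names, the sample table, and the use of a dictionary are assumptions for illustration and do not describe how Arbutus implements these commands internally. `classify` makes one pass over unsorted rows, `crosstabulate` does the same with a (row value, column value) pair as the key, and `count`/`total` apply a filter condition.

```python
from collections import defaultdict

# Hypothetical sample table, deliberately not in vendor order.
invoices = [
    {"vendor": "B", "region": "East", "amount": 120.0},
    {"vendor": "A", "region": "West", "amount": 50.0},
    {"vendor": "B", "region": "West", "amount": 30.0},
    {"vendor": "C", "region": "East", "amount": 75.0},
    {"vendor": "A", "region": "East", "amount": 25.0},
]


def classify(table, key, field):
    """Group and total in a single pass over unsorted rows.

    Running [count, total] pairs live in a dictionary separate from the
    table, so the rows themselves are never rearranged.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for row in table:
        acc = totals[row[key]]
        acc[0] += 1
        acc[1] += row[field]
    # Only the distinct keys are sorted, for readable output.
    return {k: (n, t) for k, (n, t) in sorted(totals.items())}


def crosstabulate(table, row_key, col_key, field):
    """Two-dimensional classify: a total for each (row value, column value) cell."""
    cells = defaultdict(float)
    for row in table:
        cells[(row[row_key], row[col_key])] += row[field]
    return dict(cells)


def count(table, predicate):
    """Number of rows that satisfy a condition."""
    return sum(1 for row in table if predicate(row))


def total(table, field, predicate):
    """Sum of a numeric field over the rows that satisfy a condition."""
    return sum(row[field] for row in table if predicate(row))


if __name__ == "__main__":
    # {'A': (2, 75.0), 'B': (2, 150.0), 'C': (1, 75.0)}
    print(classify(invoices, "vendor", "amount"))
    print(crosstabulate(invoices, "vendor", "region", "amount")[("B", "West")])  # 30.0
    is_east = lambda r: r["region"] == "East"
    print(count(invoices, is_east), total(invoices, "amount", is_east))  # 3 220.0
```

The result matches a sort followed by a summarize, but the table is read once and never reordered. The trade-off in this sketch is memory: the dictionary holds one entry per distinct key (or key pair), which grows with the number of distinct values in the grouping field.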
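The integrity checks performed by SEQUENCE, GAPS and DUPLICATES can likewise be sketched in a few lines of Python, assuming a field of integer document numbers such as cheque numbers. The function names and sample data are hypothetical, and the output format is a simplification: this sketch reports 0-based positions and individual missing numbers, which may differ from how Arbutus reports its results.

```python
from collections import Counter

# Hypothetical cheque numbers, expected to be ascending, complete and unique.
check_numbers = [1001, 1002, 1003, 1006, 1004, 1007, 1007, 1010]


def sequence_errors(values):
    """0-based positions where a value is smaller than the one before it."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


def gaps(values):
    """Integers missing between the minimum and maximum of the values."""
    present = set(values)
    return [n for n in range(min(values), max(values) + 1) if n not in present]


def duplicates(values):
    """Values that occur more than once, mapped to how often they occur."""
    return {v: c for v, c in Counter(values).items() if c > 1}


if __name__ == "__main__":
    print(sequence_errors(check_numbers))  # [4]  (1004 follows 1006)
    print(gaps(check_numbers))             # [1005, 1008, 1009]
    print(duplicates(check_numbers))       # {1007: 2}
```

Because `gaps` and `duplicates` work on a set and a counter rather than on sorted data, they find missing and repeated numbers even when the table is out of order, while `sequence_errors` checks the order itself.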