The Arbutus technology contains a wide variety of commands specifically designed for data analysis and manipulation. Commands include:
Data Output Commands
- REPORT - provides for basic reporting, with automatic formatting
- EXPORT - creates data in most popular PC-based formats
Data Manipulation Commands
- EXTRACT - creates a subset of a table, not unlike SELECT WHERE
- SUMMARIZE - creates subtotals by key value, not unlike GROUP BY
- INDEX - logically re-arranges the data, based on one or more keys
- SORT - creates a physically re-arranged copy of the data, based on one or more keys
- RELATION - allows multiple files to be connected in a “star schema” style
- JOIN - combines two dissimilar files based on common key(s)
- MERGE - combines two files with identical structures, based on common key(s)
- SAMPLE - creates a statistical subset of a table
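
The difference between INDEX (a logical re-arrangement) and SORT (a physically re-arranged copy) can be illustrated with a short sketch. This is plain Python, not Arbutus command syntax. The table, its field names, and the helper functions `build_index` and `sorted_copy` are hypothetical, chosen only to show the idea.

```python
# Hypothetical in-memory table; the field names are invented for illustration.
rows = [
    {"invoice": 1003, "customer": "C", "amount": 40.0},
    {"invoice": 1001, "customer": "A", "amount": 25.0},
    {"invoice": 1002, "customer": "B", "amount": 10.0},
]


def build_index(table, key):
    """Return row positions in key order; the table itself is left untouched."""
    return sorted(range(len(table)), key=lambda i: table[i][key])


def sorted_copy(table, key):
    """Return a new, physically re-ordered copy of the table."""
    return sorted(table, key=lambda row: row[key])


index = build_index(rows, "invoice")   # analogous to a logical index
copy = sorted_copy(rows, "invoice")    # analogous to a sorted copy

# Reading the original table through the index yields key order
# without moving any row.
for position in index:
    print(rows[position]["invoice"])
```

In this sketch the index stores only row positions, so it is small and leaves the source table as it was, while the sorted copy duplicates every row in its new order.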
Data Analysis Commands
- CLASSIFY - one of the most powerful analytic commands in the Arbutus command set. It groups and totals a table on a key field, even when the table is not physically arranged in that order. It does this without sorting the table first, by maintaining the totals independently of the file. As a result, most Sort/Summarize operations can be replaced by a single Classify.
- CROSSTABULATE - extends CLASSIFY to a two-dimensional slice, without sorting
- COUNT - determines the number of rows that match the specified criteria
- TOTAL - accumulates numeric column(s) for rows that match the specified criteria
- STATISTICS - provides an overview of numeric column(s)
- STRATIFY - determines the distribution of values for numeric column(s)
- AGE - provides distribution information based on dates
- BENFORD - provides a specialized statistical analysis of the data
- SEQUENCE - confirms that a table is arranged in the order it is expected to be
- GAPS - identifies any gaps in sequences that are expected to be complete
- DUPLICATES - identifies any duplicates in fields that are expected to contain unique values
- VERIFY - tests for physical data corruption in source files
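
The idea behind CLASSIFY, grouping and totaling on a key without sorting first, can be sketched as a single pass that keeps running totals in a structure separate from the table. This is a minimal Python illustration of that general technique, not Arbutus syntax and not a description of how Arbutus implements the command. The function name `classify`, the sample data, and the field names are all hypothetical.

```python
from collections import defaultdict


def classify(rows, key, total_field):
    """Count and total rows per key value in one pass over unsorted data."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in rows:
        # The totals live outside the table, so row order does not matter.
        counts[row[key]] += 1
        totals[row[key]] += row[total_field]
    # Only the (usually much smaller) set of key values is ordered for output.
    return {k: (counts[k], totals[k]) for k in sorted(counts)}


# Hypothetical transactions, deliberately not in key order.
transactions = [
    {"dept": "B", "amount": 10.0},
    {"dept": "A", "amount": 5.0},
    {"dept": "B", "amount": 2.5},
    {"dept": "A", "amount": 1.0},
]

result = classify(transactions, "dept", "amount")
for dept, (count, total) in result.items():
    print(dept, count, total)
```

Because the table is read once and never re-ordered, a sketch like this avoids the cost of sorting the whole table before producing subtotals. The trade-off is that one running total must be held for every distinct key value.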