Case Study: Integrate

The Problem

The customer, a large public utility, initiated a project to create a data mart populated with data from their customer master file. This file was their primary legacy system and contained information on every account and activity in the region. Access to this data had previously been provided by special-purpose extraction routines, but these had proven too costly and inflexible to meet the ongoing needs of the corporation. In addition, the data and reports produced by the extraction routines did not reconcile well with the production reporting processes. As a result, the uses to which the data could be applied were very limited.

The customer master file had evolved over a period of decades to meet the changing needs of the organization, resulting in a very large, very complex file (description) that bore little resemblance to the original design.

A number of the largest ETL vendors had been invited to make proposals, but in each case the recommended solution to the data access problem was a similarly complex PL/1 program. Because of the ongoing maintenance burden of a custom program, the long implementation time and the high costs quoted in the proposals, the customer had not yet proceeded with the project and was re-investigating alternatives.

The Solution

The Arbutus technology was chosen after demonstrating the ability to directly read their complex master file, providing the project team with the tools and confidence they needed to proceed. The Arbutus solution could do the job, and at a fraction of the cost of other proposals.

The Process

Working with Arbutus technology specialists, the team began the discovery phase, defining and profiling the data. This turned out to be the most significant step of all, as they discovered that they had not understood the data nearly as well as they had thought. The file was rife with undocumented transaction types and special situations that were completely unexpected.

They realized that before they could populate the data mart, they first had to gain a better understanding of the data. They created a small project team tasked with discovering the most important business rules. The team started by creating virtual columns and data models that mirrored what they expected to find in the file. They then iteratively addressed the largest differences, refining their understanding of the actual business processes and reducing the discrepancies with their production reports. In the end they were able to reconcile the major systems to within 1%. This was by no means ideal, but it was an order of magnitude better than their previous best efforts.
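
The iterative reconciliation described above can be illustrated with a minimal sketch. It is not the Arbutus tooling or the customer's actual procedure: the function names, the transaction type codes and the figures are all hypothetical, and amounts are assumed to be whole cents. The idea is simply to total the modeled data by transaction type, compare it with the production report, and rank the differences so that the largest one is investigated first.

```python
from collections import defaultdict


def totals_by_type(records):
    """Sum amounts (in cents) per transaction type from (type, amount) pairs."""
    totals = defaultdict(int)
    for tx_type, amount in records:
        totals[tx_type] += amount
    return dict(totals)


def reconcile(modeled, reported):
    """Compare modeled totals with reported totals.

    Returns the per-type differences sorted largest first (by absolute
    value, ties broken by type name) and the overall relative difference.
    """
    keys = set(modeled) | set(reported)
    gaps = sorted(
        ((k, modeled.get(k, 0) - reported.get(k, 0)) for k in keys),
        key=lambda item: (-abs(item[1]), item[0]),
    )
    modeled_total = sum(modeled.values())
    reported_total = sum(reported.values())
    if reported_total == 0:
        overall = float("inf")
    else:
        overall = abs(modeled_total - reported_total) / abs(reported_total)
    return gaps, overall


# Hypothetical sample: "UNK7" stands in for an undocumented transaction type.
records = [("BILL", 100), ("BILL", 250), ("ADJ", -20), ("UNK7", 15)]
reported = {"BILL": 350, "ADJ": -25}

gaps, overall = reconcile(totals_by_type(records), reported)
for tx_type, diff in gaps:
    print(f"{tx_type}: {diff:+d}")
print(f"overall difference: {overall:.2%}")
```

In this sketch the undocumented type surfaces at the top of the list, which mirrors how the team worked: explain the largest difference, adjust the model, and repeat until the overall difference is acceptably small.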

Results

Armed with a more complete understanding of their data, they were able to quickly create the appropriate transformations to match the model of the data mart. The entire process took three months, virtually all of which was spent in the discovery phase of the project.

When the project was complete, the customer declared it their first successful data warehousing project.