The customer, a large public utility, initiated a project to create
a data mart populated with data from their customer master file.
This file was their primary legacy system, and contained information
on every account and activity in the region. Access to this data
had been provided in the past by special-purpose extraction routines,
but these had proven too costly and inflexible to meet the ongoing
needs of the corporation. In addition, the data and reports produced
by the extraction routines did not reconcile well with the production
reporting processes. As a result, the uses to which the data could be
applied were very limited.
The customer master file had evolved over a period of decades to meet the changing
needs of the organization, resulting in a very large, very complex file
that bore little resemblance to the original design.
A number of the largest ETL vendors had been invited to make proposals, but
in each case the recommended solution to the data access problem was a similarly
complex PL/1 program. Because of the ongoing maintenance burden of a custom program,
the long implementation timeline and the high costs quoted in the
proposals, the customer had not yet proceeded with the project and was re-investigating
alternatives.
The Arbutus technology was chosen after demonstrating the ability to directly read their complex master file, providing the project team with the tools and confidence they needed to proceed. The Arbutus solution could do the job, and at a fraction of the cost of other proposals.
Working with Arbutus technology specialists, the team began the
discovery phase, defining and profiling the data. This turned
out to be the most significant step of all, as they discovered that
they had not understood this data nearly as well as they had thought.
The file was rife with undocumented transaction types and special
situations that were completely unexpected.
Before they could populate the data mart, they realized that they had to start
by gaining a better understanding of the data. They created a small project
team that was tasked with discovering the most important business rules. They
started by creating virtual columns and data models that mirrored what they
expected in the file. They then iteratively addressed the largest differences,
refining their understanding of the actual business processes and reducing
the differences with their production reports. In the end they were able to
reconcile the major systems to within 1%. This was by no means ideal, but was
an order of magnitude better than their previous best efforts.
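The reconcile-and-refine loop described above can be illustrated with a minimal sketch. This is not the Arbutus tooling or the customer's actual process; the function names, the (transaction type, amount) record shape and the per-type report totals are all assumptions made for illustration. The idea is to total the extracted data by transaction type, compare those totals with the production report, and rank the differences so the largest can be investigated first.

```python
from collections import defaultdict


def totals_by_type(records):
    """Sum amounts per transaction type from (type, amount) pairs (hypothetical record shape)."""
    totals = defaultdict(float)
    for txn_type, amount in records:
        totals[txn_type] += amount
    return dict(totals)


def rank_differences(extracted, reported):
    """Return (type, extracted - reported) pairs, largest absolute difference first."""
    keys = set(extracted) | set(reported)
    diffs = [(k, extracted.get(k, 0.0) - reported.get(k, 0.0)) for k in keys]
    # Ties are broken by type name so the ordering is deterministic.
    return sorted(diffs, key=lambda d: (-abs(d[1]), d[0]))


def overall_gap_pct(extracted, reported):
    """Absolute gap between the grand totals, as a percentage of the reported total."""
    reported_total = sum(reported.values())
    extracted_total = sum(extracted.values())
    return abs(extracted_total - reported_total) / abs(reported_total) * 100


if __name__ == "__main__":
    # Invented sample data: an undocumented "ADJ" type appears only in the extract.
    records = [("BILL", 1000.0), ("BILL", 500.0), ("PAYMENT", -700.0), ("ADJ", 25.0)]
    report = {"BILL": 1500.0, "PAYMENT": -720.0}
    extracted = totals_by_type(records)
    for txn_type, diff in rank_differences(extracted, report):
        print(txn_type, diff)
    print(overall_gap_pct(extracted, report))
```

Each pass through such a loop would surface the transaction types that contribute most to the gap, which is where an undocumented business rule is most likely to be hiding.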
Armed with a more complete understanding of their data, they were
able to quickly create the appropriate transformations to match the
model of the data mart. The entire process took three months,
virtually all of which was spent in the discovery phase
of the project.
When the project was complete, they declared that it was their first successful data warehousing
project.