In this article, you'll learn how to normalize addresses and detect hidden duplicates using functions in Arbutus Analyzer. Also included are files, scripts and instructions for you to run this yourself in Analyzer. We'll take a look at two examples: first, using a single file, and second, comparing normalized addresses across two files.
We want to detect duplicate addresses in the Vendor master file, both to clean up our data and to identify potential double-billing fraud. We'll do this by normalizing the addresses so they are more comparable, and then looking for duplicates among the normalized addresses rather than the addresses as recorded. Normalizing matters because fraud perpetrators may try to mask a duplicate by making its address appear different, for example by abbreviating "Avenue" to "Ave" in one record but not in another.
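The overall approach can be sketched outside Analyzer to show why normalization exposes hidden duplicates. The Python below is a minimal sketch, not Arbutus code. It assumes that SortNormalize() uppercases the text, strips punctuation, applies the replacement list, drops noise words, and sorts the remaining words. The names `REPLACEMENTS`, `sort_normalize` and `find_duplicates`, the abbreviated replacement entries, and the sample vendors are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, abbreviated replacement list: left word -> right word.
# An empty string marks a "noise" word to drop. A real list is much longer.
REPLACEMENTS = {
    "AVENUE": "AVE", "AV": "AVE",
    "STREET": "ST", "STR": "ST",
    "THE": "", "OF": "",
}

def sort_normalize(address: str) -> str:
    """Rough approximation of SortNormalize() (assumed behavior):
    uppercase, replace punctuation with spaces, apply replacements,
    drop noise words, then sort the remaining words."""
    words = re.sub(r"[^A-Z0-9 ]", " ", address.upper()).split()
    kept = []
    for word in words:
        replaced = REPLACEMENTS.get(word, word)
        if replaced:  # empty string means a noise word, so skip it
            kept.append(replaced)
    return " ".join(sorted(kept))

def find_duplicates(vendors: dict[str, str]) -> list[list[str]]:
    """Group vendor IDs whose normalized addresses are identical."""
    groups = defaultdict(list)
    for vendor_id, address in vendors.items():
        groups[sort_normalize(address)].append(vendor_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical vendor master records: ID -> address as recorded.
vendors = {
    "V001": "123 Main Street",
    "V002": "123 MAIN ST.",
    "V003": "Main St 123",
    "V004": "45 Elm Avenue",
}
print(find_duplicates(vendors))  # [['V001', 'V002', 'V003']]
```

None of the first three addresses match as recorded, but all three normalize to "123 MAIN ST". Sorting the words is what lets "Main St 123" match "123 Main Street" despite the different word order.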
The replacement list used by SortNormalize() is a simple text file with two columns of words. The left column contains words commonly found in addresses, including their abbreviations; the right column contains the word each one is transformed into by SortNormalize(). For example, the following instances of "avenue" are all transformed to the text on the right. SortNormalize() also removes "noise" words that have no partner in the list, such as "THE" and "OF".
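To make the two-column layout concrete, here is a minimal Python sketch of reading such a list and applying it. It is an illustration under stated assumptions, not the actual file format or Analyzer's implementation. It assumes the columns are separated by whitespace, and that a line with only a left-hand word marks a noise word to remove. The sample entries and the names `load_replacements` and `sort_normalize` are hypothetical.

```python
import re

def load_replacements(lines):
    """Parse a two-column replacement list (assumed format):
    'LEFT RIGHT' replaces LEFT with RIGHT; a line containing only
    LEFT marks a noise word that is removed. Blank lines are ignored."""
    table = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        left = parts[0].upper()
        right = parts[1].upper() if len(parts) > 1 else ""
        table[left] = right
    return table

def sort_normalize(address, table):
    """Approximate SortNormalize(): uppercase, strip punctuation,
    replace words, drop noise words, sort what remains."""
    words = re.sub(r"[^A-Z0-9 ]", " ", address.upper()).split()
    kept = [table.get(word, word) for word in words]
    return " ".join(sorted(word for word in kept if word))

# Hypothetical sample entries in the assumed format.
SAMPLE_LIST = """\
AVENUE  AVE
AVEN    AVE
AV      AVE
THE
OF
"""

table = load_replacements(SAMPLE_LIST.splitlines())
print(sort_normalize("1 Avenue of the Americas", table))  # 1 AMERICAS AVE
print(sort_normalize("1 Americas Av.", table))            # 1 AMERICAS AVE
```

Because a new pair is just one more line, extending the list requires no change to the parsing logic.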
This is a plain text file, so users can easily update it by adding another line containing a new replacement pair.
You can reference this file in the computed field expression without a full path only if it is located in the Arbutus project folder. If the file is stored anywhere else, you must specify its full path.
In this example, we will compare our vendor addresses to addresses on a US watchlist.
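Comparing two files works the same way: normalize the address in both, then match records on the normalized value. The Python below is a minimal sketch of that matching logic, not Analyzer syntax. It assumes the same approximate normalization as before. The names `match_watchlist` and `sort_normalize`, the replacement entries, and both sample files are hypothetical.

```python
import re

# Hypothetical replacement entries; "" marks a noise word to drop.
REPLACEMENTS = {"AVENUE": "AVE", "AV": "AVE", "STREET": "ST", "THE": "", "OF": ""}

def sort_normalize(address):
    """Approximate SortNormalize() (assumed behavior)."""
    words = re.sub(r"[^A-Z0-9 ]", " ", address.upper()).split()
    kept = (REPLACEMENTS.get(word, word) for word in words)
    return " ".join(sorted(word for word in kept if word))

def match_watchlist(vendors, watchlist):
    """Return (vendor_id, watchlist_id) pairs whose normalized addresses match."""
    # Index the watchlist by normalized address, allowing several entries per key.
    index = {}
    for watch_id, address in watchlist.items():
        index.setdefault(sort_normalize(address), []).append(watch_id)
    matches = []
    for vendor_id, address in vendors.items():
        for watch_id in index.get(sort_normalize(address), []):
            matches.append((vendor_id, watch_id))
    return matches

# Hypothetical sample data.
vendors = {"V001": "1 Avenue of the Americas", "V002": "9 Elm Street"}
watchlist = {"W17": "1 Americas Av.", "W18": "500 Oak St"}
print(match_watchlist(vendors, watchlist))  # [('V001', 'W17')]
```

Building the index on the watchlist side means each vendor address is normalized once and looked up directly, rather than compared against every watchlist entry.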