Removing Duplicates from Databases

Removal of duplicate records is a perennial challenge among database marketing professionals, either for purposes of mailing or analyses.  Keep in mind that not everyone can afford postal software to remove duplicates from address databases.

Of course, using MS-Access, one can create a solution using the “Find Duplicates Query Wizard” that comes with the software, but what if the data is NOT EXACTLY the same?  The tips below may help in the cleansing of databases, and hopefully save some dollars and frustration in the end.

  1. Using the MS-Access Query “Find Duplicates” Wizard, looking for EXACT MATCHES
    1. Always remember the simple stuff, like Social Security Numbers.  Each number is unique to all and thus can be a simple solution
    2. The Birthday/Last Name combination also may work, but care must be taken before any deletions are done.  I would probably suggest a tighter Birthday/Last Name/ZIP Code matching be done first.
    3. The First Name/Last Name/ZIP Code may bring some results when the Birthday field is not strictly filled, although this may be the least effective
  2. Creating your own query
    1. Create a field in the table and name it UNIQUER.  This will contain the database that will identify a unique field for each record.
    2. Think of what applies to your particular needs as far as what to place in the UNIQUER field.  Most of the time, for address databases, I use the following combinations
      1. First 3 letters of the first name + the first 3 letters of the last name + the first 5 digits of the ZIP code
      2. First 3 letters of the last name + first 3 digits of the ZIP code + birth month + birth day
      3. You get the picture; play with what works best with your needs.

Always ensure that all your records follow the same format rules; in other words, ZIP codes must follow the same rules for spaces, numerics are formatted in the same manner, and the dates as well.  Use an Update Query to fill the Uniquer field with the selection and combinations that have been selected.

Once the fields have been populated, it is only a matter of creating a duplicate table shell of the original database by copying the original database and pasting “Structure Only”, edit that shell to Index the Uniquer field, and then create an Append Query to that shell to eliminate the duplicates.

Prior to doing the Append Query, one may go through several steps and create temporary tables to ensure that the records are placed first in the order in which the preferred ones may be selected.  An example may be that the records are sorted by Total Sales in descending order.  When appending this information on the table shell, then at least one is assured that the records that are most deserving actually are the ones that are used.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s