How can I find and clean up duplicate client records in my 'CATS database, and prevent future duplicates from occurring?Duplicate records negatively affect the integrity of your database. For example, for DLA-funded centers, duplicates can cause you to overstate your total number of initial counseling sessions on the DLA 1806 report. For SBA-funded centers reporting to EDMIS, duplicate client records are particularly troublesome because not only will you be overstating your client count and understating your extended engagement client count, but cleaning up duplicates is impossible for data that was submitted to EDMIS in prior fiscal years. There are other complications of duplicate records in addition to reporting complications. Duplicate records lead to confusion for your users (which client record should I be working with?). They also make it difficult to piece together a complete history of a client’s activity because information is spread across multiple client records instead of being available in one location. How do duplicate records get introduced into your database in the first place? Softshare has discovered that, most of the time, a duplicate client (or contact) record is introduced because the counselor didn't realize the client already existed in the database. Why didn’t the counselor know? Because they didn’t have view permissions to client records belonging to other centers in the network. Restrictive permissions are the leading cause of duplicates. It's typical for a counselor to only be assigned permissions that let them view, edit and delete clients within their own center. But when the same client visits two centers that are part of the same network, these limited permissions force the counselor to enter the client into the database a second time, thus creating a duplicate client record. And since clients often visit more than one center in their pursuit of answers, multi-center networks are at risk of developing duplicate client issues in their Softshare 'CATS databases. The process for cleaning up duplicates is involved and there are many considerations to be aware of. This FAQ covers common considerations, but each database may have its own unique set of circumstances. The following four sections thoroughly address the issue of duplicates:
Locating Duplicate Records Using Duplicate Analysis ReportsTo keep your database free of duplicate records, it's a good idea to regularly check your database for duplicates. Softshare has developed several duplicate analysis reports in Softshare WebCATS to help networks determine whether duplicates exist in their databases and what the scope of the issue might be. WebCATS has three built-in reports that exist solely to identify duplicate client records:
WebCATS also has two built-in reports dedicated to duplicate contact records:
Note: All five of these duplicate reports are case insensitive, but not punctuation insensitive. For example, "Softshare, Inc." and "SoftShare, Inc." would be treated as identical company names, but "Softshare Inc." and "Softshare, Inc." would not. Identifying Legacy Duplicate Client RecordsBefore you begin cleaning up duplicate client records, it is imperative that you first identify which records cannot be altered because of the need to maintain audit integrity. For example, SBDCs, WBCs and PTACs are required to report their numbers to either the SBA or the DLA. Once these numbers have been reported, you are accountable for them and it is important that any clean up process does not affect your reported numbers. In this article, we will refer to records that cannot be altered because of the need to maintain audit integrity as legacy records. Records that have not yet been reported are referred to as non-legacy records. Softshare has different recommended procedures for cleaning up duplicate client records, depending on whether the client records are legacy or non-legacy records. The determining factor as to whether a duplicate client record is legacy or non-legacy is the dates of its associated counseling, investment, or award sessions. If a client has any session activity that has been reported to your funding partner, it is a legacy record. Following are some parameter guidelines based on funding partner. SBA-Funded Centers (SBDC/WBC)The EDMIS system was established on October 1, 2005. Client activity is uploaded from 'CATS to EDMIS on a quarterly basis. Thus, any client that had a session between October 1, 2005 and the last quarterly upload is what we call a “legacy record” because it has been reported to EDMIS and should not be deleted. Nor should any of its activity sessions be consolidated to another client record.Clients that have not had any session activity uploaded to EDMIS are what we call “non-legacy records” and these records can be safely consolidated with a keeper record, and then deleted. Note: The duplicate client reports include the first EDMIS session date and the most recent session date to help you quickly determine the legacy status of potential EDMIS data. Note: In some cases, SBA may allow you to clean up current-year legacy duplicates and resubmit to EDMIS. See SBA EDMIS Fiscal Year Rollback Exception for a discussion of possible clean-up provisions of current-year legacy duplicates in the EDMIS system. DLA-Funded Centers (PTAC)The DLA parameters are more straightforward. The key date here is when the last DLA 1806 was submitted. If a client has been reported on any 1806, then it is a legacy client. Clients with initial activity entered after the last 1806 submission, and thus never submitted to DLA, are non-legacy clients. Duplicate Client & Contact Record Clean-up ProcessAfter identifying how many legacy and non-legacy duplicate client records exist in your database, you'll need to decide if you want to manually clean them up or have Softshare do it on your behalf using automated scripts we have developed. As you read on in this section, you'll see that it is quite a time-consuming process to manually remove the duplicates—especially legacy duplicates. For this reason, we typically recommend that if you have a significant number of duplicate records, you should strongly consider contracting with Softshare to do the job for you. If you decide to clean up duplicates manually, remember that there are different procedures for legacy and non-legacy records, as discussed in the following two sections:
Warning: Non-database administrators should not attempt to consolidate and delete duplicate records without first consulting their database administrator. Improper handling of duplicates can negatively impact the audit integrity of the 'CATS database. Note: This article’s primary focus is the cleaning up of duplicate clients and their associated contacts. However, there may also be duplicate contact records in the database that are not associated with client records. WebCATS built-in duplicate contact reports, mentioned previously, will help identify these. Once duplicate clients and their associated contacts have been cleaned up, a similar process can be followed to clean up duplicate contacts. Cleaning Up Non-Legacy Duplicate Client RecordsBecause non-legacy records do not require special audit integrity considerations, their clean-up process is relatively straightforward. Following is the recommended process for manually handling non-legacy duplicate client records.
Determining the keeper record might be based on some pre-defined criteria such as which record was most recently added to the database, which had the most recent activity or which contains the highest number of completed data fields. Determining which duplicate has the most accurate information in it is not feasible programmatically and may require a phone call to the client itself. Unfortunately, this is a highly subjective and labor-intensive process. Note: Using an asterisk in the keeper record's name, as suggested in step 1, will help you easily identify the keeper client record. Note: Re-associating client activity to the keeper client record will not affect center or counselor ownership of the activity. Center and counselor reports should not be affected by consolidating client activity to a single client record. The end result of this non-legacy clean-up process should be a single client record with all session records, contacts, and contact activity consolidated to it. Any duplicates of the client or contact records should have successfully been removed from the database. Cleaning Up Legacy Duplicate Client RecordsWhen only one legacy client record exists for a set of duplicates, it must be designated as the keeper record and the non-legacy process detailed in the previous section can be followed for all non-keeper duplicate records. In other words, if Company X exists in the database three times, but only one of the three client records has previously-reported counseling sessions associated with it, then this is the client record that must be designated as the keeper record, even if the other two records potentially have more complete information. If you have a set of duplicate client records where more than one of the duplicates has history with either SBA or DLA, then you will want to follow the next set of clean-up steps. Note: Keep in mind that if you are funded by SBA and you have a significant number of duplicates whose activity was first reported to EDMIS during the current fiscal year (Softshare refers to these as FY-legacy duplicates), you may want to consider asking SBA to allow you to roll back data submitted thus far for the current year before performing cleanup on those FY-legacy records. For more information on this option, see SBA EDMIS Fiscal Year Rollback Exception. To clean up a set of duplicates where more than one of the duplicates is a legacy client record:
The end result of this process is that the activity sessions and contacts associated with all non-legacy client records in the set of duplicates should be consolidated to the legacy keeper record, and the non-legacy client records should be deleted. All "RESTRICTED DUPLICATE" legacy duplicate client records and their associated contact records should be retained in the locked-down "DUPLICATES Center" to prevent future use. All activity sessions associated with "RESTRICTED DUPLICATE" legacy duplicate client records should remain where they occurred for audit integrity. SBA EDMIS Fiscal Year Rollback ExceptionData that has been uploaded to EDMIS in the previous fiscal year is locked by the SBA and any duplicate legacy records that were uploaded from WebCATS prior to the current fiscal year should be handled as described in the previous section. However, there may be client duplicates in 'CATS whose initial upload to EDMIS occurred during the current fiscal year. In these cases, there may be an opportunity to correct the 'CATS database for the current year by consolidating these duplicates using the non-legacy clean-up process and then updating EDMIS with the corrected data. We will refer to these potential clean-up clients as FY-legacy duplicates. It is important to recognize that when duplicates are uploaded to EDMIS, the client numbers are being inflated while at the same time the extended engagement and long-term client numbers are being under counted. For this reason, the SBA has tentatively agreed to make allowances for programs to take corrective action and consolidate these current fiscal year duplicates. On a case by case basis, they have agreed to consider wiping out the current year’s EDMIS data to allow a more accurate EDMIS export for the entire year to be resubmitted. To do this current year database rollback, your network will need to request approval from Bruce Purdy at SBA. To assess whether or not to grant this approval, the SBA will need to know the extent of the duplicate problem within your database. In other words, do you have two current fiscal year duplicate client records or 200? The duplicate analysis reports referenced at the beginning of this FAQ will help you determine the extent of the problem. You can export these reports to an Excel spreadsheet and then sort on the “First EDMIS Session” column for an easy determination of the number of FY-legacy clients that might be cleaned up pending SBA approval. Avoiding Duplicate Records in the FutureAs stated at the beginning of this FAQ, duplicate client record creation can be attributed to user permissions for the most part. However, it may also be due to lack of understanding by the users. If a user doesn’t understand the implications of creating a duplicate client then no matter what tools you make available to them, they will still create their “own” copy of the same company. Therefore, in addition to some of the recommended settings we have listed below, we also recommend that you make sure that all users realize the need to check for duplicates across all centers every time a new client or contact is created in the database. Here are the permissions settings as recommended by Softshare to ensure that the user has the ability to query for duplicate records before adding new companies to the database, and to ensure that these queries are taking all centers into account:
Softshare is aware that many SBA-funded centers have been concerned about sharing data across centers due to statute language regarding unlawful client information disclosure. To clear up any confusion on this, we had Bruce Purdy, EDMIS Program Manager, weigh in on the topic. According to Bruce, SBA strongly recommends sharing client information across a program network: "[...] by not using the same client record within the network, not only will the EEC count be incorrect, but so will things like job creation, capital infusion, etc. Cross center counseling is common especially when one center may have the expertise that another center does not. The only way these clients can obtain that counseling is to go to multiple centers/counselors. The language in the statute and the Notice of Award are basically to protect the client from unlawful disclosure. Sharing that information within the SBDC network or consortium would not in my opinion be a break from this disclosure." Besides changing user permissions, what else can you do to avoid duplicates in the future? We suggest the following corrective actions:
Note: If a "DUPLICATES Center" exists in your 'CATS database, it should be excluded when running the duplicate analysis reports in order to ensure that "RESTRICTED DUPLICATE" legacy duplicates are filtered out of the report's output. |
|
Want more? Browse our extensive list of Softshare 'CATS FAQs. |