The 5 Efficient Ways to Find and Resolve Data Issues | by Hanzala Qureshi | Jun, 2023

2. Fixing data issues in existing tables

Over time, data quality deteriorates when governance processes are lacking: keys get recycled, duplicate information is added, or patches are applied that make things worse.

A simple data profile shows the current state of the data in a given table. Focus on the core attributes (columns) that have issues, and isolate each issue as much as possible. Once the affected attribute(s) have been determined, apply a one-time fix. For example, if data is duplicated, agree with the Data Stewards on how to get to a single record. If the data is inaccurate, such as a wrong date of birth or wrong start and end dates, agree on the correct replacement and apply the fix.
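To make the idea of a simple data profile concrete, here is a minimal sketch in plain Python. It assumes the table has already been loaded as a list of dictionaries; the function name `profile_rows`, the column names, and the sample customers are all hypothetical and only there for illustration. A dedicated data quality tool would give you far more, but the counts below are often enough to isolate which columns need attention.

```python
from collections import Counter


def profile_rows(rows, key_column):
    """Return a simple per-column profile plus the duplicated key values."""
    columns = sorted({col for row in rows for col in row})
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        non_null = [v for v in values if v not in (None, "")]
        profile[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    # A key that appears more than once points to duplicated records.
    key_counts = Counter(row.get(key_column) for row in rows)
    duplicate_keys = sorted(
        k for k, n in key_counts.items() if n > 1 and k is not None
    )
    return profile, duplicate_keys


# Hypothetical sample data, not taken from any real table.
customers = [
    {"customer_id": 1, "name": "Ana Diaz", "date_of_birth": "1990-04-12"},
    {"customer_id": 2, "name": "Ben Ode", "date_of_birth": None},
    {"customer_id": 2, "name": "Ben Ode", "date_of_birth": "1985-11-02"},
    {"customer_id": 3, "name": "Cy Lee", "date_of_birth": "2190-01-01"},
]

profile, duplicate_keys = profile_rows(customers, "customer_id")
for column, stats in profile.items():
    print(column, stats)
print("duplicate keys:", duplicate_keys)
```

In this sample, the profile would point you at two columns: `customer_id` (a repeated key) and `date_of_birth` (a missing value). Note that a count-based profile will not catch an implausible value such as a birth date in the future; that needs a separate validity rule.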

Once the fix is applied, you must operationalise the process to avoid further deterioration of data quality. This could be a cleansing job that runs daily and fixes the data with update statements, or manual intervention by an end user reviewing an audit table.
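As a rough illustration of such a cleansing job, the sketch below uses Python's built-in `sqlite3` module. Everything specific here is an assumption: the `customer` and `customer_audit` tables, their columns, and the rule itself (a date of birth in the future is logged and then set to NULL). In practice the rule and the replacement value are whatever you agreed with the Data Stewards.

```python
import sqlite3


def run_cleansing_job(conn, today):
    """Log rows with a future date_of_birth to an audit table, then null the value.

    Dates are assumed to be stored as ISO 8601 text (YYYY-MM-DD), so a plain
    string comparison orders them correctly.
    """
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            """
            INSERT INTO customer_audit (customer_id, column_name, old_value, reason)
            SELECT customer_id, 'date_of_birth', date_of_birth, 'date in the future'
            FROM customer
            WHERE date_of_birth > ?
            """,
            (today,),
        )
        cursor = conn.execute(
            "UPDATE customer SET date_of_birth = NULL WHERE date_of_birth > ?",
            (today,),
        )
    return cursor.rowcount  # number of rows the update changed


# Hypothetical in-memory tables so the sketch can run on its own.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER, name TEXT, date_of_birth TEXT);
    CREATE TABLE customer_audit (
        customer_id INTEGER, column_name TEXT, old_value TEXT, reason TEXT
    );
    INSERT INTO customer VALUES
        (1, 'Ana Diaz', '1990-04-12'),
        (3, 'Cy Lee', '2190-01-01');
    """
)

fixed = run_cleansing_job(conn, today="2023-06-30")
print("rows fixed:", fixed)
```

Writing the old value to the audit table before updating it keeps the fix reversible, and gives an end user something to review if the automated rule turns out to be too aggressive. A scheduler of your choice could then call `run_cleansing_job` once a day.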

As an example, if your customer data table has duplicate customer records, you can use a data quality tool to profile your data. This will help you identify the duplicates and determine why they occur. The duplicates could be caused by the source sending the same information multiple times, poor data pipeline code, or a business process. Once you have identified the duplicates and their root cause, you can merge the records or delete the redundant record. If you cannot resolve the root cause, you can set up a cleansing job that regularly performs a duplicate check, matches customers, merges them, and deletes the redundant records (a master data management approach).
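The match, merge, and delete steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matching rule (same normalised name and email), the survivorship rule (lowest `customer_id` wins, gaps filled from the redundant record), and the field names are all hypothetical. Real master data management tools use much richer matching, and the rules should come from your Data Stewards.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical matching rule: same normalised name and email."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def merge_duplicates(records):
    """Collapse matching records into one survivor each.

    Returns (survivors, removed_ids), where removed_ids lists the redundant
    records that should be deleted from the table.
    """
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    survivors, removed_ids = [], []
    for group in groups.values():
        group.sort(key=lambda r: r["customer_id"])  # lowest id survives
        survivor = dict(group[0])
        for duplicate in group[1:]:
            for column, value in duplicate.items():
                # Fill gaps in the survivor from the redundant record.
                if survivor.get(column) is None and value is not None:
                    survivor[column] = value
            removed_ids.append(duplicate["customer_id"])
        survivors.append(survivor)
    return survivors, removed_ids


# Hypothetical sample records.
records = [
    {"customer_id": 10, "name": "Ben Ode", "email": "ben@example.com", "phone": None},
    {"customer_id": 11, "name": "ben ode ", "email": "BEN@example.com", "phone": "555-0100"},
    {"customer_id": 12, "name": "Ana Diaz", "email": "ana@example.com", "phone": None},
]

survivors, removed_ids = merge_duplicates(records)
print(survivors)
print("ids to delete:", removed_ids)
```

Returning the redundant ids instead of deleting them straight away leaves room for a review step before anything is removed from the customer table.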
