Top 5 data quality & accuracy challenges and how to overcome them


Every company today is data-driven, or at least claims to be. Business decisions are no longer made based on hunches or anecdotal trends as they were in the past. Concrete data and analytics now power businesses' most important decisions.

As more companies leverage the power of machine learning and artificial intelligence to make critical decisions, there must be a conversation around the quality (the completeness, consistency, validity, timeliness and uniqueness) of the data used by these tools. The insights companies expect machine learning (ML) or AI-based technologies to deliver are only as good as the data used to power them. The old adage "garbage in, garbage out" comes to mind when it comes to data-based decisions.
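As a rough illustration (not from the article), each of those five quality dimensions can be expressed as a simple programmatic check. The record fields, reference country set and 90-day freshness window below are hypothetical assumptions:

```python
from datetime import datetime, timedelta

def check_record(record, all_emails, now=None):
    """Toy checks for the five data quality dimensions on one customer record.

    `record` is a dict with hypothetical fields: email, country, updated_at.
    `all_emails` is the full column of email values, used for uniqueness.
    """
    now = now or datetime.utcnow()
    return {
        # Completeness: no required field is missing or empty
        "complete": all(record.get(f) for f in ("email", "country", "updated_at")),
        # Validity: the value conforms to an expected format
        "valid": "@" in str(record.get("email", "")),
        # Consistency: the value agrees with a reference set (assumed country codes)
        "consistent": record.get("country") in {"US", "GB", "DE", "FR"},
        # Timeliness: the record was refreshed recently enough to trust
        "timely": record.get("updated_at", datetime.min) > now - timedelta(days=90),
        # Uniqueness: the key appears exactly once across the dataset
        "unique": all_emails.count(record.get("email")) == 1,
    }

emails = ["a@example.com", "b@example.com", "b@example.com"]
rec = {"email": "a@example.com", "country": "US",
       "updated_at": datetime.utcnow() - timedelta(days=10)}
print(check_record(rec, emails))  # every dimension passes for this record
```

In practice each check would be far richer (regex validation, reference-data lookups, SLA-driven freshness windows), but the shape stays the same: one boolean per dimension, per record.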

Statistically, poor data quality leads to increased complexity of data ecosystems and poor decision-making over the long term. In fact, roughly $12.9 million is lost every year due to poor data quality. As data volumes continue to grow, so will the challenges that businesses face with validating their data. To overcome issues related to data quality and accuracy, it's important to first understand the context in which the data elements will be used, as well as the best practices to guide the initiatives along.

1. Data quality is not a one-size-fits-all endeavor

Data initiatives aren't specific to a single business driver. In other words, determining data quality will always depend on what a business is trying to achieve with that data. The same data can affect more than one business unit, function or project in very different ways. Furthermore, the list of data elements that require strict governance may vary according to different data users. For example, marketing teams are going to need a highly accurate and validated email list, while R&D might be invested in quality user feedback data.

The best team to discern a data element's quality, then, would be the one closest to the data. Only they will be able to recognize data as it supports business processes and ultimately assess accuracy based on what the data is used for and how.

2. What you don't know can hurt you

Data is an enterprise asset. However, actions speak louder than words. Not everyone within an enterprise is doing all they can to make sure data is accurate. If users don't recognize the importance of data quality and governance, or simply don't prioritize them as they should, they aren't going to make the effort to anticipate data issues from mediocre data entry, or to raise their hand when they find a data issue that needs to be remediated.

This can be addressed practically by monitoring data quality metrics as a performance goal, to foster more accountability among those directly involved with data. In addition, business leaders must champion the importance of their data quality program. They should align with key team members on the practical impact of poor data quality, for instance, misleading insights shared in inaccurate reports to stakeholders, which can potentially lead to fines or penalties. Investing in better data literacy can help organizations create a culture of data quality and avoid making careless or ill-informed mistakes that damage the bottom line.
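A data quality metric used as a performance goal can be as simple as a completeness rate tracked per dataset against a target. This is a minimal sketch with made-up field names, not a prescribed metric:

```python
def completeness_rate(rows, required_fields):
    """Share of rows in which every required field is populated."""
    if not rows:
        return 0.0
    ok = sum(1 for r in rows
             if all(r.get(f) not in (None, "") for f in required_fields))
    return ok / len(rows)

# Hypothetical sample: two of the four rows are fully populated.
rows = [
    {"email": "a@example.com", "name": "Ada"},
    {"email": "", "name": "Grace"},
    {"email": "b@example.com", "name": None},
    {"email": "c@example.com", "name": "Edsger"},
]
rate = completeness_rate(rows, ["email", "name"])
print(f"completeness: {rate:.0%}")  # prints "completeness: 50%"
```

Recomputing a handful of such rates on a schedule, and alerting when one drops below its target, gives data owners a concrete, reviewable number to be accountable for.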

3. Don't try to boil the ocean

It isn't practical to fix a long laundry list of data quality problems, and it's not an efficient use of resources, either. The number of data elements active within any given organization is huge and growing exponentially. It's best to start by defining an organization's critical data elements (CDEs), the data elements integral to the main function of a specific business. CDEs are unique to each business. Net revenue is a common CDE for most businesses, since it's essential for reporting to investors and other shareholders.

Since every company has different business goals, operating models and organizational structures, every company's CDEs will be different. In retail, for example, CDEs might relate to design or sales. Healthcare companies, on the other hand, will be more interested in ensuring the quality of regulatory compliance data. Although this isn't an exhaustive list, business leaders might consider asking the following questions to help define their unique CDEs: What are your critical business processes? What data is used within those processes? Are those data elements involved in regulatory reporting? Will those reports be audited? Will those data elements guide initiatives in other departments across the organization?

Identifying and remediating only the most critical elements will help organizations scale their data quality efforts in a sustainable and resourceful way. Eventually, an organization's data quality program will reach a level of maturity where there are frameworks (often with some level of automation) that can categorize data assets based on predefined factors to remove disparity across the enterprise.
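One hedged sketch of such a framework: score each data element against criteria like the questions listed above, and treat high scorers as CDE candidates. The weights, threshold and element names here are arbitrary assumptions for illustration:

```python
# Hypothetical criteria drawn from the questions above: does the element feed
# critical processes, regulatory reports, audits, cross-department initiatives?
CRITERIA_WEIGHTS = {
    "critical_process": 3,
    "regulatory_reporting": 3,
    "audited": 2,
    "cross_department": 1,
}

def cde_score(flags):
    """Sum the weights of every criterion the element satisfies."""
    return sum(w for c, w in CRITERIA_WEIGHTS.items() if flags.get(c))

elements = {
    "net_revenue": {"critical_process": True, "regulatory_reporting": True,
                    "audited": True, "cross_department": True},
    "page_color": {},
    "customer_email": {"critical_process": True, "cross_department": True},
}

# Rank elements and treat anything at or above a chosen threshold as a CDE.
ranked = sorted(elements, key=lambda e: cde_score(elements[e]), reverse=True)
cdes = [e for e in ranked if cde_score(elements[e]) >= 4]
print(cdes)  # ['net_revenue', 'customer_email']
```

The point isn't the specific numbers: it's that once the criteria are explicit, prioritization becomes repeatable instead of a one-off debate, and can later be automated.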

4. More visibility = more accountability = better data quality

Businesses drive value by knowing where their CDEs are, who is accessing them and how they're being used. In essence, there is no way for a company to identify its CDEs if it doesn't have proper data governance in place to begin with. Yet many companies struggle with unclear or nonexistent ownership of their data stores. Defining ownership before onboarding more data stores or sources promotes commitment to quality and usability. It's also wise for organizations to set up a data governance program in which data ownership is clearly defined and people can be held accountable. This can be as simple as a shared spreadsheet dictating ownership of the set of data elements, or it can be managed by a sophisticated data governance platform, for example.
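Even the shared-spreadsheet approach can be made queryable with a few lines of code, so that "who owns this element?" always has an answer. This sketch assumes a CSV export with hypothetical element/owner/steward_email columns:

```python
import csv
import io

# A shared ownership spreadsheet exported as CSV; all values are made up.
SHEET = """element,owner,steward_email
net_revenue,Finance,finance-data@example.com
customer_email,Marketing,mkt-data@example.com
"""

def load_owners(csv_text):
    """Index ownership rows by data element name for quick lookup."""
    return {row["element"]: row for row in csv.DictReader(io.StringIO(csv_text))}

owners = load_owners(SHEET)
print(owners["net_revenue"]["owner"])  # prints "Finance"
```

A lookup like this also makes gaps visible: any CDE missing from the registry is, by definition, an element with no accountable owner.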

Just as organizations should model their business processes to improve accountability, they should also model their data, in terms of data structure, data pipelines and how data is transformed. Data architecture attempts to model the structure of an organization's logical and physical data assets and data management resources. Creating this kind of visibility gets at the heart of the data quality issue: without visibility into the *lifecycle* of data (when it's created, how it's used and transformed, and how it's outputted), it's impossible to ensure true data quality.

5. Data overload

Even when data and analytics teams have established frameworks to categorize and prioritize CDEs, they're still left with thousands of data elements that need to be either validated or remediated. Each of these data elements can require multiple business rules specific to the context in which it will be used. However, those rules can only be assigned by the business users working with those unique data sets. Data quality teams therefore need to work closely with subject matter experts to identify rules for each and every unique data element, which can be extremely dense even when prioritized. This often leads to burnout and overload within data quality teams, because they're responsible for manually writing a large number of rules for a wide variety of data elements. Organizations must set realistic expectations about the workload of their data quality team members. They may consider expanding the data quality team and/or investing in tools that leverage ML to reduce the amount of manual work in data quality tasks.
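One common way to cut that manual effort, sketched here with hypothetical elements and rule types, is to let subject matter experts declare rules as data (element, rule type, argument) rather than having the data quality team hand-write a check per element:

```python
import re

# Declarative rule table: experts add rows instead of writing code.
RULES = [
    ("email", "matches", r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    ("age", "between", (0, 120)),
    ("country", "in_set", {"US", "GB", "DE"}),
]

# A small library of reusable rule types interprets the table.
CHECKS = {
    "matches": lambda v, pattern: bool(re.match(pattern, str(v))),
    "between": lambda v, rng: rng[0] <= v <= rng[1],
    "in_set": lambda v, allowed: v in allowed,
}

def validate(record):
    """Return the (element, rule) pairs that the record fails."""
    return [(el, rule) for el, rule, arg in RULES
            if el in record and not CHECKS[rule](record[el], arg)]

rec = {"email": "a@example.com", "age": 150, "country": "US"}
print(validate(rec))  # [('age', 'between')]
```

ML-assisted tooling pushes this further by proposing candidate rules from observed data profiles, leaving experts to approve or adjust them instead of authoring each one from scratch.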

Data isn't just the new oil of the world: it's the new water of the world. Organizations can have the most intricate infrastructure, but if the water (or data) running through those pipelines isn't drinkable, it's useless. People who need this water must have easy access to it, they must know that it's usable and not tainted, they must know when supply is low and, finally, the suppliers/gatekeepers must know who is accessing it. Just as access to clean drinking water supports communities in a variety of ways, improved access to data, mature data quality frameworks and a deeper culture of data quality can protect data-reliant programs and insights, helping spur innovation and efficiency within organizations around the world.

JP Romero is Technical Manager at Kalypso.