Data is ubiquitous in today's world, and its global volume keeps growing every year: it reached about 50 zettabytes in 2020, which is on the order of 6 trillion 2-hour high-definition movies (assuming roughly 8 GB per movie). This figure shows how important data has become, especially for companies that want to become data-driven and incorporate data into their decision making. As volumes increase, data quality is an essential prerequisite for the efficient use of company data, and more broadly for any data science project.

 

At a time when companies are increasingly turning to a data-driven strategy, Heka.ai is publishing a series of two articles. This first article seeks to quantify the data quality problem pragmatically, based on our partners' data analyzed with SmartDataQuality.ai. The second will propose an approach to the data quality challenge that companies face.

Quantifying the issue of data quality with our SmartDataQuality.ai solution

 

The data quality projects carried out with our clients led us to design a generic solution to this challenge, which is common to all sectors. This is the genesis of our solution, SmartDataQuality.ai. Our tool allows us to:

  • Detect duplicates within your databases, mainly false duplicates. As an illustration, customer databases contain a number of duplicates that are difficult to spot, such as "Mathieu Martin" and "Martin Mathieu". By leveraging artificial intelligence methods, our solution identifies these duplicates and attaches a confidence index to each.
  • Enrich databases with external information by intelligently reconciling different repositories. SmartDataQuality.ai can enrich customer databases with socio-demographic information (such as median income or age distribution) and with information on the address's cadastral parcel (such as the surface area of the land or the living area). Cross-referencing several sources of information also makes inconsistencies more apparent. More generally, database enrichment is a real lever for optimizing processes: communications with customers can be made more relevant, for example.
  • Carry out diagnostics on data quality in a few clicks. For example, SmartDataQuality.ai can diagnose the quality of the addresses in a database or the veracity of numerical values.
  • Carry out data normalization processes: SmartDataQuality.ai identifies similarities and proposes column normalizations based solely on the analysis of the data file.
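
To make the duplicate detection idea above concrete, here is a minimal sketch in Python. It is not the SmartDataQuality.ai method, which this article does not describe. It swaps in a simple string-similarity heuristic from the Python standard library (difflib.SequenceMatcher), applied to names whose words have been sorted. The function names, the 0.85 threshold, and the sample names other than the two quoted above are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name and sort its words so that word order does not matter."""
    return " ".join(sorted(name.lower().split()))


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Compare every pair of names and keep those scoring at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical sample data; only the first two names come from the article.
customers = ["Mathieu Martin", "Martin Mathieu", "Matthieu Martin", "Claire Dubois"]

for a, b, score in find_duplicates(customers):
    print(f"{a!r} ~ {b!r} (confidence {score})")
```

With this normalization, "Mathieu Martin" and "Martin Mathieu" score 1.0, since they contain the same words in a different order. A real solution would need more than this heuristic, for instance to handle abbreviations or to avoid comparing every pair of records in a large database.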

 

Moreover, all these algorithms can be saved, both to analyze their performance and to apply them later to other databases.

Some key figures regarding data quality:

In order to quantify the problem of data quality globally, we conducted a study in collaboration with several partners to whom we offered free diagnostics of their databases. This study resulted in the following figures:

These indicators may seem abstract, but they reflect a concrete operational reality:

  • Outbound communication costs can be reduced by nearly 10% by dealing with duplicates in customer databases,
  • Nearly a quarter of operational activities, such as a telephone conversation with a customer or a maintenance intervention, can be improved by filling in missing data,
  • Good data quality can prevent up to 5% of customer invoices from being sent to the wrong address or from containing incorrect information.
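
As a rough illustration of how indicators of this kind could be measured, the sketch below computes a missing-data rate per field and a duplicate rate on a small table. Everything in it is an assumption made for the example: the records are invented, the field names and the two helper functions are hypothetical, and exact matching on the email field is a deliberately simple stand-in for real duplicate detection. It does not reproduce the study's methodology or its figures.

```python
from collections import Counter

# Made-up customer records, for illustration only.
records = [
    {"name": "Mathieu Martin", "email": "m.martin@example.com", "phone": "0102030405"},
    {"name": "Martin Mathieu", "email": "m.martin@example.com", "phone": None},
    {"name": "Claire Dubois", "email": "", "phone": "0605040302"},
    {"name": "Paul Leroy", "email": "p.leroy@example.com", "phone": None},
]


def missing_rate(rows, field):
    """Share of rows where the field is absent, None, or blank."""
    missing = sum(1 for row in rows if not str(row.get(field) or "").strip())
    return missing / len(rows)


def duplicate_rate(rows, key):
    """Share of rows whose non-empty key value already appears in another row."""
    counts = Counter(row[key] for row in rows if row.get(key))
    redundant = sum(n - 1 for n in counts.values())
    return redundant / len(rows)


for field in ("name", "email", "phone"):
    print(f"missing {field}: {missing_rate(records, field):.0%}")
print(f"duplicate emails: {duplicate_rate(records, 'email'):.0%}")
```

On a real database, the same two ratios computed per column give a first view of where missing values and duplicates are concentrated.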

 

In general, this poor data quality is due to human error: typographical errors and omissions in data entry. It can also be due to poor communication between the different entities of the company, or to business processes that leave room for improvement. These errors lead to significant costs for companies. Studies have shown that 1 to 5% of incorrect data can increase operational costs by as much as 8 to 12%. Beyond the costs, poor data quality hinders the company's ambition to become data-driven, because it leads to:

  • Risky or even inappropriate decisions
  • Inefficiency of data-driven activities and processes
  • Very little confidence in the results obtained
  • Missed opportunities
  • Lost revenue

 

In a forthcoming publication, we will discuss how to address this issue in a comprehensive way within the company. If you are also interested in a free diagnosis of one of your databases, please contact us at smartdataquality@sia-partners.com.

 
