Perusing the various solutions available for data quality improvement and management, I am struck that most approach the problem from a technical point of view. The technical approach can take the form of an array of tools and applications deployed to tackle data quality problems, or of rules and methods employed to solve data quality woes.
I just noticed that I used the word “problem” twice in the preceding couple of lines. This must mean there is a challenge (or issue, or problem) for which I am trying either to find a solution or at least to lay the groundwork for discussion and ideas drawn from the user community’s experiences.
So, without further ado, let me define the problem: the problem is data quality, or the lack thereof. There are several ways to define data quality; however, we see it as trust. “Data quality is represented by the trust that users have in their enterprise’s data, and this trust is ensured by defining processes and accountability around the enterprise’s information assets.”
In our experience, a technical-only approach does not work, for the simple reason that it fails to take into account the underlying causes of bad data. It does not answer “how” and “why” the data went bad. A technical approach focuses on tools and databases but does not involve the business or the end users, who are the ones most affected by bad data.
Think of it this way: most IT systems contribute to bad data. In fact, as information systems become more advanced, data quality often gets worse. Why should a technical solution bought specifically to cleanse data be any different, unless it is used in a holistic fashion? What is needed is a comprehensive approach that covers every phase of the data lifecycle.
One way to do this is to find the points of interaction that data has with processes, users, and organizations. This is by no means the only way, but it gets us started. You begin in the data’s very infancy, when it has just been procured or created, and progress through the various stages of its lifecycle, such as dissemination, maintenance, application, and eventually discarding. At each stage you examine how the data interacts with its environment, users, processes, and organizations.
This exercise leads to a better understanding of the data: where it is used, who uses it, and who or what would be impacted if it were of inferior quality. This lays the foundation for the next steps in data quality assessment or improvement. Don’t get me wrong: the technical side has its role. But it comes much later in the process.
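To make the lifecycle walk-through a little more concrete, here is a minimal sketch of how the findings might be recorded: an inventory of touchpoints per lifecycle stage, plus two lookups answering “who uses the data at this stage?” and “who would be hit if the data went bad here?”. Everything in it is hypothetical and for illustration only: the names `Stage`, `Touchpoint`, `users_of`, and `downstream_actors`, the customer-address example, and the simplifying assumption that a defect introduced at one stage affects every later stage. It is bookkeeping to support the conversation with the business, not a cleansing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical names for the lifecycle stages, listed in lifecycle order."""
    CREATION = "procured or created"
    DISSEMINATION = "dissemination"
    MAINTENANCE = "maintenance"
    APPLICATION = "application"
    DISPOSAL = "discarding"


# Iterating an Enum follows definition order, so this list encodes the sequence.
STAGE_ORDER = list(Stage)


@dataclass(frozen=True)
class Touchpoint:
    """One point where the data interacts with a user, process, or organization."""
    stage: Stage
    actor: str
    kind: str  # "user", "process", or "organization"
    note: str


def users_of(touchpoints, stage):
    """Return who (or what) interacts with the data at the given stage."""
    return sorted({t.actor for t in touchpoints if t.stage is stage})


def downstream_actors(touchpoints, origin):
    """Return everyone at or after `origin` who could be hit by a defect introduced there.

    Simplifying assumption (hypothetical): a defect propagates to every later stage.
    """
    start = STAGE_ORDER.index(origin)
    return sorted({t.actor for t in touchpoints
                   if STAGE_ORDER.index(t.stage) >= start})


# Hypothetical inventory for a customer mailing address.
address_inventory = [
    Touchpoint(Stage.CREATION, "Sales rep", "user", "types the address into the CRM"),
    Touchpoint(Stage.DISSEMINATION, "Nightly CRM-to-billing sync", "process", "copies it to billing"),
    Touchpoint(Stage.MAINTENANCE, "Customer service", "organization", "updates it on request"),
    Touchpoint(Stage.APPLICATION, "Billing department", "organization", "mails invoices to it"),
    Touchpoint(Stage.DISPOSAL, "Records management", "organization", "archives closed accounts"),
]

if __name__ == "__main__":
    print(users_of(address_inventory, Stage.APPLICATION))
    # ['Billing department']
    print(downstream_actors(address_inventory, Stage.MAINTENANCE))
    # ['Billing department', 'Customer service', 'Records management']
```

In this toy example, a bad update made by customer service reaches billing and records management, which identifies the people who should be part of the conversation before any cleansing tool is chosen.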
Over the next few days, I will talk a lot more about data quality and other aspects related to it.