Data Management – Terminologies & Definitions
As the third step in my Data Management article series, let’s look at commonly used terminology in the domain. These are standard definitions, quoted from a readily available glossary. The next article will explain the relevance and usage of these terms in the business world, e.g. how to look at data standardization in a supplier data or material data context when it comes to optimizing your procurement processes. That’s next.
In my first article in this data management series, I compared data management with the story of the elephant and the seven blind men: http://manageyourdata.blogspot.in/2012/09/data-management-elephant-seven-blind-men.html
The second post is about why it’s important to speak the same language when you are running any data management initiative: http://manageyourdata.blogspot.in/2012/09/data-management-are-we-all-speaking.html
Data Analysis: Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
Data Governance: The exercise of decision-making and authority for data-related matters. The organizational bodies, rules, decision rights, and accountability of people and information systems as they perform information-related processes. Data Governance determines how an organization makes decisions: how we “decide how to decide.”
Data Governance Framework: A logical structure for organizing how we think about and communicate Data Governance concepts.
Data Governance Methodology: A logical structure providing step-by-step instructions for performing Data Governance processes.
Data Governance Office (DGO): A centralized organizational entity responsible for facilitating and coordinating Data Governance and/or Stewardship efforts for an organization. It supports a decision-making group, such as a Data Stewardship Council.
Data Mapping: The process of assigning a source data element to a target data element.
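To make the mapping idea concrete, here is a minimal Python sketch. The field names (vendor_name, supplier_name and so on) and the map_record helper are my own hypothetical examples, not part of the glossary definition; a real mapping would come from your source and target systems.

```python
# Hypothetical source-to-target field mapping, for illustration only.
FIELD_MAP = {
    "vendor_name": "supplier_name",
    "vendor_city": "city",
    "zip": "postal_code",
}

def map_record(source_record, field_map):
    """Return a target record containing only the mapped fields, renamed."""
    return {
        target: source_record[source]
        for source, target in field_map.items()
        if source in source_record
    }

source = {"vendor_name": "Acme Corp", "vendor_city": "Pune", "zip": "411001", "fax": "n/a"}
target = map_record(source, FIELD_MAP)
print(target)
```

Fields without an entry in the map (here, fax) are simply left out of the target record; whether to drop or flag such fields is a design decision for each project.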
Data Modeling: The discipline, process, and organizational group that conducts analysis of data objects used in a business or other context, identifies the relationships among these data objects, and creates models that depict those relationships.
Master Data Management (MDM): A structured approach to defining and managing an organization’s Master Data.
Data Classification: The categorization of data, following various schema to support various business or technology goals.
Data Cleansing: Also referred to as data scrubbing. Data Cleansing is the process of detecting dirty data in a database (data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly) and then removing and/or correcting the data. Data cleansing is often necessary to bring consistency to different sets of data that have been merged from separate databases. Cleansing data involves consolidating data within a database by removing inconsistent data, removing duplicates and re-indexing existing data in order to achieve the most accurate and concise database. It can involve manual tasks or processes automated by special Data Quality tools. A particular type of Data Cleansing is Address Cleansing, in which street addresses are converted to a standard format as set forth by the U.S. Postal Service master database. For example, standard abbreviations are utilized, typos are corrected and ZIP codes are converted to 9-digit format. Address cleansing is usually done in conjunction with address matching, a process that validates an address against one of the 57 million addresses in the USPS database.
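As a rough illustration of the cleansing steps described above (fixing formatting, dropping incomplete records, removing duplicates), here is a small Python sketch. The record layout with name and city fields and the cleanse function are assumptions made up for this example; real cleansing, and address cleansing in particular, normally relies on dedicated Data Quality tools.

```python
def cleanse(records):
    """Normalize whitespace and case, drop incomplete rows, remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Collapse repeated whitespace and trim the ends.
        name = " ".join(rec.get("name", "").split())
        city = " ".join(rec.get("city", "").split()).title()
        if not name or not city:
            continue  # incomplete record: skipped in this sketch
        key = (name.lower(), city.lower())  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned

raw = [
    {"name": "  Acme  Corp ", "city": "pune"},
    {"name": "ACME CORP", "city": "Pune "},
    {"name": "Globex", "city": ""},
    {"name": "Initech", "city": "mumbai"},
]
cleaned = cleanse(raw)
print(cleaned)
```

The sketch keeps the first occurrence of each duplicate and silently drops incomplete rows; in practice you may prefer to route such rows to a review queue instead of discarding them.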
Data Conversion: The manipulation of information sets from one format or structure to another. Data Conversion is often required when acquiring sets of information from outside sources.
Data Enrichment: An activity that supplements and/or improves the existing data.
Data Mart: A repository of data gathered from operational data and other sources. The data may derive from an enterprise-wide database or data warehouse or from more specialized sources. The emphasis of a data mart is on meeting the expectations and needs of a particular group of users, so it may be designed to assist them in performing analysis and understanding the content.
Data Mining: The analysis of data for relationships not previously discovered. Data Mining (DM) is also known as Knowledge Discovery. It is the process of automatically searching large volumes of data for patterns that may be used to predict future behavior.
Data Profiling: The process of examining data in an existing database and collecting statistics and information about that data. The information collected may be used to collect metrics on data quality, assess whether metadata accurately describes the actual values in the source database, determine if existing data can be re-purposed, or understand risks and challenges in using the data.
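A very small profiling pass over a single column might look like the Python sketch below. The profile_column function and the choice of statistics (total, missing, distinct, most common value) are my own illustrative assumptions; profiling tools typically collect many more metrics.

```python
from collections import Counter

def profile_column(values):
    """Collect a few basic statistics about one column of values."""
    non_empty = [v for v in values if v not in (None, "")]
    counts = Counter(non_empty)
    return {
        "total": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

# Hypothetical country-code column from a supplier table.
countries = ["IN", "US", "IN", "", None, "DE", "IN"]
profile = profile_column(countries)
print(profile)
```

Numbers like the missing count are the kind of data quality metric the definition refers to.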
Data Quality: The practice of correcting, standardizing, and verifying data.
Data Standardization: The transformation of data into consistent formats.
Data Validation: As a broad concept, Data Validation refers to the confirmation of the reliability of data through a checking process. As a set of processes, Data Validation refers to a systematic review of a data set to identify outliers or suspect values. More specifically, data validation refers to the systematic process of independently reviewing a body of analytical data against established criteria to provide assurance that the data are acceptable for their intended use. Within databases, Data Validation refers to procedures built into databases to define and check acceptable input for fields, and to accept or reject the data.
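The last sense of validation (define acceptable input per field, then accept or reject) can be sketched in Python as follows. The rules, the field names and the supplier ID pattern are hypothetical examples of mine; in a real database these checks would usually live in constraints or application logic.

```python
import re

# Hypothetical per-field rules: each returns True when the value is acceptable.
RULES = {
    "supplier_id": lambda v: isinstance(v, str) and re.fullmatch(r"S\d{4}", v) is not None,
    "country": lambda v: v in {"IN", "US", "DE"},
    "payment_days": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def validate(record, rules):
    """Return the names of fields that fail their rule (empty list = accepted)."""
    return [field for field, check in rules.items() if not check(record.get(field))]

good = {"supplier_id": "S0042", "country": "IN", "payment_days": 30}
bad = {"supplier_id": "42", "country": "FR", "payment_days": 300}

print(validate(good, RULES))
print(validate(bad, RULES))
```

Returning the list of failing fields, rather than a plain yes/no, makes it easier to tell the data owner what needs fixing.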
For a more detailed glossary, you can visit http://www.datagovernance.com/glossary_d.html. Most of the definitions above are from this glossary.
Thanks
Prashant Mendki
Twitter – @pmendki