Big Data Governance
Today, Big Data solutions demand metadata management, scalable architectures, and privacy, security, and compliance controls. This requires A New Kind of Governance. In part three of this three-part series, we will look at some of the core components of Big Data Governance as they relate to the traditional Enterprise Data Management dimensions of Standards, Governance, Quality, Deployment, and Architecture. The following diagram depicts the governance components and relationships we will explore, taking into account Big Data Governance – The Three “V’s” reviewed in part one, and Big Data Governance – Techniques & Technology covered in part two.
Standards – In Big Data scenarios, contextual metadata can provide context and meaning to the often less structured source content. Contextual metadata, the identification of Keywords and their definitions, represents the Big Data equivalent of standards. Keywords can be organized into Taxonomies & Ontologies that help align high volumes of unstructured content with an organization's traditional structured data. Keywords are proposed via Text Analysis or Semantics Technology. In Big Data Use Cases that require analyzing less structured content against an organization's structured data, keywords, taxonomies, and ontologies can be purposefully aligned to enable that analysis.
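The alignment described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not an SAP product API: the taxonomy, keywords, and sample text below are invented for the example, and real deployments would use Text Analysis or Semantics Technology rather than simple string matching.

```python
# Hypothetical sketch: a keyword taxonomy that tags unstructured content
# with the structured categories an organization already manages.
TAXONOMY = {
    "invoice": "Finance",
    "shipment": "Logistics",
    "warranty": "Customer Service",
}

def tag_content(text: str) -> set[str]:
    """Return the structured categories whose keywords appear in the text."""
    words = text.lower().split()
    return {category for keyword, category in TAXONOMY.items() if keyword in words}

print(sorted(tag_content("Customer opened a warranty claim for a late shipment")))
# → ['Customer Service', 'Logistics']
```

Once unstructured content carries these tags, it can be joined or aggregated alongside the organization's structured records.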
Governance – Big Data content requires stewardship throughout the information lifecycle, from consumption through retention and delivery to archival and deletion. Data steward activities include management of Big Data Use Cases, identification of Big Data Sources & Content, and, increasingly important, the establishment of data Privacy, Security, & Compliance controls. Stewards should be identified and aligned to the Big Data Use Cases and scenarios deployed. For example, Big Data scenarios that use social media may correlate to the Marketing, Sales, Customer Service, or Legal departments, while sensor or machine data might correspond to Engineering, Product Management, or Customer Service; stewards would be identified from those departments. Additionally, in specific industries, customer data may be subject to privacy and compliance controls, requiring stewards from the Regulatory, Legal, or Compliance departments.
Quality – Adding context and meaning via metadata, or cleansing and standardizing Big Data sources without changing the meaning of the raw data, must be done with care. To address data quality in Big Data scenarios, Service Level Agreements need to be understood and balanced against the processing that is applied to raise data quality. Keywords can be utilized to trigger appropriate data cleansing, transformations, or standardizations. For example, if a Name or Address keyword is triggered on some unstructured content, the actual name or address content may be standardized for the purpose of matching to an organization's business partner or D&B data. As another example, if a Product Number keyword is detected, the associated string may be normalized for the purpose of matching to an organization's normalized material numbers. Auditing & Management Statistics is another key component of data quality; it can serve to identify potential defect conditions in the data cleansing, transformation, or standardization routines, such as false positives, false negatives, and failures to recognize keywords.
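The Product Number example can be sketched as follows. This is an assumption-laden illustration: the product-number pattern and the UPPERCASE-DIGITS canonical form are invented for the sketch, and a real implementation would use the organization's actual material-number rules.

```python
import re

# Hypothetical product-number pattern: two letters, optional separator,
# four digits (e.g. "ab 1234", "XK-9921"). Not a real SAP convention.
PRODUCT_RE = re.compile(r"\b([A-Za-z]{2})[-\s]?(\d{4})\b")

def normalize_product_numbers(text: str) -> list[str]:
    """Extract product numbers and normalize them to a canonical
    UPPERCASE-DIGITS form so they can be matched against the
    organization's normalized material numbers."""
    return [f"{m.group(1).upper()}-{m.group(2)}" for m in PRODUCT_RE.finditer(text)]

print(normalize_product_numbers("Complaint about ab 1234 and XK-9921"))
# → ['AB-1234', 'XK-9921']
```

Counting how often the pattern fires, fails to fire, or fires spuriously feeds directly into the Auditing & Management Statistics the section describes.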
Deployment – The value of information changes throughout its lifecycle. Deployment of Big Data solutions needs to involve business stakeholders and stewards in Organizational Change Management (OCM) practices related to Big Data scenarios. Doing so will help determine when Exploratory Analytics are ready to transition into new Big Data Use Cases, which technology solutions balance Service Level Agreements with data quality, how to enable appropriate Privacy, Security, & Compliance controls, and how to manage content storage as value density changes over time.
Architecture – Infrastructure for Big Data scenarios needs to provide storage for data based on the frequency of access and the value density. High value density content typically resides in In-Memory Analytics Appliances, e.g. SAP HANA; medium value density content is stored in High Performance Analytic Database servers, such as Sybase IQ; and large volumes of lower value density content are stored in Hadoop.
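A minimal sketch of the tier-routing decision, assuming invented thresholds for value density (0.0–1.0) and access frequency; real placement policies would be driven by SLAs, cost, and measured workloads rather than fixed cutoffs.

```python
def storage_tier(value_density: float, accesses_per_day: int) -> str:
    """Route content to a storage tier by value density and access
    frequency. The thresholds are illustrative assumptions."""
    if value_density > 0.8 and accesses_per_day > 100:
        return "in-memory"    # e.g. an In-Memory Analytics Appliance such as SAP HANA
    if value_density > 0.4:
        return "analytic-db"  # e.g. a High Performance Analytic Database like Sybase IQ
    return "hadoop"           # large volumes of lower value density content

print(storage_tier(0.9, 500))  # → in-memory
print(storage_tier(0.2, 5))    # → hadoop
```

Because value density changes over time, such a routing function would be re-evaluated periodically so content can migrate between tiers, tying Architecture back to the Deployment concerns above.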
To maximize the value and minimize the risks, a Big Data Governance practice is essential for Big Data scenarios. Organizations require A New Kind of Governance to maximize the value of Big Data solutions. When organizations have access to large volumes of trusted, near real-time data, it can translate into better business decision-making, higher agility toward market trends, and improved responsiveness to customers' needs.
Are you and your organization deploying people and processes, implementing technology, and establishing controls to properly govern Big Data? Are you documenting your Big Data Use Cases and identifying the stakeholders and stewards? Have you established standards for keywords and their definitions? Have you incorporated the appropriate Privacy, Security, & Compliance controls for your Big Data solutions? SAP provides solutions and services that will accelerate time to value on your Big Data initiatives and programs. For more information on our Big Data Services, please visit us at www.sap.com/bigdataservices. Thank you for your interest in this series on Big Data Governance.