SAP Agile Data Preparation
I recently wrote a blog on the new EIM (Enterprise Information Management) capabilities that were delivered in SAP HANA SP9: http://scn.sap.com/community/enterprise-information-management/blog/2015/04/10/sap-hana-eim. Well, as ever, things are moving pretty fast in the HANA world, and SAP has developed even more capability on top of these new EIM solutions.
The SAP Agile Data Preparation (ADP) solution has been built for all types of users (business users, IT and data stewards) and provides self-service access to data that allows them to discover, prepare and share data without involving a more traditional IT developer. You can quickly connect to and upload multiple data sets, including relational sources (Oracle, MS SQL Server, IBM DB2) and files, both on premise and in the cloud. The software then helps you discover and understand the data, and cleanse, enrich and combine it.
The diagram below shows the user-friendly, spreadsheet-like interface that allows you to manipulate data quickly and intuitively. When you select a column that you want to manipulate, the menu on the right-hand side changes to display a list of basic functions that you can use, such as Change Case, Trim and Replace. If more advanced HANA functions are required, you can add an additional column using the advanced formula editor. Even the advanced editor is designed to be easy to use: when you select a particular function, its help text is displayed in the same window, so there is no need to look elsewhere for help.
Once the data has been uploaded, it can be profiled automatically using the Assess Quality option. This calculates profiling statistics such as minimum/maximum values and NULL counts, detects patterns in the data, and identifies the content types within it. This is particularly useful because not all data sources have understandable column names; it helps identify whether, for example, a field contains person or address information such as names or postcodes.
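ADP performs this profiling through its UI, and the blog does not describe how it works internally. Purely to illustrate the kind of statistics involved, here is a minimal Python sketch, assuming a column is a list of strings with None for NULLs. The pattern notation (letters become "A", digits become "9") and the function names are hypothetical choices for this example, not ADP's.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Hypothetical pattern notation: letters -> 'A', digits -> '9', other characters kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Return simple profile statistics for one column of string-or-None values."""
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        # How often each character pattern occurs; a column dominated by patterns
        # such as "A9 9AA" or "AA9A 9AA" looks like UK postcodes.
        "patterns": Counter(char_pattern(v) for v in non_null),
    }
```

For example, profile_column(["SW1A 1AA", "M1 1AE", None, "B33 8TH"]) reports one NULL and three distinct postcode-like patterns, which is the sort of evidence a content-type check can use when the column name itself is unhelpful.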
The content types are also used by the cleansing process that Agile Data Preparation provides. When the Cleanse Worksheet option is selected, ADP automatically chooses the columns to cleanse based on their content types. This selection can be changed, and fields can be added or removed manually.
Based on the options you choose, it shows statistics on what has been cleansed and adds columns to the data set containing the newly cleansed data.
You can easily compare the data before and after it has been cleaned.
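The idea of keeping the original column and adding a cleansed column next to it, plus a count of what changed, can be sketched in a few lines of Python. This is only an illustration, assuming rows are dicts; the "_cleansed" column suffix and the standardise_postcode rule (upper-case, single space before the last three characters) are hypothetical and much simpler than real address cleansing.

```python
def standardise_postcode(value: str) -> str:
    """Hypothetical cleansing rule: upper-case and put one space before the last 3 characters."""
    compact = value.replace(" ", "").upper()
    if len(compact) > 3:
        return compact[:-3] + " " + compact[-3:]
    return compact


def cleanse_column(rows, column, cleanse):
    """Add '<column>_cleansed' alongside the original column and count changed values."""
    changed = 0
    out = []
    for row in rows:
        original = row[column]
        cleaned = cleanse(original)
        if cleaned != original:
            changed += 1
        # The original value is kept so before/after can be compared side by side.
        out.append({**row, f"{column}_cleansed": cleaned})
    return out, {"rows": len(rows), "changed": changed}
```

Keeping both columns, rather than overwriting in place, is what makes the before/after comparison possible.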
To further ensure that the data is of the highest quality for migration, master data or analysis, ADP can also identify and remove duplicate data. When you select the Remove Duplicate Records option, it again uses the data to suggest matching policies, or you can create your own. It also allows you to create survivorship rules, which determine which record to keep and which to discard. For example, a survivorship rule could be based on the length of a string: on a column that contains names, you might keep the longest value, because it tends to be the more formal version, whereas shortened names are often nicknames.
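To make the matching policy and survivorship rule concrete, here is a minimal Python sketch. It is not how ADP implements deduplication; the matching policy (same surname, taken as the last word of the name, plus same postcode ignoring case and spaces) and the field names are hypothetical. The survivorship rule is the longest-name rule described above.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical matching policy: same surname (last word, case-insensitive) and postcode."""
    surname = record["name"].split()[-1].lower()
    postcode = record["postcode"].replace(" ", "").upper()
    return (surname, postcode)


def deduplicate(records):
    """Group matching records, then keep one survivor per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    survivors = []
    for group in groups.values():
        # Survivorship rule: keep the record with the longest name
        # (on a tie, max() keeps the first record encountered).
        survivors.append(max(group, key=lambda r: len(r["name"])))
    return survivors
```

With "Bob Smith" and "Robert Smith" at the same postcode, the two records match and "Robert Smith" survives. A real matching policy would usually be fuzzier, but the split between "which records match" and "which record survives" is the same.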
A summary statistics screen is then displayed, from which you can view the duplicate records.
An action history records all actions within the project for undo, audit, refresh and replay purposes. At any point, you can simply select the red cross to undo a particular action.
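One common way to support undo, audit, refresh and replay together is to store the actions themselves rather than only their results, and recompute the data by replaying them. The Python sketch below shows that idea under that assumption; it is not a description of ADP's internals, and ActionHistory, trim_names and upper_names are hypothetical names (the two steps loosely mirror the Trim and Change Case functions mentioned earlier).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Row = dict
Step = Callable[[List[Row]], List[Row]]


@dataclass
class ActionHistory:
    actions: List[Tuple[str, Step]] = field(default_factory=list)

    def record(self, description: str, step: Step) -> None:
        self.actions.append((description, step))

    def undo(self, index: int) -> None:
        # Remove one action; the actions after it stay in the history.
        del self.actions[index]

    def replay(self, source: List[Row]) -> List[Row]:
        # Re-apply every remaining action, in order, to a copy of the source data.
        # Calling this on newly loaded source data acts as a "refresh".
        rows = [dict(r) for r in source]
        for _, step in self.actions:
            rows = step(rows)
        return rows

    def audit(self) -> List[str]:
        return [description for description, _ in self.actions]


def trim_names(rows: List[Row]) -> List[Row]:
    return [{**r, "name": r["name"].strip()} for r in rows]


def upper_names(rows: List[Row]) -> List[Row]:
    return [{**r, "name": r["name"].upper()} for r in rows]
```

Because the data is always rebuilt from the source plus the recorded steps, undoing an action in the middle of the history simply means replaying without it.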
The next stage in this process might be to combine data. You can merge or join different data sets, and ADP guides you through the process: even advanced database join operations are presented through intuitive screens with illustrations, and an example data set is shown so that you can test the join. Once the data sets are combined, a new worksheet is created within the project.
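For readers less familiar with what the join screens are doing, the difference between an inner and a left join can be sketched in Python. This is a generic hash join over lists of dict rows, not ADP's implementation; the join function, its how parameter and the rule that right-hand values win on a column-name clash are all choices made for this example.

```python
from collections import defaultdict


def join(left, right, key, how="inner"):
    """Hash join two lists of dict rows on a shared key column.

    how="inner" keeps only matching rows; how="left" also keeps unmatched left rows.
    On a column-name clash, the right-hand value wins.
    """
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    # Index the right-hand rows by key so each left row is matched in one lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    result = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            result.extend({**lrow, **rrow} for rrow in matches)
        elif how == "left":
            result.append(dict(lrow))
    return result
```

Joining customers to orders with how="inner" drops customers without orders, while how="left" keeps them; previewing a few result rows, as ADP's example data set does, is a quick way to check that the chosen join behaves as intended.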
Once your data has been prepared, you can share it: with other members of the project team through the ADP UI5 interface, by downloading it to Excel or CSV files, or by publishing it to a HANA table for persistence.
The solution allows you to operationalise this process by downloading it as an HDBFlowgraph; see the blog linked above for more information on HDBFlowgraphs. I particularly like this feature, because more traditional approaches to data preparation introduce a delay while the business and IT communicate and document the requirements. This narrows the gap: the business defines and creates the process, while IT remains responsible for productionising it.
As ever, IT will still be involved in activities such as monitoring, so ADP can analyse data usage through the Data usage statistics option. You can monitor usage (who is doing what) by project or table. It also has built-in user management and full error logging.
In summary, ADP provides intuitive yet powerful data manipulation capabilities, without reliance on IT, that enable the business to improve the usability and value of its data. It has a modern SAP UI5 interface and uses the Smart Data Integration / Smart Data Quality capabilities of SAP HANA.
You can view a demonstration on YouTube: SAP Agile Data Preparation: Transform Data into Actionable Information.