Timo ELLIOTT

Dancing With Dirty Data Using SAP Visual Intelligence


Here’s my entry for the SAP Ultimate Data Geek Challenge, a contest designed to “show off your inner geek and let the rest of world know your data skills are second to none.” There have already been lots of great submissions with people using the new SAP Visual Intelligence data discovery product.

I thought I’d focus on one of the things I find most powerful: the ability to create visualizations quickly and easily, even from real-life, messy data sources. Since it’s election season in the US, I decided to use some polling data on whether voters believe the country is “headed in the right direction.” Plenty of polling data on this (and other topics) is available at

Below is the data set I grabbed. The polling date field is particularly messy: it has extra letters (e.g. RV for “registered voter”), includes polls that were carried out over several days, and is not consistent (the month is not always included, and there are sometimes spaces around the middle dash and sometimes not).

[Image: poll data sample]
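For readers who would rather script the cleanup, here is a minimal Python sketch of how a date field with these problems might be normalized. It is not what SAP Visual Intelligence does internally. The exact format is an assumption: a start date written as month/day, an optional dash (with or without spaces) followed by an end day or month/day, and an optional trailing letter tag such as RV. The function name `parse_poll_dates` is hypothetical, and the year is passed in separately because this sketch assumes it is not embedded in the field.

```python
import re
from datetime import date

# Assumed layout (not taken from the actual data set):
#   "M/D", then optionally "-D" or "-M/D" with optional spaces around
#   the dash, then an optional letter tag such as "RV".
_POLL_DATES = re.compile(
    r"^\s*(?P<m1>\d{1,2})/(?P<d1>\d{1,2})"
    r"(?:\s*-\s*(?:(?P<m2>\d{1,2})/)?(?P<d2>\d{1,2}))?"
    r"\s*(?P<tag>[A-Za-z]+)?\s*$"
)


def parse_poll_dates(raw, year):
    """Return (start_date, end_date, tag) for one messy poll date string."""
    match = _POLL_DATES.match(raw)
    if match is None:
        raise ValueError(f"unrecognized poll date: {raw!r}")
    start_month, start_day = int(match["m1"]), int(match["d1"])
    # When the end month is omitted, reuse the start month.
    end_month = int(match["m2"]) if match["m2"] else start_month
    # A single-day poll has no end day, so it ends the day it starts.
    end_day = int(match["d2"]) if match["d2"] else start_day
    return (
        date(year, start_month, start_day),
        date(year, end_month, end_day),
        match["tag"],
    )
```

With this pattern, hypothetical inputs such as `"6/1-3 RV"` and `"5/30 - 6/3"` both resolve to a start date, an end date, and an optional tag, regardless of the spacing around the dash.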

If you paste this data into Excel, it automatically converts values like “6/02” into the 2nd of June, further scrambling the analysis. Instead, I put the data directly into a text file.

[Image: Excel scramble]
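The same idea of keeping the raw values untouched can be sketched in Python: the standard `csv` module returns every field as a string, so a value like “6/02” is never reinterpreted as a date. The tab delimiter and the column names below are assumptions for illustration only, not the layout of the actual text file.

```python
import csv
import io


def read_rows_as_text(handle, delimiter="\t"):
    """Read delimited text into dicts, leaving every value as a plain string."""
    return list(csv.DictReader(handle, delimiter=delimiter))


# Hypothetical two-column sample standing in for the real text file.
sample = io.StringIO("Dates\tRight direction\n6/02\t31\n")
rows = read_rows_as_text(sample)
```

Here `rows[0]["Dates"]` is still the literal text `6/02`, ready for whatever cleanup step comes next.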

To see how you can easily take the messy data and turn it into shareable analysis, I recorded a short demonstration of the steps involved:

If you’d like to try the product, you can download it for a free trial at The product is undergoing very rapid iteration cycles, so please give your feedback and feature requests at the SAP Community Network Ideas Place.
