Analyzing Big Data with Hortonworks Hadoop and SAP Lumira v1.13
Over the past 17 years I have reviewed and implemented a variety of business intelligence, reporting and predictive analytics solutions, including SAP’s offerings. I have been following SAP Lumira since its debut as SAP Visual Intelligence at SAP TechEd in 2012. I remember sitting in various SAP TechEd sessions hearing SAP Product Managers tell the audience to expect three- to four-week feature release cycles. At that time, I was both shocked and skeptical. I thought that what I was hearing was almost impossible for a global business intelligence vendor the size of SAP. I was wrong.
This past year I bought SAP Lumira Professional at the Sapphire conference. At that time I had blogged about other data visualization solutions with SAP HANA. I felt SAP Lumira needed to mature a bit more before I would leverage it in my projects. Since then I have already received a few updates. No sooner had I installed v1.12 than v1.13 was released a few weeks later, with v1.14 right behind it. These SAP rapid releases are truly amazing. I am also impressed with the many improvements that I am seeing. I know there is still some catch-up work to do, but in one year alone there has been a remarkable leap forward in the Data Discovery space for SAP customers. Here are a few of my review notes from a BI Professional perspective. Keep in mind that I do review all the Data Discovery offerings and try to be vendor neutral in my assessments.
Upon launching the latest and greatest SAP Lumira v1.13, I noticed the Welcome screen had been enhanced with links to sample content, videos and other resources. I was pleased with the nicer user interface and delightful new branding. The development path steps were visually displayed and linked to the related screens for a quick and easy jump start.
For my review, I wanted to test a Big Data set to see how SAP Lumira would perform. I used the Hortonworks Hadoop Sandbox NYSE Stock data set and a Hive connection to load the data. To connect and query Big Data with Hive, you need to install the drivers. Other options for connecting and querying Big Data are described on the SAP + Hortonworks partnership page. This page contains excellent reference documents and a Modern Data Architecture diagram that showcases how Hadoop integrates with, complements and extends your existing data platform.
Once I had the Big Data loaded, I used the Prepare features to add a Time hierarchy for time-based intelligence and analysis. To create a Time hierarchy, I selected the gear icon on the CalendarDate attribute and then selected the Time hierarchy option in the menu. Instantly SAP Lumira generated a drillable hierarchy including Year, Quarter, Month and Date. The Prepare screen also had quite a few features for data cleansing, including data type conversions, geospatial data types, filters, sorts, appends, merges, showing or hiding a field, and calculated fields with a library of data formatting and logic functions.
Now that my Big Data was cleansed and prepared, I was ready to get to the fun part…visually exploring the data to look for patterns and trends. To begin visualizing a data set in SAP Lumira, you use the Visual features. By simply dragging fields onto the user interface and choosing a visualization, I could immediately see some patterns. There is a plethora of visualization types available including but not limited to Column (Bar), Line, Pie, Area, Stacked, Dual Axis, Combination, Donut, Scatter, Bubble, Tree Maps, Heat Maps, Geospatial Maps, Radar, Box Plots, Word Clouds, Waterfall, Parallel Coordinate, Funnel Charts and Grids. I was able to quickly view up to 10,000 data points on the screen. This increase in data points is a great improvement over the prior 1,000 limit from earlier this year. Using the brushing features, I was able to select a subset of data to narrow my focus and dig deeper into the details.
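For readers who want to see what a Year, Quarter, Month and Date hierarchy amounts to outside the tool, here is a minimal Python sketch. It is only an illustration of the idea, not how SAP Lumira builds the hierarchy internally; the function name time_hierarchy and the level labels are my own assumptions.

```python
from datetime import date


def time_hierarchy(d: date) -> dict:
    """Return the drill-down levels (Year > Quarter > Month > Date) for one date.

    Hypothetical helper for illustration; not part of SAP Lumira.
    """
    # Months 1-3 fall in Q1, 4-6 in Q2, 7-9 in Q3, 10-12 in Q4.
    quarter = (d.month - 1) // 3 + 1
    return {
        "Year": d.year,
        "Quarter": f"Q{quarter}",
        "Month": d.month,
        "Date": d.isoformat(),
    }


if __name__ == "__main__":
    # Example: one trading day from a stock data set.
    print(time_hierarchy(date(2013, 11, 15)))
```

Grouping rows by any prefix of these levels (Year only, Year plus Quarter, and so on) gives the drill path that the generated hierarchy exposes.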
A wonderful feature that I stumbled upon in the review was Predictive Calculations. By choosing the down arrow on a measure I was able to choose a Forecast or Linear Regression Predictive Calculation type to add to my visualization along with specifying how many periods forward I wanted to predict.
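To give a feel for what a linear regression projection with a number of forward periods does, here is a minimal Python sketch using ordinary least squares against the period index. I do not know which algorithm SAP Lumira actually uses behind this menu option, so treat this as a generic illustration; the function name linear_forecast and its parameters are assumptions of mine.

```python
def linear_forecast(values, periods_ahead):
    """Fit y = intercept + slope * x by ordinary least squares, where x is the
    period index 0..n-1, then project the line periods_ahead periods forward.

    Hypothetical helper for illustration; not SAP Lumira's implementation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")

    mean_x = (n - 1) / 2
    mean_y = sum(values) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Future periods continue the index at n, n + 1, ...
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


if __name__ == "__main__":
    closing_prices = [10.0, 12.0, 14.0, 16.0]  # made-up sample values
    print(linear_forecast(closing_prices, 2))
```

With the made-up series above, the fitted line rises by 2 per period, so the two projected periods continue that trend.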
Each time I created a visualization, I could optionally save it to my document collection for use in an SAP Lumira story board. Available visualizations for story boards are displayed as thumbnails at the bottom of the SAP Lumira user interface. To create an SAP Lumira story board, a relatively new feature as of v1.12, you use the Compose features. This functional area of SAP Lumira allows you to select a layout, drag views onto the story board layout sections, and add filters, text boxes and images. There is an option to immediately preview your work and easily switch between authoring and viewing during the development process.
When you are happy with your story board, you can create a new one to add to the story or you can share your analytic creation. In SAP Lumira v1.13 there are a few different ways to share story boards. You can export the file for another SAP Lumira user to import and explore. You can also publish the dataset to SAP HANA, Explorer, Streamworks or SAP Lumira Cloud. You can also email the visualization as a Portable Network Graphics (.png) image. In my review, I chose to publish to the SAP Lumira Cloud.
Since I had a larger data set, it did take a few minutes to complete the transfer of both the SAP Lumira views and dataset created in the SAP Lumira desktop to the SAP Lumira Cloud. When it did finish uploading, I saw both my views file and dataset in the My Items list. Immediately I wanted to see how my masterpiece looked in a web browser but could not find any way to see it up there. I later learned that the SAP Lumira views built in the desktop version do not render in SAP Lumira Cloud right now. I do hope that limitation changes in future releases. So what does render in SAP Lumira Cloud? Exploring a bit more, I discovered that a different authoring and exploration view of my uploaded Hadoop data set was available. This SAP Lumira Cloud authoring environment did not have as many bells and whistles as the SAP Lumira desktop version but it did render the views extremely fast.
All in all, I found SAP Lumira v1.13 to be easy to use and a significant advancement over previous versions. The rapid release cycles are simply amazing. Hadoop Big Data sets rendered quickly and completely without errors. The richer data preparation, time intelligence, wide array of visualization types, predictive features, story boards and sharing features are all great strides forward for SAP in the highly competitive Data Discovery market.
Very good and thank you for writing this - it makes me want to also try it with a Big Data scenario.
I've been looking forward to reading your post, Jen. Not only does it show some of the capabilities of SAP Lumira with big data, it also does a very good job of briefly describing the thought process behind creating these visualizations. Excellent read, thanks Jen.
A very detailed entry about your experience.
I was glad to see you using Hadoop, and also pointing out the cleansing requirement for data.
With your extensive background in BI and other solutions, apart from the ever-progressing updates that are being released so frequently, what is your second most enjoyed feature to date?
Also, with that big data set that you used, how long did it actually take from start to finish?
Thanks for the feedback. Favorite features... 1) Easy time intelligence that a business user can figure out. I did not try a custom calendar such as 4-4-5, but I liked the time functions in the calcs library. 2) Predictive is a close second. 3) The nice array of visualization types with combination and dual axis charting capability is third, but I'd bet that for business users that one ranks higher.
Note that I felt Storyboard was a must-have to be considered a player...that is the only reason it is not my #1. I do have a long list of ideas for future improvement. I did send in a few earlier this summer before v1.12. The top need right now would be viewing the same viz in the SAP Lumira Cloud using a web browser.
Hope that helps.
Can you elaborate on the steps you took to get the data from Hadoop (apart from installing the driver)?
Feel free to send me a note.