SAP Lumira and Big Data ASUG Webcast Recap – Part 1
Craig Powers of ASUG News wrote about it here: SAP Looking to Go Big on Self-Service Analytics with Hadoop and Lumira – ASUG News
Below are my notes (Part 1):
SAP’s Paul Ekeland presented this ASUG webcast last week. Please note that the usual legal disclaimer applies: anything about the future is subject to change.
Figure 1: Source: SAP
Big Data is popular, and two factors explain it. The first is cost: Hadoop is a cost-effective way to store information
Hadoop allows you to store data on commodity hardware
The second is flexibility: you do not have to decide ahead of time how to shape the information in the Hadoop system
Instead, you think about how you are going to use the data
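To make the "no need to shape the data ahead of time" point concrete, here is my own rough sketch (not from the webcast) of the schema-on-read idea in plain Python. The sample events and the `read_with_schema` helper are made up for illustration; a real Hadoop setup would do this over files in HDFS rather than an in-memory list.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
raw_lines = [
    '{"machine": "m1", "temp": 71.5, "ts": "2015-03-01T10:00:00"}',
    '{"machine": "m2", "temp": 68.0, "ts": "2015-03-01T10:00:01", "vibration": 0.2}',
    '{"machine": "m1", "temp": 73.1, "ts": "2015-03-01T10:00:02"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only the requested fields."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        yield {name: record.get(name) for name in fields}


# Two different "shapes" over the same stored data, decided when reading.
temps = list(read_with_schema(raw_lines, ["machine", "temp"]))
vibrations = list(read_with_schema(raw_lines, ["machine", "vibration"]))
```

The stored lines never change; only the question asked of them does.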
Figure 2: Source: SAP
Hadoop is spacious but slow, like a bus; HANA is like “racing cars” in terms of speed
Figure 3: Source: SAP
Figure 3 covers “hub and spoke”: storing data in a data lake
Put data marts or an enterprise data warehouse on top of it to extract the data and stage it where you plug in BI apps
Accessing data directly in the data lake can be slow
Data exploration possibilities include exploring the lake to figure out high-level information
Hub and spoke architecture is becoming the standard
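As a rough illustration of the hub-and-spoke notes above (my own sketch, not SAP's design), the snippet below treats a list of raw records as the data lake and stages a small aggregate as the data mart a BI app would plug into. The record layout and the `build_mart` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records sitting in the "hub" (the data lake).
data_lake = [
    {"region": "EMEA", "product": "A", "revenue": 120.0},
    {"region": "EMEA", "product": "B", "revenue": 80.0},
    {"region": "APJ", "product": "A", "revenue": 50.0},
]


def build_mart(records, key):
    """Extract and stage a small aggregate (a "spoke") from the raw records."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["revenue"]
    return dict(totals)


# The BI app reads the small staged mart instead of scanning the lake each time.
revenue_by_region = build_mart(data_lake, "region")
```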
Figure 4: Source: SAP
Hive provides SQL access to data in HDFS
Oozie workflows allow you to schedule jobs in Hadoop
The Hadoop system is open source
Figure 5: Source: SAP
Figure 5 covers how HANA & Hadoop work together
Hadoop is where you investigate; once you know what you want to extract, push the data to HANA, which will operationalize the information like “no other tool”
Figure 6: Source: SAP
Companies have “mountains” of information
LinkedIn shows 18% of jobs are related to data
There is a talent gap in the market
Figure 7: Source: SAP
Lumira addresses visuals
As soon as you want to understand which part will break, you need predictive. The two tools “play nicely together”
Figure 8: Source: SAP
Both BI and Lumira share the same data sources
Future plans include SparkSQL, MongoDB, Graphs
Data access extensions are available for Lumira
SAP has partnerships with vendors such as Cloudera, Hortonworks, MapR to ensure they work
Figure 9: Source: SAP
Machine sensors send an event every second
Hadoop is slow; its level of SQL is limited
Figure 10: Source: SAP
Data prep is done in Lumira; you then schedule jobs via Oozie to generate the full dataset
Load the data to Lumira and share it via the various flavors of Lumira
In HANA, connect via Smart Data Access
You can use the Impala or Hive driver
Figure 11: Source: SAP
A table is generated automatically so that the visualizations you created re-point to the virtual table
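The re-pointing idea can be pictured with a small sketch of my own; this is not how Lumira or Smart Data Access is actually implemented. It assumes a hypothetical `catalog` that maps a dataset name to whatever currently backs it, so a "visualization" that only knows the name keeps working when the backing data is swapped.

```python
# Hypothetical catalog: dataset name -> rows currently backing that name.
catalog = {
    "sensor_readings": [{"machine": "m1", "temp": 70.0}],  # small sample
}


def average_temp(dataset_name):
    """A stand-in "visualization" that knows only the dataset name."""
    rows = catalog[dataset_name]
    return sum(row["temp"] for row in rows) / len(rows)


before = average_temp("sensor_readings")

# Re-point the same name to the full table; the visualization code is unchanged.
catalog["sensor_readings"] = [
    {"machine": "m1", "temp": 70.0},
    {"machine": "m2", "temp": 80.0},
    {"machine": "m3", "temp": 90.0},
]
after = average_temp("sensor_readings")
```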
Figure 12: Source: SAP
Figure 12 shows the planned self-service on Hadoop: sampling the data, scheduling a job to generate the dataset, Hadoop to access visualizations, and publishing to Lumira Server
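The sample-first flow can be sketched as follows. This is my own simplified illustration: the `prep` step, the record fields, and `scheduled_job` are hypothetical, and a plain function call stands in for an Oozie-scheduled job.

```python
import random


def prep(record):
    """Hypothetical data-prep step, designed interactively against a sample."""
    return {
        "machine": record["machine"].upper(),
        "temp_c": round((record["temp_f"] - 32) * 5 / 9, 1),
    }


# Hypothetical full dataset; in practice this would live in Hadoop.
full_dataset = [{"machine": f"m{i % 3}", "temp_f": 60.0 + i} for i in range(1000)]

# Step 1: work on a small sample so the prep logic can be tuned quickly.
random.seed(0)
sample = random.sample(full_dataset, 50)
preview = [prep(record) for record in sample]


# Step 2: the same prep logic is applied to everything by a scheduled job.
def scheduled_job(records):
    return [prep(record) for record in records]


result = scheduled_job(full_dataset)
```

The point of the split is that the slow pass over the full data runs once, after the prep steps have been settled on the sample.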
Figure 13: Source: SAP
Figure 13 shows planned deliverables
Part 2 is coming when I have time.
What do you think of SAP Lumira and Big Data?