Big Data with SAP | SAP HANA 2.0 – An Introduction
In this blog series you will find quotes, background, suggested further reading, and other information related to my latest book, SAP HANA 2.0, An Introduction, published by SAP Press. As the goal of the book is to provide an introduction, we could not always spend as much time and as many pages on each topic as we wished. Big data is one such topic, although a short paragraph is included covering SAP Data Hub, SAP Vora, and SAP HANA Hadoop Integration. In this blog, I will cover big data topics in a bit more detail and include references to where you can find more information. Questions? Please post them as a comment. Useful? Give us a like and share on social media. Thanks! Updated: November 2020
Pure Gold
When business discovered big data, it was welcomed as the new (black) gold.
- Is Big Data the New Black Gold? – Wired, 2012
- The world’s most valuable resource is no longer oil, but data – The Economist, 2017
Alas, after five years of drilling and prospecting not everyone remained as enthusiastic.
- You may have heard data is the new oil. It’s not – World Economic Forum, 2018
- No, Data Is Not the New Oil – Wired, 2019
Looking at web searches with Google Trends, we can see that interest in big data took off in 2012 but has since waned a bit, overtaken by data science and machine learning by 2018. Who’s to blame? Cloud computing.
Google Trends: Big Data versus Data Science
The Vs Everyone Must Know
According to The Origins of ‘Big Data’: An Etymological Detective Story, the term goes back to the 1990s, but from a technical perspective big data took shape between 2004 and 2008, when the contemporary search giants Google and Yahoo developed and later open-sourced MapReduce and the Hadoop Distributed File System (HDFS). Pig, Hive, Zookeeper, and other Apache open source projects followed (the current count is 49).
Big data was initially characterised by three Vs: volume, velocity, and variety. IBM added veracity in The Four V’s of Big Data, and then came the 5 Vs Everyone Must Know, The evolution of big data – the ‘6 Vs’, the Seven V’s of Big Data, and the 10 Vs of Big Data. SAP even went up to eleven by adding the V of Vora (more on that V below).
Illustration from The 42 V’s of Big Data and Data Science
Should you want to learn more, the Big data entry on Wikipedia provides as good an introduction as any (including a shady picture of the SAP Big Data bus). Alternatively, visit
The Path Forward
With the Sybase acquisition of 2010, SAP got hold of several big data-related technologies like IQ and Event Stream Processor (ESP) for IoT (Internet-of-Things) ingestion.
In 2012, SAP bundled SAP HANA with several of these technologies as the Real-time Data Platform (RTDP).
Infinite Insights
A year later, in 2013, SAP acquired KXEN, the Knowledge eXtraction Engine, which had just brought InfiniteInsight to market for self-service predictive analytics, bringing data mining to the business professional, no PhD required. SAP InfiniteInsight would morph into SAP Predictive Analytics, with the Automated Predictive Library (APL) providing SAP HANA integration. Although we would now file this under Analytics, at the time data mining was the way to unlock big data.
- SAP Extends the Power of Predictive Analytics to Unlock Big Data With Acquisition of KXEN, SAP News (2013)
For more information about data mining and advanced analytics, see
Smart Data Services
The same year, with the release of SAP HANA SPS 06, smart data access added virtualisation to the SAP HANA platform, which enabled direct access to Hadoop and other data sources from SAP HANA.
Other “smart” technologies followed the next year with SPS 09 (2014): Smart Data Streaming (later Streaming Analytics), based on ESP; Dynamic Tiering (the name smart data tiering was considered as well), a native big data solution based on IQ; and Smart Data Integration (SDI) and Smart Data Quality (SDQ), both BusinessObjects Data Services technologies, to address the veracity of big data.
On the Bus
Also in 2013, SAP partnered with Hortonworks (now Cloudera) to resell big data platforms and started the Big Data Tour to get the developer community on the bus.
- SAP Big Data Bus Part Of Overall Effort To Build Developer Community, TechCrunch
- SAP Increases Focus on Developer Experience and Makes Key Open Source Contributions, SAP News (2013)
- SAP Helps Customers Achieve Real-Time Big Data Results With SAP HANA and Hadoop, SAP News (2013)
Hop Aboard the SAP Big Data Bus | Disrupt SF 2013
SAP HANA, the Real-Time Business Platform
Quo Vadis?
The next year, 2014, Spark integration was added, plus a certified Spark distribution, raising some questions about the future direction of SAP (HANA).
- Apache Spark integration with SAP HANA by Balaji Krishna
- SAP HANA gets some Spark from Databricks, Diginomica
- SAP Commits to Cloud Foundry and OpenStack for Innovative Development in the Cloud, SAP News (2014)
Illustration from Bridging two worlds: Integration of SAP and Hadoop Ecosystems
Voracious
Big data integration went one step further with the release of SAP HANA Vora, announced at SAP TechEd 2015.
- SAP HANA Vora Now Available to Bring Contextual Analytics Across All Enterprise and Big Data Systems, SAP News (2016)
The name was later shortened to SAP Vora to underline that this was an independent product which did not require the SAP HANA platform (see the FAQ for your questions).
Big Data-as-a-Service
In 2016, SAP acquired Altiscale’s Big Data-as-a-Service (BDaaS) solution, integrated as SAP Cloud Platform Big Data Services. Vora was added to the service and this brought more good news.
- SAP Welcomes Altiscale, Provider of High Performance Big-Data-as-a-Service Solution, SAP News (2016)
- SAP Delivers Live Insights from Big Data to Customers, SAP News (2017)
- Independent Research Firm Identifies SAP as a Big Data Warehouse Leader, SAP News (2017)
Intelligent Technologies
SAP Leonardo was introduced at SAPPHIRE NOW 2017 as an innovation portfolio bringing together Internet of Things, machine learning, blockchain, analytics, artificial intelligence, and Big Data technologies.
- What Is SAP Leonardo?, SAP News (2017)
- SAP Leonardo: A Closer Look at a Year of Innovation, SAP News (2018)
In 2019, again at SAPPHIRE NOW, SAP refocussed on the business and announced the Business Technology Platform (BTP) as the successor: the fastest way to turn data into business value (yes, that’s one of the big data V’s).
- Putting Business at the Heart of Tech by Juergen Mueller, SAP News (2019)
Cloud-native, Multi-cloud, and Hybrid
SAP Vora 2.0
For version 2.0, SAP Vora was re-architected to run inside Docker containers with Kubernetes for cluster management, providing customers “the flexibility to choose among cloud, on-premise and hybrid deployment models, and they can migrate between these options easily and with minimal disruption”.
- New Release of SAP Vora Helps Simplify Big Data Transformation and Improve Business Outcomes, SAP News (2017)
SAP Data Hub
SAP Vora was now also included with another new containerised application, SAP Data Hub.
- New SAP Data Hub Tames the Data Landscape, SAP News (2017)
- SAP Vora 2.0 and integration with SAP Data Hub by Balaji Krishna
- SAP Data Hub – a containerized application by Thorsten Schneider
Illustration from What is SAP HANA Cold Data Tiering? by Ruediger Karl
SAP Data Intelligence
In 2019, SAP Data Hub was made available as a managed service under the name SAP Data Intelligence, and just recently (March 2020) the on-premise product and the cloud-based service were merged.
A Single Gateway to All Your Data
SAP HANA Cloud, Data Lake
Just released as well (March 2020) is SAP HANA Cloud. This service includes SAP HANA Cloud, data lake, where we find our old friend IQ at work.
- SAP HANA Cloud, Data Lake (SAP Help Portal)
- What is a Data Lake and Why You Need One by Anthony Karge
- SAP ASE and SAP IQ: The Next Generation by Irfan Khan and Gerrit Simon Kazmaier
SAP HANA Cloud uses the same container and Kubernetes orchestration technologies as Data Intelligence (and Vora).
Smart data access (virtualisation) plays an important role in the design of SAP HANA Cloud. This includes, of course, access to the usual big data suspects, Hadoop and Spark, but also to Google BigQuery and Amazon Athena.
- Data Access with SAP HANA Cloud, SAP HANA Cloud Administration Guide (SAP Help Portal)
For more information, see
- SAP HANA Cloud – SAP HANA Journey microsite
- Getting Started with SAP HANA Cloud (Free trial)
SAP HANA 2.0 – An Introduction
Just getting started with SAP HANA? Or do you have a migration to SAP HANA 2.0 coming up? Need a quick update covering business benefits and a technology overview? Want to understand the role of the system administrator, developer, data integrator, security officer, data scientist, data modeler, project manager, and other SAP HANA stakeholders? My latest book about SAP HANA 2.0 covers everything you need to know. Get it from SAP Press or Amazon.
Share and Connect
Questions? Post them as a comment.
Useful? Give a like and share on social media. Thanks!
If you would like to receive updates, connect with me on
- LinkedIn > linkedin.com/in/dvankempen
- Twitter > @dvankempen
For the author page of SAP Press, visit