Here is the third and last part of my SAP Sapphire 2015 coverage. I apologize for the delay, but SAP Sapphire is a very intensive experience and it was impossible for me to find the time required to consolidate all the information for these blogs (each blog took me around 3-5 hours).
Here is the cross-link section on how to get to the individual blog entries:
Update 19.5.2015 – the keynote replay was finally made available, so here is my summary of the keynote itself.
The keynote from Hasso Plattner was divided into four main parts.
Part One – From Vision to Reality
The first and biggest part focused on looking back at how the road to SAP HANA and S/4 HANA started.
Hasso outlined that the idea started in 2006 at the Hasso Plattner Institute (HPI) with a project to rethink how an ERP system would look if it were based on a database with zero response time. As part of this theoretical exercise the students came to the conclusion that all aggregates and other constructs required to accelerate performance are redundant and can be removed. Such a new approach would also bring a dramatic footprint reduction and significant application code simplification, as the code could focus purely on application logic and would not need to maintain redundancies. At this point the students voted to change the objective from building a new ERP system to building a new database.
After this introduction Hasso started to talk about the importance of speed and how it can help businesses redesign their processes to be more successful. He also quoted the CEO of Walmart saying that it is all about speed, and whoever does not realize that now will lose in the future. Hasso said that SAP is well aware of this paradigm shift and is therefore releasing S/4 HANA. He also mentioned that there are more than 2,000 startups betting their life on SAP HANA, which can be seen as a kind of crowd voting that SAP HANA is the right solution.
Returning to the history – in 2008 Hasso and the team of students created a Cash Forecast prototype application on top of their in-memory database prototype. Hasso said that at this point he realized that the systems of the future need to be more than systems of record looking into the past – they need to look into the future, predicting it.
The next part of the keynote focused on the main aspects of developing applications tailored for speed (on top of SAP HANA), where Hasso outlined the most problematic areas in which SAP HANA can deliver speed but which are frequently overlooked during development:
- Massively Parallel Processing (not coming naturally to application programmers)
- Partitioning and Replication (columnar databases being different from row databases)
- Simplified Data Model (no need for redundant data)
- Data Footprint Reduction (by columnar compression and by removing redundancies)
- Structured and Unstructured Data (both now possible in one system)
- Transactional and Analytic Workload Reunited (no need to have two separate systems for performance reasons)
Here is a screenshot as presented during the keynote:
Then Hasso introduced his new book and made a public announcement that if anyone reads the book and does not understand what SAP HANA is good for, Hasso himself will refund the book (even if it was received for free). Then he shortly introduced the content of the book.
The book introduction and announcement can also be seen here: https://www.youtube.com/watch?v=wQHtxUK9tz0
In the next part Hasso explained the vision he had in 2009 for the boardroom of the future – one meeting room for all key executive functions (CEO, CFO, head of sales, head of products, etc.), all of them getting a real-time overview of the whole corporation with the option to deep dive into any topic they are currently discussing – so that there is no need anymore for pre-made PowerPoint presentations, which are static and inflexible.
Here is a screenshot as presented during the keynote:
Then he presented the SAP boardroom of the future that is being used at SAP and challenged customers to start building their own boardrooms. Hasso outlined that this is the new way companies will be led in the future.
The vision for the boardroom of the future was complemented by a demo presented by Mike Crowe, CIO at the Colgate-Palmolive Company, showing how they execute their smarter and faster business reviews.
The boardroom part of the keynote can also be seen separately here: https://www.youtube.com/watch?v=_Y2Oz5KORg0
The next demo focused on retailers solving two stock-specific issues in food distribution – an approaching expiration date versus a food shortage. The demo was able to find, propose and rank the most effective options to address the issue, including actions like rescheduling the order to a different date, increasing other sales orders, running marketing activities or making donations. The demo leveraged innovations delivered by Simple Finance and Simple Logistics.
This demo can be seen here: https://www.youtube.com/watch?v=rAK3T0fTBEk
At the end of this part Hasso challenged SAP to work as quickly as possible to re-implement all areas to run with SAP Fiori and said that all other UI technologies will be removed over time.
Part Two – Design Thinking in Co-Innovation Projects
In this part Hasso outlined the importance of Design Thinking and that it is important to start the software development process from the very beginning – by understanding and observing the user – phases which were missing in the software development process and which are now fully embraced by SAP and heavily used in co-innovation projects with customers.
This approach led SAP to build a completely different set of applications. One example of such an application, helping to fight cancer, was demonstrated by Prof. Dr. Christof von Kalle, Director of Translational Oncology at the National Center for Tumor Diseases in Heidelberg, Germany.
This demo can also be seen separately in this video: https://www.youtube.com/watch?v=6vYg2u6wvOQ
The demo was followed by a speech by Dr. Peter Yu, President of the American Society of Clinical Oncology, explaining their challenge of bringing all the individual medical records about individual patients together to create a shared system from which additional information might be extracted.
Part Three – Performance Measurements
This part started with a discussion about Data Footprint Reduction, where Hasso used the example of an existing US customer with 54 TB of data in their SAP ERP system. According to the presentation this would need hardware with a total memory volume of 96 TB for the active system and another 96 TB for the failover system – the hardware costs might be estimated at around 20,000,000 USD in list prices (the prices are here only to illustrate the proportions).
Here is a screenshot as presented during the keynote:
Just by putting the SAP ERP system on SAP HANA, the data volume came down to 9.5 TB. This could be hosted on one 24 TB server for the active system and another 24 TB server for failover – decreasing the estimated costs 4x to around 5,000,000 USD in list prices.
Putting the same volume of data on an S/4 HANA system (which removes unnecessary redundancies) would decrease the memory requirements to 5 TB of data (3 TB of active data and 2 TB of historical data). If the historical data is placed on cheaper hardware, the overall hardware costs could come down to around 550,000 USD in list prices.
The key takeaway was that an S/4 HANA system which is correctly partitioned into active and historical data can dramatically reduce the data footprint (up to 10x in the example above) and can also decrease the hardware costs (up to 40x in the example above).
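The proportions can be checked with a quick back-of-the-envelope calculation. This is my own sketch; the sizes and list prices are the illustrative figures quoted in the keynote, not real quotes, and the reduction factors come out slightly below the rounded 10x/40x headline numbers:

```python
# Back-of-the-envelope check of the footprint/cost figures from the keynote.
# All prices are illustrative list-price estimates, as stressed in the talk.

scenarios = {
    # name: (data_tb, total_memory_tb_incl_failover, est_cost_usd)
    "ERP on classic DB": (54.0, 96 + 96, 20_000_000),
    "ERP on SAP HANA":   (9.5,  24 + 24, 5_000_000),
    "S/4 HANA":          (5.0,  3 + 2,   550_000),
}

base_data, _, base_cost = scenarios["ERP on classic DB"]
for name, (data_tb, mem_tb, cost) in scenarios.items():
    print(f"{name}: {data_tb} TB data, {mem_tb} TB memory, "
          f"footprint reduction {base_data / data_tb:.1f}x, "
          f"cost reduction {base_cost / cost:.1f}x")
```

The exact ratios are 54/5 = 10.8x for the footprint and roughly 36x for the costs, which the keynote rounded to "up to 10x" and "up to 40x".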
In the next part of the keynote Hasso described a speed challenge SAP received from a customer and outlined that SAP was able to win this challenge by using SAP HANA. The customer was able to report on massive volumes of data (200 billion data entries) with very low response times, as illustrated on the graph presented during the keynote.
Here is a screenshot as presented during the keynote:
The last demo in this part focused on data partitioning and how it can help customers to further accelerate their solutions – this demo is described in detail in the next section (see below).
Part Four – ??? (I missed the official title)
In this last part of the keynote Hasso introduced the new CTO of SAP – Quentin Clark – and handed the floor over to him.
Quentin's speech focused on the SAP HANA Cloud Platform – saying that it needs to be a simple and open platform so that it can easily be used for any development required and easily integrated with external systems, with business networks and with mobile devices.
SAP's goal is to build new ecosystems that will use the SAP HANA Cloud Platform as the base for their solutions. The cloud platform will be enabled to consume data from IoT (Internet of Things), streaming, social, application systems and analytical systems. The SAP HANA Cloud Platform will be built on top of the SAP HANA database, Sybase ASE, Sybase Anywhere and Hadoop, where SAP HANA will integrate these products together.
Other announcements included:
- Intel Haswell announcement (which actually happened on day 1)
- Lenovo record-breaking benchmark on SAP HANA
http://events.sap.com/sapphirenow/en/session/15904 (part between 12:40 and 15:01)
- IBM releasing first solution for SAP HANA on Power systems
The keynote was closed by a last demo showing how the SAP HANA Cloud Platform can be used to build a mobile-enabled application for rewarding employees with job points, and how this can easily be integrated with business networks for spending these points on "real-world rewards".
The demo can also be seen separately here: https://www.youtube.com/watch?v=fOo1omyzCt8
The JobPts (Job Points) application is available here:
The whole keynote can be replayed here: http://events.sap.com/sapphirenow/en/session/16024
SAP HANA – Impact of Data Partitioning (by HPI)
One part of Hasso Plattner’s Keynote was dedicated to HPI research on Data Partitioning.
Introduction from Hasso Plattner
Hasso introduced this part by explaining that data partitioning for columnar databases follows different principles than for row databases. The main difference is that columnar databases mostly perform full or partial attribute vector scans, where it is extremely critical (from a performance perspective) that the whole column or column partition is fully available in memory (otherwise the scanning operation will be significantly slowed down by reading from disk).
Hasso outlined that the data tiering (hot-warm-cold) principles which are being developed can drastically degrade performance if the active data accessed by the application is not in memory. He also stressed that the database cannot correctly anticipate which data will be requested by the application and is therefore not able to decide what data to keep in memory – it must be the application that proactively specifies which data needs to be kept in memory.
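Hasso's argument can be illustrated with a toy sketch (my own construction, not HANA code): in a column store, answering even a single-row predicate against a partition requires scanning that partition's full attribute vector, so a "cold" partition must be loaded from disk in its entirety first.

```python
# Toy model of a columnar attribute vector split into partitions.
# Touching even one value in a cold partition forces the whole
# partition to be loaded before it can be scanned.

class ColumnPartition:
    def __init__(self, values, in_memory=False):
        self.values = values          # the attribute vector of this partition
        self.in_memory = in_memory
        self.disk_loads = 0           # counts expensive full-vector loads

    def scan(self, predicate):
        # The whole partition must be memory-resident before scanning.
        if not self.in_memory:
            self.disk_loads += 1      # simulate reading the vector from disk
            self.in_memory = True
        return [v for v in self.values if predicate(v)]

hot = ColumnPartition(list(range(0, 1000)), in_memory=True)
cold = ColumnPartition(list(range(1000, 2000)), in_memory=False)

# A query needing one single entry from the cold partition still pays
# for loading the complete cold attribute vector:
result = hot.scan(lambda v: v == 1500) + cold.scan(lambda v: v == 1500)
print(result, cold.disk_loads)  # [1500] 1
```

This is why the application, which knows its access patterns, is in a better position than the database to decide which partitions must stay in memory.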
Introduction from Hasso Plattner (my thoughts on this subject)
By coincidence I discussed this topic with fellow SAP Mentors just one day before the keynote – I must admit I was incorrectly arguing that this decision logic needs to be pushed down into the database. The reason for this statement was that otherwise every application would have to be "tailored for partitioning" (where partitioning might also be seen as a kind of data tiering) – having in mind the usability point of view, where I believed it would be good to keep SAP HANA simple and fully transparent to the application.
After hearing the explanation from Hasso I have no choice but to agree with the performance argument – if the application asks for even one single entry which is in a different column partition, possibly not even loaded into memory, then that particular column partition has to be completely loaded from disk and scanned. In such a case query processing is slowed down by the disk read operation and any positive effect of the partitioning is gone (as the additional partitions will be scanned anyway).
So my conclusion based on what was said is that either all data should be completely in memory, or the application needs to be "tailored for partitioning" (as a kind of data tiering) and decide on the logic of which data belongs in which partition.
The question that remains is whether those partitions containing rarely used data could be stored on "cheaper hardware", which would bring the price down, knowing that any access to this data would be slow – or whether even those rarely used partitions need to be on premium hardware, "parked on disk" and loaded only when required.
Demo on Data Tiering by Carsten Meyer from HPI
In the second part of this demo Carsten Meyer took the floor and explained the setup which was used for the demo.
Since I was confused by the details of the setup, I went to the HPI booth after the keynote and talked to Martin Boissier from HPI, who was very kind to clarify the missing parts. In the description below I combine what was described during the keynote and the HPI project description (from the HPI pages) with the explanations from Martin.
The demo was working with two independent SAP HANA databases. All queries were executed against both databases at the same time, and the response time and system load were evaluated and compared to illustrate the performance difference between the two configurations.
Here is a screenshot from the HPI demo dashboard (as presented during the keynote):
The first system (colored blue, on the left side) was a traditional deployment – data stored in one SAP HANA database in one single partition, running on one server with 120 cores.
The second system (colored green, on the right side) was a "custom" deployment which needs a more detailed explanation. One SAP HANA database was deployed to run on three separate servers (scale-out), where the instance on each node was limited to use only a certain number of cores (50+50+20) so that the total number of cores was the same as for the first system.
The reason for this "divide" across three servers was to achieve a much cheaper setup – based on the assumption that smaller systems are generally cheaper (per core) than bigger systems.
The data was stored across three partitions – each partition on a different physical server. The first partition (called MASTER) contained the active data and was used for the OLTP workload with read and write operations.
The second partition (called REPLICA) was a "copy" of the MASTER partition based on something similar to table replication (see the Administration Guide for more details) and was used for read-only OLAP queries. If you know HANA you might be surprised by what I just wrote – this was the part where I was confused, as such a setup is not something available today. Here Martin explained to me that they used "customized" SAP HANA code – and it makes perfect sense; after all, this is a research project, so there is no reason to be limited to what happens to be available today as part of the generally available code.
However, it is equally important to understand that this is not something available today out of the box. I did not see the SAP HANA roadmap session as it was completely packed and security did not let me in – but I would guess this might come in the next SAP HANA releases.
The third partition (called HISTORICAL) contained infrequently used data – however, it was still running as a traditional in-memory partition (so no technology like Dynamic Data Tiering was used in this case).
The demo was executed by simulating OLTP (transactional) and OLAP (analytic) workloads with a predefined number of concurrent users. The dashboard visualized the average response time (in ms) and the system load (in %). When the number of concurrent users was increased, it was clear that the partitioned system was able to deal with the workload with an around 2.5x smaller response time than the single-partition setup.
The second part of the demo was based on using the Colgate dashboard (presented before this demo). When an action was taken against the already overloaded system, the measured response time for this action was clearly better on the partitioned system.
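The routing idea behind the three-partition setup can be sketched as follows. This is my own reconstruction of the principle, not HPI code: writes go to MASTER, read-only analytic queries go to REPLICA, and only queries that also need old data touch the HISTORICAL partition on the third node.

```python
# Sketch (my reconstruction, not HPI code) of how the demo's workload
# was split across the three partitions/servers:
#   MASTER     - active data, OLTP reads and writes
#   REPLICA    - copy of MASTER, read-only OLAP queries
#   HISTORICAL - infrequently used data on the third node

def route(query):
    """Return the list of partitions a query would be routed to."""
    if query["writes"]:                  # transactional workload
        return ["MASTER"]
    targets = ["REPLICA"]                # analytic workload reads the copy
    if query["needs_history"]:           # only then cross node boundaries
        targets.append("HISTORICAL")
    return targets

print(route({"writes": True,  "needs_history": False}))  # ['MASTER']
print(route({"writes": False, "needs_history": False}))  # ['REPLICA']
print(route({"writes": False, "needs_history": True}))   # ['REPLICA', 'HISTORICAL']
```

The point of this split is that the common cases (pure OLTP, pure recent-data OLAP) each stay within a single partition on a single node, so inter-server communication only occurs for the rarer queries that span recent and historical data.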
Demo on Data Tiering by Carsten Meyer from HPI (my thoughts on this subject)
I believe it is important to understand the message correctly. The "magic" of why the second (green) setup was faster was that the data was correctly partitioned. The following factors were in play (as also explained in the HPI project description):
- the OLTP (transactional) workload was faster because it worked only on the first partition (so less data had to be processed during each operation)
- the OLAP (analytic) workload was faster because it ran against the replica – so there was no collision between the analytic and transactional workloads (which tend to compete for resources and run better if separated on different cores)
The reasons for putting the partitions on different servers were the following:
- the partitions were defined to be relatively independent – you run the query either against the actual data only (there you stay within one partition) and the result is fast, or against actual and historical data together, where you accept some performance hit from having to use inter-server communication
- using smaller servers is generally cheaper (per core) than using super-sized servers – therefore a scale-out setup is expected to be cheaper (this can also be seen as a direction for properly addressing ever-increasing data volumes, where server requirements might grow beyond the size of one server)
The key messages from this keynote demo were the following:
- if you partition correctly, you can reach much better performance with the same resources
- if you leverage "partition replication" (hopefully available soon), you can separate the OLTP (transactional) and OLAP (analytic) workloads and get even better performance
- if you perform the previous points correctly, you might place the partitions on different nodes to reach lower costs without any significant degradation in performance
I am personally concerned that adding the third point right from the beginning might be tricky. The reason for saying this is that I am afraid some attendees might miss the key points and jump directly to the invalid conclusion that "scale-out is faster and cheaper than single-node".
Therefore I will try to clarify that by saying the following:
- it is wrong to understand the demo as saying that "scale-out is faster than single-node" – on the contrary, scale-out is typically slower than single-node because of the inter-node communication happening over the network. Scale-out CAN be as fast as single-node only if the data is correctly partitioned and inter-node communication is kept to a minimum (as in the example above). Additionally, separating the OLTP (transactional) and OLAP (analytic) workloads onto different nodes might make it faster – but this can happen only in a very carefully designed setup, with proper partitioning as a hard prerequisite.
Open questions (thoughts)
I wonder what the result of the demo would look like if the second setup were not a scale-out system but a single-node system, where each partition would be pinned to a particular set of cores (not allowed to touch any other cores).
Such a setup should also separate the OLTP (transactional) and OLAP (analytic) workloads and avoid any inter-node communication. It would be interesting to understand the real effect of going scale-out with such a setup (the quantified difference between three partitions on one node versus three partitions on three nodes).
Replay of SAP Keynote part – “Data Replication with SAP HANA to Increase Workload Capacity”
HPI page on SAP Sapphire demo (summary):
HPI page of the project details:
Thank You for SAP Sapphire 2015
SAP Sapphire was exciting and I was happy to be able to participate in this event. I would like to express big thanks to everyone I met during SAP Sapphire 2015 and to SAP for organizing this event and for supporting the SAP Mentors program. On top of that I would like to express special thanks to (alphabetically by company): Detlef Poth (Intel), Antonio Freitas (EMC), Martin Boissier (HPI), Markus Winter (SAP), Peter Schinagl (SUSE) and Bob Goldsand (VMware). Last but not least, I would like to thank everyone who spent time reading my SAP Sapphire 2015 blogs. 😉