Loose notes from an accountant’s journey through hardware
I have fond memories of Sanssouci. In 1985 I was a teenager in Potsdam, near Berlin, in what was then East Germany, watching Live Aid and my favorite band, Queen, with Freddie Mercury moving the crowds. This was quite a treat, as we were not allowed to watch “decadent” Western music, and the broadcast came from West Berlin, which was an occupied but somehow free city. On one of our trips we visited the summer residence of Frederick the Great: Sanssouci (French for “No Worries”).
Fast forward 25 years. Now the residence is on the cover of the book and lends its name to a database: SanssouciDB. This is where the historical connection ends, for me at least, and the main premise of the book begins. The new database will do away with the long wait times IT customers face when trying to get the latest report on sales, profits, and costs and to compare it with last month’s, quarter’s, or year’s numbers. Quite a change and a big promise, you might say.
This textbook is actually a proof of concept that Prof. Plattner has elaborated in his lectures, which are available to everyone in the iTunes U section. The book and the lectures don’t overlap 100%, but they are pretty close. Let me assure you that the textbook is not a good candidate for speed reading; you may have to read it more than once to dig deeper into RAM programming. Vinnie Mirchandani would agree, I think. In-Memory Data Management is short but packed with systems vocabulary. The authors back their claims and assumptions with real customer data: millions of underlying line items stored in gigabytes and terabytes, currently on disk but, in the future and for reporting purposes, in RAM residing on blades architected for maximum performance.
A large portion of the book explores RAM, memory hierarchies, and algorithms organized to make the most of scarce computing resources. While reading it, I kept checking the terminology against Wikipedia, as many of the mathematical concepts, especially those around time, are not intuitive and take a while to sink in. Following the diagrams that map and reference internal memory structures was not obvious to a business process analyst like myself either, though it may be for others, so bear with me as I wander the halls of computing hardware.
Introduction to Enterprise Computing
The most general definition of enterprise computing is that an ERP system helps to move money, materials, and people. Economists know these as capital and labor resources. I would also add time, since the book discusses it at length, going deep into constant, logarithmic, and exponential time. Time turns out to be the most precious resource, consuming both capital (technology) and labor (idling while waiting for the system to respond). This is not stated in the book, but that’s the way I see it.
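To make the difference between these growth rates concrete, here is a minimal Python sketch (an illustration, not code from the book) that counts the comparisons needed to find one document number by scanning a sorted list (linear time) versus by binary search (logarithmic time), next to a dictionary lookup, which takes about constant time on average. The document numbers are made up, and exponential time is not shown.

```python
def linear_search_steps(values, target):
    """Scan from the start; comparisons grow linearly with len(values)."""
    steps = 0
    for v in values:
        steps += 1
        if v == target:
            break
    return steps


def binary_search_steps(sorted_values, target):
    """Halve the search range each time; comparisons grow logarithmically."""
    lo, hi, steps = 0, len(sorted_values) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


for n in (1_000, 100_000):
    doc_numbers = list(range(n))        # hypothetical sorted document numbers
    target = n - 1                      # worst case for the linear scan
    position = {d: i for i, d in enumerate(doc_numbers)}  # hash lookup: about constant time
    print(f"n={n}: linear scan {linear_search_steps(doc_numbers, target)} comparisons, "
          f"binary search {binary_search_steps(doc_numbers, target)} comparisons, "
          f"dict lookup finds position {position[target]} directly")
```

Growing the list a hundredfold makes the scan a hundred times longer but adds only a few binary-search steps.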
As I sink my teeth deeper into the terminology and consult the glossary, I quickly find that I’m thankful for Wikipedia, so reading the book online is not such a bad idea. Wolfram|Alpha, however, came up short for me.
The big revolutionary idea comes soon after the discussion of operational and analytical applications, or what we know as OLTP and OLAP: they no longer need to be separate. Together with storing compressed data in RAM by columns rather than rows, this becomes the single most important point the authors set out to prove. In other words, operations and analysis will eventually no longer be separate, save for job scheduling and a few other areas. Don’t try this at home, though, as your mortgage may not be enough to cover the hardware needed to confirm it with your own real-world examples.
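The row-versus-column idea can be sketched in a few lines of Python. The line items below are hypothetical, and Python lists do not control physical memory layout, so this only illustrates the access pattern: a total over one attribute reads a single column in the column store, while an insert has to touch every column.

```python
# Hypothetical sales line items (illustrative data, not from the book).
rows = [
    # (document_no, customer, region, amount)
    (1001, "ACME", "EMEA", 250.0),
    (1002, "Globex", "AMER", 125.5),
    (1003, "ACME", "EMEA", 80.0),
    (1004, "Initech", "APJ", 410.0),
]


def total_amount_row_store(records):
    """Row store: whole records are visited even though only the amount is needed."""
    return sum(record[3] for record in records)


# Column store: each attribute lives in its own list.
columns = {
    "document_no": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "region": [r[2] for r in rows],
    "amount": [r[3] for r in rows],
}


def total_amount_column_store(cols):
    """Only the single "amount" column is read."""
    return sum(cols["amount"])


def insert_column_store(cols, record):
    """Inserting one line item touches every column, which is why writes cost more here."""
    for name, value in zip(("document_no", "customer", "region", "amount"), record):
        cols[name].append(value)


print(total_amount_row_store(rows), total_amount_column_store(columns))  # 865.5 865.5
```

The same total comes out either way; what differs is how much data each layout has to walk through for a report versus for a new posting.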
Going back to the premise, which is very familiar to SAP users the world over: even the most basic questions asked of any measurement system have answers that are slow to come. Queries run longer and longer for the simplest of questions, since more and more records are written to disk (or persistent storage, in more technical terms) and the data simply keeps growing.
What it’s like in your end user’s mind
A long-running sentiment among SAP consultants is echoed when spreadsheets are dismissed as inappropriate tools for consolidating data sources. In theory you can still do financial planning in SAP, but in reality most users go back to spreadsheets to perform transformations and calculations, and fire up PowerPivot to start their financial modeling based on instantly available data from SAP and other sources such as ledgers, at both operational and consolidated levels. So I’m not entirely convinced that the newly proposed architectures can effectively compete with spreadsheets. This position is actually somewhat reversed at the end of the book, which shows how PowerPivot can be a legitimate client for data processed by SanssouciDB.
The book focuses on business use only; government and nonprofits are not its target audience. Hopefully, once the promise of in-memory computing is fulfilled in enterprises, nonprofit organizations will have the opportunity to benefit from the new approach as well. They, too, consume IT resources, but they obviously lack the level of funding that corporate and financial services organizations can allocate to IT.
In one of the most telling illustrations (there are quite a few in the book), ERP applications are depicted as a pyramid with inherent duplication in the form of materialized views and cubes on top of normalized tables. The book goes into industry specifics and how each industry handles its data. I was surprised to learn that discrete manufacturing is quite similar to financial services in the rate of updates to financial records (very low in both).
The underlying requirement that any ERP or reporting system be an application compliant with local laws left me wanting more, as I have not seen any comment on integration with tax filing formats, at least here in the US. The business examples chosen to demonstrate the value of In Memory computing (dunning, sales analysis, demand planning, sales orders, and available-to-promise) all aim at achieving instant feedback.
The Mathematical Maze of Deep Structures
The details of the In Memory architecture are not for the faint of heart and probably suit an undergraduate audience very well. For a practitioner like myself they are more of a challenge, so again I have to thank Wikipedia for being by my side to explain the more arcane terms of hardware architecture and the programming approach it calls for. I don’t want to attempt to translate the mathematics of In Memory into business terms, so I will only mention some terms, arguably obvious to many programming folk but quite mysterious to those closer to the business. If nothing else, you can use them later to scale your own computing environment, as many of the conclusions apply not only to enterprise-scale landscapes but also to what you carry in your pocket: your smartphone. If this is too basic for you, skip ahead to the real thing, or jump to the end of this blog post and read the conclusions.
If you are still with me, you will be introduced to the deep structures of hardware surrounding your computer’s CPU. First, we go into the application cache of SanssouciDB, where 40 TB reside for analytical pre-aggregates and optimized schemas. The usual underlying strategy is to duplicate data in order to minimize joins. Along the way we pick up some explanations, e.g., a list is a set of instances (master data). IMDB stands for In Memory database, but since no single blade can keep all this data in one physical place, multiple networked high-end blades have to be connected. This involves a trade-off between reorganizing partitions and distributing queries, a software answer to the cost barrier of the hardware currently available in the marketplace.
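Here is a minimal sketch of spreading data over several blades, assuming simple hash partitioning on a document number (the blade count, field names, and modulo hash are my assumptions, not the book’s design). A query on the partitioning key is routed to one blade; any other query must fan out to every blade, which is where the network overhead comes from.

```python
from collections import defaultdict

NUM_BLADES = 4  # hypothetical cluster size


def blade_for(document_no, num_blades=NUM_BLADES):
    """Hash partitioning with a simple modulo hash on the partitioning key."""
    return document_no % num_blades


def partition(records, num_blades=NUM_BLADES):
    """Distribute records across blades by their document number."""
    blades = defaultdict(list)
    for record in records:
        blades[blade_for(record["document_no"], num_blades)].append(record)
    return blades


def point_query(blades, document_no):
    """A lookup on the partitioning key is routed to exactly one blade."""
    return [r for r in blades.get(blade_for(document_no), []) if r["document_no"] == document_no]


def total_amount(blades):
    """Any other query fans out to every blade; each partial result crosses the network."""
    partials = [sum(r["amount"] for r in records) for records in blades.values()]
    return sum(partials), len(partials)  # (total, number of blades contacted)


# Hypothetical line items: 20 documents with small amounts.
line_items = [{"document_no": n, "amount": float(n % 7)} for n in range(1000, 1020)]
blades = partition(line_items)
print({blade: len(records) for blade, records in sorted(blades.items())})  # {0: 5, 1: 5, 2: 5, 3: 5}
print(point_query(blades, 1005))   # [{'document_no': 1005, 'amount': 4.0}]
print(total_amount(blades))        # (58.0, 4)
```

Choosing the partitioning key is the business-knowledge part: queries that filter on it stay local, and everything else pays for the fan-out.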
Network communication overhead is one of the first performance hits we have to accept in our quest for constant (or real) time. Keeping multiple copies allows for backup, or redundancy. With a primary data copy of at most 50 TB under control in a shared-nothing/shared-memory setup, which is our test system, we need to assess what kind of storage is available to us. Flash storage for persistence is not as durable as slower hard drives; this was new information to me about hardware. To truly appreciate the implications, we discuss the different kinds of data structures residing in memory: the main store, the differential store, and indices. They are all needed to conserve space and improve runtime performance, and they also matter for persistence, or in other words, backup.
Combined columns reference the compressed state. The main store keeps its columns compressed with sorted dictionaries, while the differential store appends new values to an unsorted dictionary. This avoids recompressing the main store on every insert, a very time-consuming task that is instead confined to periodic merges of the differential store into the main store. For highly selective predicates (key columns), inverted indices are used, and data aging partitions out the most current data, which still requires knowledge of the data, something very dear to us business analysts out there. It reminds me a bit of matrix transposition, and it still requires new code. This call for new development is best summed up by the quote “new application development paradigms must be adopted.” So far, we have barely scratched the surface.
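Dictionary encoding with a separate write buffer can be sketched as follows, assuming that the main store uses a sorted dictionary and the differential store an unsorted, append-only one, merged periodically. The class and function names are hypothetical, and the city names are sample data.

```python
import bisect


class MainStore:
    """Read-optimized column: sorted dictionary plus integer value IDs (a sketch)."""

    def __init__(self, values):
        self.dictionary = sorted(set(values))   # sorted, so value IDs preserve value order
        self.value_ids = [bisect.bisect_left(self.dictionary, v) for v in values]

    def decode(self):
        return [self.dictionary[i] for i in self.value_ids]


class DifferentialStore:
    """Write-optimized buffer: unsorted, append-only dictionary (a sketch)."""

    def __init__(self):
        self.dictionary = []   # values in order of first appearance
        self.lookup = {}       # value -> value ID, to avoid scanning the dictionary
        self.value_ids = []

    def insert(self, value):
        # New values are appended; existing IDs never change, so nothing is recompressed.
        if value not in self.lookup:
            self.lookup[value] = len(self.dictionary)
            self.dictionary.append(value)
        self.value_ids.append(self.lookup[value])

    def decode(self):
        return [self.dictionary[i] for i in self.value_ids]


def merge(main, delta):
    """Periodic merge: rebuild the sorted main dictionary once for all buffered inserts."""
    return MainStore(main.decode() + delta.decode()), DifferentialStore()


main = MainStore(["Berlin", "Potsdam", "Berlin", "Walldorf"])
delta = DifferentialStore()
for city in ["Potsdam", "Aachen", "Aachen"]:
    delta.insert(city)

print(main.dictionary, main.value_ids)    # ['Berlin', 'Potsdam', 'Walldorf'] [0, 1, 0, 2]
print(delta.dictionary, delta.value_ids)  # ['Potsdam', 'Aachen'] [0, 1, 1]
main, delta = merge(main, delta)
print(main.dictionary, main.value_ids)    # ['Aachen', 'Berlin', 'Potsdam', 'Walldorf'] [1, 2, 1, 3, 2, 0, 0]
```

The sorted main dictionary keeps value IDs in the same order as the values, which helps range queries; the price is that a new value would shift existing IDs, so new values wait in the differential store until the next merge.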
Memory hierarchies (the internal architecture of RAM) and parallel processing bring us closer to the CPU, whose 18-month doubling is reaching its limit. Ideally, data (all variables and other components) could be reached in constant access time at any memory location, but in practice data is loaded into different caches, and we have to consider cache evictions and misses. Columnar access pays off for analytics, but keeping neighboring columns together helps as well. We are traversing cache levels and dealing with latency, which stems mainly from transmitting the request, decoding the memory address, and connecting to the bus.
Our next obstacle, as we move closer to the CPU, is CPU stalls. Exploiting locality of reference helps, and spatial locality in particular remains important. The cache architecture discussed is 8-way set associative; for more explanation, go to the book. I spent a considerable amount of time reading about caches and storage-class memory. My takeaway is that reading a value that crosses a cache-line boundary can cause two cache misses. The further discussion of sequential versus random access is more familiar ground.
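To see why crossing a cache line matters, here is a small Python model (not a real cache simulator) assuming 64-byte cache lines and a hypothetical 32 KiB, 8-way set-associative cache with 64 sets. It counts which lines an access touches and which set a line maps to; strided access serves as a stand-in for the random-access worst case.

```python
CACHE_LINE_BYTES = 64   # a common line size; assumed here
WAYS = 8                # 8-way set associative
NUM_SETS = 64           # hypothetical: 32 KiB / (64-byte lines * 8 ways)


def cache_lines_touched(address, size):
    """Line numbers a read of `size` bytes starting at byte `address` touches."""
    first = address // CACHE_LINE_BYTES
    last = (address + size - 1) // CACHE_LINE_BYTES
    return list(range(first, last + 1))


def cache_set(address):
    """The one set a line may occupy; within that set it can use any of the WAYS slots."""
    return (address // CACHE_LINE_BYTES) % NUM_SETS


def lines_for_scan(addresses, size):
    """Distinct lines loaded to read `size` bytes at each address (cold cache, evictions ignored)."""
    lines = set()
    for address in addresses:
        lines.update(cache_lines_touched(address, size))
    return len(lines)


print(NUM_SETS * WAYS * CACHE_LINE_BYTES)   # 32768 bytes of cache
print(cache_lines_touched(60, 8))           # [0, 1]: an 8-byte value straddling two lines
print(cache_lines_touched(64, 8))           # [1]: the same value, aligned

sequential = [i * 8 for i in range(1000)]   # 1,000 consecutive 8-byte values
strided = [i * 4096 for i in range(1000)]   # 1,000 values, each 4 KiB apart
print(lines_for_scan(sequential, 8), lines_for_scan(strided, 8))   # 125 1000
print({cache_set(a) for a in strided})      # {0}: every strided line competes for one set
```

All the strided addresses compete for the same set, so with 8 ways only eight of those lines can be cached at once, while the sequential scan gets eight 8-byte values out of every line it loads.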
I’m becoming a BASIS expert here
The hypervisor is both the key to virtualization and a bottleneck, because in addition to compression and parallelism we need to resort to more innovative use of computing hardware. This is good to know. Next, scalability is defined as increasing capacity by adding resources without modification. We take into consideration startup, contention, and skew (the long pole). Scaling up, i.e., adding internal resources vertically, is the model of corporate mainframes. Scaling out, i.e., adding more machines horizontally, is the model of Google and Facebook.
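A simplified runtime model shows why startup and skew limit scale-out: work runs in parallel, so the job finishes when the slowest partition (the long pole) does, and each worker adds startup cost. This is a sketch with hypothetical numbers; it assumes workers start one after another and ignores contention.

```python
def parallel_runtime(partition_work, startup_per_worker):
    """Wall-clock seconds: sequential worker startup plus the slowest partition (the long pole)."""
    return startup_per_worker * len(partition_work) + max(partition_work)


def even_split(total, workers):
    return [total / workers] * workers


def skewed_split(total, workers):
    """One partition gets twice its fair share; the others split the rest evenly."""
    fair = total / workers
    rest = (total - 2 * fair) / (workers - 1)
    return [2 * fair] + [rest] * (workers - 1)


TOTAL_WORK = 120.0   # hypothetical seconds of work on a single machine
STARTUP = 0.5        # hypothetical seconds to start each worker

print("1 machine:", parallel_runtime([TOTAL_WORK], STARTUP))   # 120.5
for workers in (4, 12):
    even = parallel_runtime(even_split(TOTAL_WORK, workers), STARTUP)
    skewed = parallel_runtime(skewed_split(TOTAL_WORK, workers), STARTUP)
    print(f"{workers} machines: {even} s even, {skewed} s skewed")
```

With 12 machines, one partition carrying double its fair share costs 10 extra seconds, more than a third of the skewed runtime.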
What follows are some programming strategies that most accountants are not privy to, e.g., starting to sort before scanning completes in order to save time. The basic data operations here are aggregate, join, and merge (concatenate), but uncompressed in-memory data is still too expensive. Hence lightweight compression, accessed via dictionary encoding: is this some kind of lookup? Using low-level instructions, we start time travel in constant time. For constant time and null suppression I’m again checking Wikipedia, since the glossary doesn’t explain all the terms. Constantly looking up terminology, I work my way through run-length encoding, cluster coding, bit compression, variable byte coding, patched frame-of-reference, and heavy-weight compression (does this work like a spreadsheet formula?). These are all new terms to me; I’m deep inside now, and this is the most challenging part of the book.
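Two of those lightweight schemes fit in a short Python sketch: run-length encoding, which collapses runs of repeated values in a sorted column, and variable byte coding, which suppresses leading zero bytes by storing 7 bits per byte and using the high bit as a “more bytes follow” flag. That byte layout is one common convention (the same one Protocol Buffers uses for varints); the book’s exact variant may differ.

```python
def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    return [v for v, length in runs for _ in range(length)]


def vbyte_encode(n):
    """Variable byte coding: 7 payload bits per byte, lowest bits first, high bit = more follow."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # continuation flag set: another byte follows
        else:
            out.append(low7)          # last byte: flag clear
            return bytes(out)


def vbyte_decode(data):
    n, shift = 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return n


fiscal_years = ["2010"] * 4 + ["2011"] * 3       # a sorted, low-cardinality column
print(run_length_encode(fiscal_years))           # [('2010', 4), ('2011', 3)]
for n in (5, 300, 1_000_000):
    encoded = vbyte_encode(n)
    print(n, encoded.hex(), f"{len(encoded)} byte(s) instead of a fixed 4")
```

Run-length encoding only pays off when equal values sit next to each other, which is one reason sorting a column before compressing it matters.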
Real business data means more to me, as it’s my daily bread. The fields of an unnamed accounting table (BKPF, BSEG, or COEP? We are not told, but I suspect BSEG), such as amount, debit/credit indicator, date, and discount, are all subject to analysis and compression. A quick check of each column’s cardinality lets us reach some conclusions and show tangible results. Amounts are the least compressible, but indirect compression works best for them. The concluding average compression figure is fifty. Hybrids of OLAP and OLTP offer a 400% improvement. Still, data access patterns (you need to know your business) are key to avoiding unnecessary and expensive joins. The hybrid had fewer analytical misses than transactional databases.
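The cardinality check can be mimicked in Python, assuming plain dictionary encoding where each value is replaced by an integer ID of ceil(log2(distinct values)) bits. The columns are synthetic stand-ins (S/H debit/credit indicators, 8-character dates, amounts in cents, all assumed), not the book’s data, and indirect encoding is not shown.

```python
import math
from datetime import date, timedelta


def bits_per_value(column):
    """Bits per value ID after dictionary encoding: ceil(log2(number of distinct values))."""
    distinct = len(set(column))
    return max(1, math.ceil(math.log2(distinct)))


def compression_factor(column, raw_bytes_per_value):
    """Raw size / encoded size, ignoring the dictionary itself (small when cardinality is low)."""
    return raw_bytes_per_value * 8 / bits_per_value(column)


N = 100_000  # hypothetical number of line items
debit_credit = ["S" if i % 3 else "H" for i in range(N)]   # 2 distinct values
posting_dates = [(date(2010, 1, 1) + timedelta(days=i % 365)).strftime("%Y%m%d")
                 for i in range(N)]                        # 365 distinct values
amounts_cents = list(range(N))                             # worst case: every amount distinct

for name, column, raw_bytes in [("debit/credit", debit_credit, 1),
                                ("posting date", posting_dates, 8),
                                ("amount", amounts_cents, 8)]:
    print(f"{name}: {len(set(column))} distinct, {bits_per_value(column)} bits/value, "
          f"about {compression_factor(column, raw_bytes):.1f}x smaller")
```

The pattern matches the text: the fewer distinct values a column has, the better it compresses, and amounts, with almost every value distinct, compress worst.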
Back to the IT matrix
After a short real-life example we are back to deep computing: candidate generation -> merging -> layout construction. OLAP benchmarking is coarse, by aggregation and query flight (what is that? The time it takes the query to return? Wikipedia fails here, too). Virtualized environments show a 7% degradation versus native ones. Now I really like it and wish I had had this textbook in my undergrad class. There’s more. Binary analogies are discussed, but what pops into my mind is this one: rows are for line items, or data going IN, and columns are for reports, or intelligence coming OUT. Again, it’s not in the book, but it seems very intuitive to me. A little hardware benchmarking is in order: the Intel Xeon E5450 is compared with the X5650. This progression is also presented: tape -> disk -> flash -> RAM, all on the speed/price curve. In the middle of this comes a word of caution and common-sense advice: watch for long-running queries and large numbers of small transactions swamping the system. Historical data is kept for legal reporting reasons, hence for compliance with market and government rules.
I may have to reread this
The key SQL programming takeaways are that aggregations and joins are the most frequently used SQL operations, and that schemas of tables, indices, and views serve data definition and manipulation. My good old friend relational algebra is back: is it for transposing the matrix? Not sure about that. Just to confirm: insert, select, update, and delete are the basic database operations. A query is answered with a full table scan or an index, while parsing, query planning, and optimization are details of SQL processing, along with precompiled stored procedures. A heap is a simple append, while sequential reads come from sending queries and calling stored procedures.
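The basic operations can be tried with Python’s built-in sqlite3 module. The table and values are hypothetical, SQLite has no stored procedures, and the query plan it prints depends on the SQLite version, so treat this only as a way to see definition, manipulation, aggregation, and the scan-or-index decision in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# Data definition: a (hypothetical, heavily simplified) line-item table and an index.
cur.execute("CREATE TABLE line_items (doc_no INTEGER, account TEXT, amount REAL)")
cur.execute("CREATE INDEX idx_account ON line_items (account)")

# Data manipulation: the four basic operations.
cur.executemany("INSERT INTO line_items VALUES (?, ?, ?)",
                [(1, "400000", 100.0), (1, "110000", -100.0), (2, "400000", 50.0)])
cur.execute("UPDATE line_items SET amount = 75.0 WHERE doc_no = 2")
cur.execute("DELETE FROM line_items WHERE account = '110000'")
rows = cur.execute("SELECT doc_no, account, amount FROM line_items ORDER BY doc_no").fetchall()
print(rows)  # [(1, '400000', 100.0), (2, '400000', 75.0)]

# An aggregation, plus the plan SQLite picks for it (index search vs. full table scan).
query = "SELECT SUM(amount) FROM line_items WHERE account = '400000'"
total = cur.execute(query).fetchone()[0]
print(total)  # 175.0
for step in cur.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
conn.close()
```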
There is more intelligence at the database level: chains of SQL statements, heaps, ordered and hashed collections, tree indices, and group keys on an attribute’s compressed dictionary. Select/scan sits behind FROM, aggregate behind GROUP BY, and join behind JOIN ... ON, while recompaction (?) remains unexplained. This sounds like good old SWO1: a business object encapsulates semantically related data together with the business logic and rules to manipulate it. Redundancy drives performance, but I already knew that. Modifying the main store directly is prohibitively expensive, a good point. The differential buffer is insert-only rather than update-in-place.
250M uncompressed BSEG line items are compressed to 20 GB, an over 10-fold decrease. No-overwrite data management reminds me of the accounting principles of consistency and reliability. Multi-version concurrency control, or MVCC, is explained but doesn’t really sink in for me. Transaction token: no definition. Validity dates make time travel possible for queries. Analysis of real data yields 1 to 5 aggregate updates per accounting line item. Totals are materialized views on journal entries, built to provide fast responses on the fly. 5% of records changed over 5 years. Simple math shows how much more memory insert-only needs: 10.9M open items (status-only changes) + 1.4M reconciled items = 12.3M, plus 28M original records, gives about 40M. The total number of accounting tables goes from 18 down to 2 (what are the 18 tables? BKPF, BSEG, anybody help here?). Column-oriented storage, in-memory compression, partitioned storage, and aggregation are all mentioned and explained.
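Here is a minimal sketch of no-overwrite storage with validity ranges, assuming logical timestamps from a simple counter. It shows only how validity intervals enable time-travel queries; real MVCC with transaction tokens and concurrent readers and writers involves much more. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    item_id: int
    status: str
    valid_from: int                 # logical timestamp when this version became visible
    valid_to: Optional[int] = None  # None = still the current version


class InsertOnlyTable:
    """No-overwrite storage: changes append new versions instead of updating in place."""

    def __init__(self):
        self.versions = []
        self.clock = 0   # hypothetical transaction counter standing in for real timestamps

    def _tick(self):
        self.clock += 1
        return self.clock

    def insert(self, item_id, status):
        self.versions.append(Version(item_id, status, self._tick()))

    def update(self, item_id, status):
        now = self._tick()
        for v in self.versions:
            if v.item_id == item_id and v.valid_to is None:
                v.valid_to = now   # only the validity field is set; the old values stay
        self.versions.append(Version(item_id, status, now))

    def as_of(self, ts):
        """Time travel: the state a query at logical time ts would have seen."""
        visible = {v.item_id: v.status for v in self.versions
                   if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to)}
        return dict(sorted(visible.items()))


items = InsertOnlyTable()
items.insert(1, "open")        # t=1
items.insert(2, "open")        # t=2
items.update(1, "cleared")     # t=3: a status-only change still adds a row
print(items.as_of(2))          # {1: 'open', 2: 'open'}
print(items.as_of(3))          # {1: 'cleared', 2: 'open'}
print(len(items.versions))     # 3
```

A status-only change still adds a full row, which is why insert-only needs more memory.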
At its simplest, it does the following: SELECT col1, col2, SUM(col3) FROM table1 GROUP BY col1, col2, where col3 is the measure. Every accountant should understand that. A partition seems to be some kind of ledger. SUM and COUNT are the equivalent of an ACCOUNT. Join is a binary matching problem and aggregation a unary matching problem, but neither is in the glossary: a join matches rows from two inputs, while aggregation matches rows within one input that share a grouping key. Hash-based aggregation is best for unsorted lists. Distributed aggregation means adding up partial results in multiple places. The index is a hash map, built via bucket, linear, or cuckoo hashing. Materialized aggregates over cubes of 1B rows require business knowledge, so we are still in business. Synchronization is expensive, more storage is needed, and more coding is needed for each scenario. In the end, we may do without materialized views and have a reduced need for star schemas (needed for BWA): joins on normalized tables are fine, but a Cartesian product is too expensive.
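Hash-based aggregation, and its distributed variant, can be sketched in Python: each row is added to the bucket for its grouping key, and partial results from several nodes are combined by adding them up. Python’s built-in dict stands in for the hash map (bucket, linear-probing, and cuckoo variants are not shown), and the company codes and accounts are sample data.

```python
from collections import defaultdict


def hash_aggregate(rows, group_cols, measure):
    """GROUP BY group_cols with SUM(measure) and COUNT(*), using a hash map (works on unsorted input)."""
    totals = defaultdict(lambda: [0.0, 0])       # grouping key -> [sum, count]
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        totals[key][0] += row[measure]
        totals[key][1] += 1
    return dict(totals)


def merge_partials(partials):
    """Distributed aggregation: combine per-node [sum, count] results by adding them."""
    combined = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (subtotal, count) in partial.items():
            combined[key][0] += subtotal
            combined[key][1] += count
    return dict(combined)


line_items = [
    {"company": "1000", "account": "400000", "amount": 100.0},
    {"company": "1000", "account": "400000", "amount": 50.0},
    {"company": "2000", "account": "400000", "amount": 20.0},
    {"company": "1000", "account": "110000", "amount": -150.0},
]
group_by = ["company", "account"]

single_node = hash_aggregate(line_items, group_by, "amount")
node_a, node_b = line_items[:2], line_items[2:]          # pretend these live on two blades
distributed = merge_partials([hash_aggregate(node, group_by, "amount") for node in (node_a, node_b)])

print(single_node == distributed)           # True
print(single_node[("1000", "400000")])      # [150.0, 2]
```

Keeping COUNT next to SUM matters when the work is distributed: an average cannot be combined from partial averages, but it can always be computed as the combined SUM divided by the combined COUNT.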
Daily grind of a systems admin
Downtime can be defined as follows: 99.9% availability means 8.76 hours of downtime per year, and 99.99% means 52.6 minutes. One maintenance window going bad skews the whole calculation, as there are 100+ weekend days in the year. For job scheduling, aggregation is CPU-intensive (time-shared), scans are memory-intensive (space-shared), and analytical queries are both; thread pools and memory-boundedness get no glossary entry. Two separate pools (OLTP and OLAP?) with one thread per core are easy. The business part of the application layer still needs to be developed (more work for us). SQL is more efficient than procedural looping. The application development rules are: run data-intensive work in the database; load only what’s needed; generalize, but only to the minimum (no unnecessary features); always stay close to the customer; work with real data; and work in cross-functional teams.
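The downtime arithmetic is easy to check; this Python sketch assumes a 365-day year and a hypothetical 6-hour overrun of one maintenance window.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours; leap years ignored


def max_downtime_hours(availability_percent):
    """Maximum downtime per year, in hours, allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    hours = max_downtime_hours(target)
    print(f"{target}%: {hours:.2f} hours = {hours * 60:.1f} minutes per year")

# One maintenance window that overruns by a hypothetical 6 hours:
overrun_hours = 6
print(f"{overrun_hours / max_downtime_hours(99.9):.0%} of the yearly 99.9% budget")
```

A single 6-hour overrun uses more than two thirds of the yearly 99.9% budget.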
A good piece of statistics concerns spreadsheet extensions into PowerPivot working on 275M rows, which is real BI. Luca Pacioli also appears (poor Luca doesn’t really belong here, as his accounting treatise is not even alluded to). The largest data warehouses triple in size every 2 years, and ETL is the most difficult part (oh yes, I remember the nights spent scrubbing). MOLAP and ROLAP are introduced without much lead-in from the prior chapter, but there is nothing really new here: facts combine measures with dimensions; a star schema is a cube design that takes weeks to build; cubes are overblown to compensate for long design and load times; surrogate identifiers are used to compress; BEx still sits on top of In Memory; and a snowflake schema is a compromise that reduces redundancy but increases join runtimes. MDX hides the complexity of joins.
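A star schema in miniature: the fact table holds only surrogate keys and measures, and every report joins back to the dimensions. This Python sketch uses made-up customers and months, and dictionaries stand in for dimension tables.

```python
# Dimension tables: surrogate integer keys -> descriptive attributes (made-up data).
dim_customer = {1: {"name": "ACME", "region": "EMEA"},
                2: {"name": "Globex", "region": "AMER"}}
dim_month = {1: "2011-01", 2: "2011-02"}

# Fact table: only surrogate keys and measures, so each row stays small.
fact_sales = [
    # (customer_key, month_key, revenue, quantity)
    (1, 1, 500.0, 5),
    (2, 1, 300.0, 3),
    (1, 2, 200.0, 2),
]


def revenue_by_region():
    """Join facts to the customer dimension, then sum revenue per region."""
    totals = {}
    for customer_key, _month_key, revenue, _quantity in fact_sales:
        region = dim_customer[customer_key]["region"]   # the dimension join
        totals[region] = totals.get(region, 0.0) + revenue
    return totals


def revenue_by_month():
    """Same pattern against the month dimension."""
    totals = {}
    for _customer_key, month_key, revenue, _quantity in fact_sales:
        month = dim_month[month_key]
        totals[month] = totals.get(month, 0.0) + revenue
    return totals


print(revenue_by_region())   # {'EMEA': 700.0, 'AMER': 300.0}
print(revenue_by_month())    # {'2011-01': 800.0, '2011-02': 200.0}
```

A snowflake schema would move region into its own table referenced from the customer dimension, saving repetition but adding one more lookup (join) per fact row.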
Parallelism in main memory is still unproven. Algorithmic trading needs a lot of data quickly. The smartphone examples are really good. Virtual cubes are sufficient: views don’t store data, only transformations.
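The difference between a view and materialized data is easy to see with Python’s sqlite3 module. SQLite has no materialized views, so a copied table plays that role here; the table and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (account TEXT, amount REAL)")
conn.executemany("INSERT INTO line_items VALUES (?, ?)",
                 [("400000", 100.0), ("400000", 50.0)])

# Virtual: the view stores only the query (the transformation), never result rows.
conn.execute("CREATE VIEW account_totals AS "
             "SELECT account, SUM(amount) AS total FROM line_items GROUP BY account")

# "Materialized": copy the current result into a real table that must be kept in sync by hand.
conn.execute("CREATE TABLE account_totals_copy (account TEXT, total REAL)")
conn.execute("INSERT INTO account_totals_copy SELECT account, total FROM account_totals")

conn.execute("INSERT INTO line_items VALUES ('400000', 25.0)")   # a new posting arrives

from_view = conn.execute("SELECT total FROM account_totals").fetchone()
from_copy = conn.execute("SELECT total FROM account_totals_copy").fetchone()
print(from_view, from_copy)   # (175.0,) (150.0,)
conn.close()
```

The view always reflects the latest posting because it stores only the transformation; the copy has to be resynchronized, which is the kind of synchronization cost discussed above.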
Nice finish with Cloud computing
The authors take the opposite approach to the cheap version of cloud computing propagated by Google and Facebook. Theirs is much more sophisticated and computing-intensive. It’s an area that may date quickly, as it is truly state of the art and an IT industry buzzword, but I actually liked this chapter, because my beloved (or hated, depending on your point of view) spreadsheets reappear as the client of choice for analytical folks. Once you post your transactions, the totals and balances update on someone’s desktop (phone, iPad, etc.) in a split second, and they will call asking questions, so you had better think twice before saving that 999-line journal entry. In Memory is watching you.
This book nicely summarizes the state of the art of enterprise computing today. The authors are honest about the issues with current implementations and try to move forward with a new approach to computing: more memory, and columnar rather than row architecture. They have developed a working prototype with real business data, so with the new software bundled with large hardware, the big promise of real-time computing is here. Are you ready?