I’m a bit new to this neck of the woods, so allow me to introduce myself. My name is Alex Balk, I live in Haifa, Israel and work for IBM. I specialize in High Performance Computing (or “supercomputing” as some call it), distributed systems and pengui— err, Linux. I’ve been doing this stuff for, oh, 10 years or so. Which is the equivalent of 2000 years in Internet time. And all this time I’ve been fortunate enough to be doing “customer facing roles”, which is a nice way of saying I get to deal with lots and lots of different systems, in different business domains – from banking, through Internet, to aerospace. It’s really great!
So enough about me. I’m here to talk to you about SAP HANA, and what it’s got to do with us HPC guys at IBM.
I’m sure by now you’ve heard about HANA and what it can do for your business. You’ve heard words like SPEED and IN-MEMORY repeated over and over, and I’m sure by now you know you can expect significant speed-ups in apps like your BW. But there’s another significant word that gets thrown around when HANA is mentioned: SCALE. And this word carries some pretty interesting implications that I’d like to bring to your attention. Because that’s exactly what guys like me deal with when we build “supercomputers” – scale. From day one.
The basic idea is really simple: where traditional databases would typically scale UP to bigger machines to support an increasing workload, HANA actually scales OUT, to additional “small” machines. This means several things when you’re faced with growth:
- You don’t need to buy bigger, more expensive hardware
- You don’t need to rip and replace the existing hardware
- You add more capacity by adding more hardware
The last point is, in fact, what I’d like to focus on. Scaling out means that when you want to grow, you just add another “building block” and get an increase in performance, capacity, or both. So when you add a server with 1TB of RAM, you get an additional 1TB of RAM and… that’s it. You don’t need to do anything else. Want 4TB? No problem – 4 more 1TB servers. Or 8 of those 512GB servers, depending on which building block you choose to grow with. Simple, right? More servers = more capacity. Thank you, have a nice day.
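If you like seeing the arithmetic spelled out, here’s a tiny sketch of the building-block math. The node sizes are just the examples from above – nothing official, and real HANA sizing involves a lot more than RAM:

```python
import math

def nodes_needed(target_tb, node_ram_tb):
    """How many identical building blocks it takes to reach a target capacity."""
    return math.ceil(target_tb / node_ram_tb)

# Growing to 4TB with 1TB building blocks...
print(nodes_needed(4, 1))    # 4 servers
# ...or with 512GB (0.5TB) building blocks
print(nodes_needed(4, 0.5))  # 8 servers
```

Same target, different building block – the choice just changes how many boxes you rack.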
But as it turns out, there’s just a bit more to it than that. Since SAP HANA is an in-memory database, there’s this issue of “persisting data to disk”. Which just means that, to avoid data loss in case of power failure, all the in-memory data in HANA is written out to disk once in a (short) while. Combine that with the fact that HANA is a scale-out database – meaning every machine that makes up the HANA database serves a portion of the data – and a need for high availability… and you arrive at a simple requirement: shared storage.
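For intuition only – this toy class is nothing like HANA’s actual savepoint and redo-log machinery – the basic persistence idea looks roughly like this: keep everything in memory, log changes as they happen, and periodically flush the whole state to disk so you can recover after a power failure.

```python
import json
import os
import tempfile

class TinyInMemoryStore:
    """Toy illustration of persisting in-memory data (NOT HANA's real mechanism)."""

    def __init__(self, path):
        self.path = path   # in a scale-out setup, this would live on shared storage
        self.data = {}
        self.log = []      # changes since the last flush

    def put(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # a real database writes this to a redo log

    def savepoint(self):
        # Flush the full in-memory state to disk; the log can then be truncated.
        with open(self.path, "w") as f:
            json.dump(self.data, f)
        self.log.clear()

    def recover(self):
        # After a crash: reload the last savepoint (redo-log replay omitted here).
        with open(self.path) as f:
            self.data = json.load(f)

store = TinyInMemoryStore(os.path.join(tempfile.gettempdir(), "savepoint.json"))
store.put("answer", 42)
store.savepoint()
```

The point is simply that an in-memory database still needs disks – and in a scale-out cluster, every node needs to be able to reach them.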
Shared storage? Yes, shared storage. Don’t worry, this one’s also simple. You just have a piece of storage (a hard disk, if you will) that all the machines that make up the HANA cluster can read from and write to. Oh, like our IBM/NetApp/EMC box? So what, nothing new here, right?
Well… yes. And no. Yes – there’s nothing new here. We’ve been doing shared storage for… ever. But we haven’t been doing scale-out forever. Not beyond high availability, anyway. And not in traditional IT shops. Here, consider this:
Let’s say you have 4 HANA machines, each with 512GB RAM. And you have a shared storage device to support this setup. Everything’s working and everyone’s happy. But then you want to add another 1TB of RAM. You go to your favorite hardware vendor, buy a couple of certified nodes and… find out that your storage has become a bottleneck. It just can’t support 2 more nodes. And there’s only one thing you can do: buy another storage device. Let me say it again: you need to buy another, central storage device to support 2 additional HANA nodes. And when you reach a total of 8 nodes and want to grow again? That’s right – you’ll need yet another central storage device.
Okay, so? What’s the big deal?
Again, it’s fairly simple: your building block isn’t just a server with whatever-amount-of-RAM. It’s a server + storage. And you need to scale out your servers AND your storage when you grow. The bottom line for your project’s budget is that scaling up to the storage barrier costs X, while scaling beyond it costs X (servers) + Y (storage). Not to mention the added complexity of managing 2 (or more) central storage devices (inside a single HANA appliance).
But hey, what can you do? Traditional storage just doesn’t scale out. It wasn’t built for this purpose. End of story, thank you, move along.
We’re not done yet 🙂
Remember I told you I’m an HPC guy? And that I deal with scaling and stuff like that? Well, check this out:
You really should know, supercomputers aren’t all that complicated. They’re just a bunch of machines that work together to solve a problem. And all of these machines are pretty much the same. The thing that complicates matters is that there are a lot of these simple machines. As in hundreds or thousands of them. And typically, they all work against the same data. So whatever holds the data has to be able to scale. Which… is kinda like what we have here with HANA. You might’ve also heard of Hadoop – it deals with a similar set of problems (specifically, in this case, the scale limitations of central storage). So maybe we could reuse something from the worlds of HPC and Hadoop to solve this?
Yes we can. Distributed file systems.
Essentially, we’re getting rid of the “central” in “central storage” and replacing it with “distributed”. How? Well, instead of using a bunch of disks connected to a central device, we put a lot of disks inside the HANA machines themselves, and use those.
Wait, WHAT?! you’re using local disks?!
Yes. That’s exactly what we’re doing. We’re using local disks inside each and every one of the HANA nodes, with a software layer on top (called “GPFS”) to create a single, shared, distributed file system. So all of the HANA nodes see one large filesystem, spanning all those internal disks.
But… but what happens if a disk fails?
Oh, that’s easy. We replicate every single block of data 3 times. We could do more, but 3 places us in good company. 🙂
But, but…. but what if a whole machine fails?
Remember we mentioned replication? It’s done across different machines. It doesn’t matter if a single disk fails or a whole node goes down – data replication has us covered. In fact, in large clusters, we can even decide that replication is done across different cabinets, or different parts of the datacenter, for even better protection.
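To give a feel for the idea, here’s a toy sketch of replica placement across failure domains. GPFS’s real allocator is far more sophisticated, and the node and cabinet names here are made up – the point is just that no two copies of a block land in the same failure group, so losing a disk, a node, or a whole cabinet never takes out every copy:

```python
def place_replicas(block_id, nodes, replicas=3):
    """Pick `replicas` distinct nodes for a data block, each in a different
    failure group (e.g. a cabinet). Toy sketch, not GPFS's real algorithm."""
    chosen, used_groups = [], set()
    # Simple rotation so consecutive blocks spread across the cluster.
    start = block_id % len(nodes)
    rotation = nodes[start:] + nodes[:start]
    for name, group in rotation:
        if group not in used_groups:
            chosen.append(name)
            used_groups.add(group)
        if len(chosen) == replicas:
            break
    return chosen

# Hypothetical 6-node cluster spread over 3 cabinets.
nodes = [("hana01", "cab-A"), ("hana02", "cab-A"),
         ("hana03", "cab-B"), ("hana04", "cab-B"),
         ("hana05", "cab-C"), ("hana06", "cab-C")]

print(place_replicas(0, nodes))  # one copy per cabinet
```

Each block ends up with one copy in each cabinet, which is exactly why a single failure – at any level – leaves two good copies behind.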
Okay, umm… but how does this solve the scalability issue exactly?
Ah, easy: there’s no more central storage. Each HANA server becomes both a compute AND a storage node. When you add a machine to the HANA cluster, you get not only additional RAM for the database, but also the supporting storage space and throughput. You don’t need central storage anymore.
So to grow I just add servers and that’s it?
That’s it.
And if I want 10 more TB of RAM?
You just add more servers.
And this actually works out in the wild?
Yes, it does. And in Israel too. It even powers the largest HANA cluster in the world.
But I’m not that big – my system is really small.
Oh, you don’t have to be huge to hit the scalability limit of central storage. With some setups, you’ll meet it as early as 2TB. With others, even sooner. Whatever your initial setup size, building the solution to scale from the start lets you avoid the issue altogether.
And while some might think that 640K is enough for everyone, data growth is usually a good thing. Because it typically means your business is growing. And at the end of the day, that’s what we’re all here for 🙂
Want to learn more? Here’s the official IBM landing page for SAP HANA.
Got a question or comment? Drop me a line in the comments section below! THANKS!