GigaOM Research recently published an interview with database rock star Michael Stonebraker on “the impending battle of the database elephants,” covering his thoughts on the disruption in the database market.
This blog includes the excerpts I thought were most interesting:
SAP enters the database market
“In the OLTP market, recent advances have completely convinced me that main memory database systems … are going to completely take over.”
“The database market is really alive, vibrant, with lots of new ideas, and I think the legacy vendors face the “innovator’s dilemma” in spades”
“SAP is in the database business and SAP customers are Oracle’s biggest customer right now, and among the elephants there’s going to be a duke it out between Oracle and SAP and I’m delighted to look on from the side.”
Legacy databases are obsolete
“I think data warehouses are an SQL market. It’s just there’s the new way to do it and the old way to do it, and the legacy vendors have the old way to do it. In OLTP, I think it’s a SQL market also, and the legacy vendors have the old way and there’s a new, much better way.
In round numbers, the database market is a third OLTP, a third data warehouses and a third everything else, and I think “everything else” is primarily a non-SQL market. I think in data warehouses and in OLTP, it will remain a SQL market, it’s just the implementations have to change from what they are now to better ideas.
The codebases that the elephants, the legacy vendors, are selling right now are 25 years old. And it’s time for them to be retired and sent to the home for obsolete software!”
Modern ideas behind HANA
“My expectation is that SAP will make a compelling case for their SAP customers switching off of Oracle and onto HANA. That case has not been made yet, it’s way too early. The real thing to watch is how SAP customers are going to react to persuasion from SAP to switch database systems.”
“I’ve looked at the ideas [behind HANA], and I think the ideas are good. They are modern ideas. It’s too soon to tell whether the implementation will hold up to the ideas. My suspicion is that it deserves to be taken seriously, and that it will have a very large elephant pushing it very hard.”
Gap between NoSQL and SQL narrowing
“My favorite way to categorize the NoSQL guys is that they started off as “NoSQL,” meaning “SQL is bad.” After a while, that turned into NoSQL meaning “not only SQL” – SQL was fine, and they wanted to co-exist with SQL systems. My prediction is that NoSQL will come to mean “not yet SQL.”
The two things the NoSQL guys say is number one, “don’t use SQL, instead use low-level record-at-a-time language.” Cassandra and Mongo have both announced what looks like – unless you squint – a high-level language that is basically SQL. I think the NoSQL guys will move to putting higher-level languages on their products, and thereby make the difference between NoSQL and SQL get much smaller.
I also think that the second thing is that they don’t like ACID. The biggest proponent of NoSQL non-ACID has been historically a guy named Jeff Dean at Google, who is responsible for most or all of their database offerings. And he and the team recently wrote a system called Spanner. Spanner is a pure ACID system. So Google is moving to ACID and I think the NoSQL market will move away from eventual consistency and toward ACID, and so I think the distinction between the two camps will decrease in the future.
“There’s been 40 years of DBMS research, starting way back in the 70s. This was a huge debate in the 70s in the relational database research world, and if you go back and look at the history in the 70s, all the discussion today of ACID vs non-ACID all got wrangled out back then. The NoSQL engineers didn’t… You know, “If you don’t pay attention to history, you’re going to have to repeat it,” which I think is what’s happening.”
Did Oracle take good care of TimesTen?
“That’s a technical question… I and some others wrote a paper called “OLTP Through the Looking Glass, and What We Found There.”
We took an open source legacy DBMS called Shore that is from the University of Wisconsin. And we said “suppose all the data fits in main memory?” If you have a terabyte of data or less, or maybe even these days two or three or five terabytes, it’s perfectly reasonable to put that in main memory.
So we ran the industry standard benchmark, which is TPC-C, on data with a buffer pool big enough to hold all the data. And then we said “where do all the cycles go?”
The answers were a little bit shocking: less than 10% goes into useful work – meaning actually solving the SQL command that comes in. The other 90+% went to four different places [….]
TimesTen was architected in the 90s[…] it’s got three of the four big pieces of overhead, and so its ability to go blindingly fast is really compromised. And so I think the question is not “has any particular system been well taken care of?” or not, it’s more that any system written more than six or eight years ago didn’t realize where all the overhead is going, and wasn’t architected in a way that goes blindingly fast.”
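To put rough numbers on that claim, here is a minimal back-of-the-envelope sketch in Python. It is my own illustration, not something from the interview or the paper: the function name `max_speedup` and the `overhead_removed` parameter are hypothetical, and the 25% figure in the second call simply assumes, purely for the sake of argument, that the four pieces of overhead are equal in size (the interview does not say how they break down).

```python
def max_speedup(useful_fraction: float, overhead_removed: float) -> float:
    """Upper bound on speedup when part of the non-useful work is eliminated.

    useful_fraction: share of cycles doing useful work (the interview cites
        "less than 10%"; 0.10 is used below as a round figure).
    overhead_removed: share of the overhead a new design eliminates,
        from 0.0 (none) to 1.0 (all). Hypothetical parameter.
    """
    if not 0.0 < useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be in (0, 1]")
    if not 0.0 <= overhead_removed <= 1.0:
        raise ValueError("overhead_removed must be in [0, 1]")
    overhead = 1.0 - useful_fraction
    # Cycles still spent per unit of work after removing part of the overhead.
    remaining = useful_fraction + overhead * (1.0 - overhead_removed)
    return 1.0 / remaining


# All overhead gone, 10% useful work: roughly a tenfold ceiling.
print(round(max_speedup(0.10, 1.0), 2))

# ASSUMPTION: only a quarter of the overhead removed (three of four
# equal-sized pieces remain). The real split is not given in the interview.
print(round(max_speedup(0.10, 0.25), 2))
```

Under those assumptions, a system that keeps most of the overhead gains very little, while one that sheds nearly all of it can be around ten times faster on the same hardware, which is the gist of Stonebraker’s argument about older architectures.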
The full interview
You can listen to the full interview on Google Soundcloud (the Stonebraker portion starts at 18:20).