Details on the Sybase IQ 16 World Record…
We have had so much interest in the new Guinness World Record, set by SAP Sybase IQ 16 for the fastest loading of Big Data, that I thought I would share some further details with you all…
This test of SAP Sybase IQ 16 was run on a standard HP ProLiant DL980 G7 server configured as an 80-core system (8 processors, 10 cores per processor) with 1 TB of RAM, running Red Hat Enterprise Linux (RHEL) 6.2. The EDMT analytics archive from SAP partner BMMsoft, certified with SAP Sybase IQ 16, was used to ingest and query the following data:
- 6.5 million documents: records containing a document and metadata about the document. The document portions of the records consisted of images, video clips, audio clips, and office-type documents.
- 500 million transactions: records produced as a result of conventional transaction processing. These records consisted of traditional data types such as numbers and character strings.
- 5 million short messages: records produced by capturing short text messages. These records consisted of a combination of email messages and SMS-type messages.
The complete load process included 50 batches of data. Each batch consisted of 16 files containing raw data representing the data types and totals outlined above.
The test recorded a load speed of 34.3 TB/hour (loading and indexing 33.13 TB of raw data in 57.91 minutes). Queries were executed throughout the load process to verify the accessibility of the newly ingested data.
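For anyone who wants to double-check the arithmetic, the headline figure follows directly from those two numbers; here is a quick sanity check (illustrative only):

```python
# Quick check of the published load rate:
# 33.13 TB of raw data loaded and indexed in 57.91 minutes.
raw_data_tb = 33.13   # raw data loaded and indexed, in TB
elapsed_min = 57.91   # elapsed load time, in minutes

rate_tb_per_hour = raw_data_tb / (elapsed_min / 60.0)
print(f"Load rate: {rate_tb_per_hour:.1f} TB/hour")  # prints ~34.3 TB/hour
```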
This record is significantly faster than the published claims by Oracle for Exadata (12 TB/hr, June 2012) and EMC for their Greenplum-based appliance (10 TB/hr, Jan 2013), and SAP holds the only independently verified test for Big Data loading. 😀
Once again – congratulations team, way to go!
1 TB of RAM? I am interested to know how much of that 1 TB was actually used for the data loading in this test?
Hilda,
The memory was divided evenly among main, temp, and large memory, with about 250 GB each. Memory use, though, was minimal. When loading blob/clob data, it is only kept in memory until the cache is full and needs to be flushed so the pages can be reused. With so much unstructured data going in, the blob/clob structures get written to disk and are pretty much out of memory in short order so that loads for other rows can proceed.
We did play around with the memory allocations but found that, so long as we didn't starve the main and temp caches, the loads performed about the same.
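To put rough numbers on that split (back-of-the-envelope arithmetic only, not our exact startup parameters), it looked something like this:

```python
# Rough view of the cache split described above (illustrative only;
# the exact settings on the benchmark system may have differed).
total_ram_gb = 1024   # 1 TB of RAM in the DL980 G7
caches_gb = {"main": 250, "temp": 250, "large memory": 250}

allocated_gb = sum(caches_gb.values())
leftover_gb = total_ram_gb - allocated_gb  # headroom for the OS and everything else on the box

print(f"IQ caches: {allocated_gb} GB, leftover: {leftover_gb} GB")
```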
It is also important to note that the majority of the data loaded into IQ during these tests was unstructured. A lot of structured data was loaded, don't get me wrong, but there was even more unstructured data.
Hope this helps.
Mark
Thanks for your reply, Mark. You said the data loaded was mainly unstructured, and therefore 1 TB of RAM was needed in this case. Previously, from our IQ sizing guide, we would normally suggest 16 or 32 GB of RAM per core. May I know whether the latest sizing guide would suggest 100 GB per core?
Hilda,
The system specification was not clear (it has been corrected above). Apologies for that. The system is an 8-socket, 10-cores-per-socket configuration, for a total of 80 cores. That's roughly 12 GB of RAM per core, which is more than enough for the workload at hand.
My current sizing recommendation for most situations is 12-16 GB of RAM per core: for non-RLV systems, I recommend 12 GB of RAM per core; if using the new in-memory RLV store, I recommend 16 GB per core.
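To make that arithmetic concrete, here is a minimal sketch using those per-core figures (the helper is purely illustrative, not an SAP tool):

```python
# Illustrative sizing arithmetic based on the per-core recommendations above
# (12 GB/core without the RLV store, 16 GB/core with it).
def recommended_ram_gb(cores, uses_rlv=False):
    gb_per_core = 16 if uses_rlv else 12
    return cores * gb_per_core

cores = 80  # the benchmark box: 8 sockets x 10 cores
print(recommended_ram_gb(cores))                 # 960 GB, roughly the 1 TB installed
print(recommended_ram_gb(cores, uses_rlv=True))  # 1280 GB if the RLV store were in use
```

At 12 GB per core, the 80-core benchmark box works out to 960 GB, which is essentially the 1 TB that was installed.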
Hope this helps.
Thanks for your update. It makes much more sense for an 80-core setup with 1 TB of RAM. Thanks for the clarification.