Recently I was experimenting with Data Services to capture performance statistics while processing a large volume of data. I know Data Services is very mature and capable of handling massive volumes, but I wanted to see it for myself under various scenarios. In a previous project we used Data Services to transfer ~2 billion records from Oracle to HANA with complex transformations, and it took less than 10 days.
The headline result: Data Services took ~10 minutes to transfer 40 million records from MySQL to SQL Server. I downloaded the IMDB interface files and first processed them with Python to decode the data into SQL statements. The generated SQL file, around 5 GB in size, was then loaded into MySQL, producing 60 tables with 45 million records. Since I also had a SQL Server instance, I decided to transfer the data from MySQL to test the performance. I used 3 data flows in series to transfer all 50 tables from MySQL to SQL Server, and the total job execution time was close to 10 minutes, which was very quick (of course there were no complex transforms except for a few joins). The machine used for this activity had 16 GB of RAM and 4 cores.
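The decode step can be sketched roughly as follows. This is a minimal illustration, not the actual script used: the table name, column names, and sample rows are hypothetical, and a real IMDB interface file would need its own parsing logic before reaching this point.

```python
# Sketch: turn rows parsed from an IMDB interface file into SQL INSERT
# statements suitable for bulk loading into MySQL.
# Table/column names below are hypothetical examples.

def sql_literal(value):
    """Render a Python value as a SQL literal, escaping single quotes."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def rows_to_inserts(table, columns, rows):
    """Yield one INSERT statement per parsed row."""
    col_list = ", ".join(columns)
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({col_list}) VALUES ({values});"

if __name__ == "__main__":
    # Two example rows from a hypothetical 'movies' list file.
    rows = [("The Matrix", 1999), ("O'Brien's Story", None)]
    for stmt in rows_to_inserts("movies", ["title", "year"], rows):
        print(stmt)
```

Writing the statements to a single file and sourcing it into MySQL (e.g. `mysql dbname < imdb.sql`) is one straightforward way to arrive at a multi-gigabyte SQL dump like the one described above.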
My next exercise will be to push this data into HANA and build a complex calculation view to get some reports out of the IMDB data.