Article 4 in a 5-part series
In this article I'll illustrate and contrast two basic mechanisms for moving data between a local database and a HANA cloud database. One mechanism is the PowerBuilder pipeline replication method. The other is a batch export/import method using DBMS tools. The purpose of this exercise is to (1) illustrate a DBMS-tool-based way to initialize a HANA dataset, (2) demonstrate a pure PowerBuilder data replication play, and (3) contrast the two mechanisms. There are, of course, many more sophisticated ways to rapidly move data between systems.
PowerBuilder Data Pipelines
Based on DataWindow technology, a PowerBuilder Pipeline is a flexible replication facility. You can employ it to replicate data, "in-app" or in the IDE, within a single database, across databases, or even across DBMSs. You can code Pipelines to run in the foreground or in the background to copy or refresh datasets. A pipeline has two parts. Part 1 is a Pipeline Data Object, which itself contains two definitions: (1) DML specifying which data to replicate and how to retrieve it, and (2) a DDL/DML definition specifying where that data should go and how to put it there (as inserts or updates). Part 2 is a pipeline class, instantiated as a runtime object, which has methods to control and interact with a pipeline data object. If you are familiar with creating PowerBuilder DataWindow objects, you will be right at home with Data Pipelines. For those seeking more detail, Pipelines are well described in chapter 17 of the online PowerBuilder User Guide.
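To make the runtime half of that concrete, here is a minimal PowerScript sketch of driving a pipeline from application code. The object names (p_meats_to_hana, itr_source, itr_dest, dw_errors) are illustrative assumptions, not taken from the article's app:

```powerscript
// Hypothetical sketch -- object and transaction names are illustrative
pipeline  lp_pipe      // runtime instance of the pipeline system class
integer   li_rc

lp_pipe = CREATE pipeline
lp_pipe.DataObject = "p_meats_to_hana"   // the Pipeline Data Object painted in the IDE

// Start(source transaction, destination transaction, error DataWindow)
li_rc = lp_pipe.Start(itr_source, itr_dest, dw_errors)
IF li_rc <> 0 THEN
    // nonzero return codes indicate failure; dw_errors collects rejected rows
    MessageBox("Pipeline", "Replication failed, rc = " + String(li_rc))
END IF

DESTROY lp_pipe
```

In a real app you would typically subclass the pipeline system class in a user object so you can handle its PipeMeter events for progress reporting.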
In this video I'll show you how to use the PowerBuilder IDE to define and execute a Pipeline object that replicates data from SAP SQL Anywhere to HANA.
HANA Storage Types
RDBMSs designed for OLTP store data in a row-oriented structure. This structure is great for quick inserts, updates and deletes of entire rows, but not so great for aggregating column data in large data sets. Conversely, data warehouse storage is optimized for OLAP: data is stored in a column-oriented structure. This structure is great for quickly aggregating column data in large data sets, but not so great for inserts, updates and deletes of entire rows, or for join operations. HANA supports both storage structures. The developer selects a storage structure when creating a table (it can be changed later). Lars Breddemann provides this guidance: "As a rule of thumb you may start with column-store tables and change them to row-store tables when you encounter performance issues". See the full discussion here.
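As a quick sketch of what that choice looks like in DDL (table and column names here are hypothetical), HANA lets you pick the storage structure directly in the CREATE TABLE statement:

```sql
-- Row store: optimized for OLTP-style single-row inserts/updates/deletes
CREATE ROW TABLE orders_oltp (
    order_id  INTEGER PRIMARY KEY,
    amount    DECIMAL(10,2)
);

-- Column store: optimized for aggregating columns over large data sets
CREATE COLUMN TABLE orders_olap (
    order_id  INTEGER PRIMARY KEY,
    amount    DECIMAL(10,2)
);
```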
Tool based batch data initialization
A standard way to batch-transfer data is via CSV files. I used Sybase Central's ISQL tool to export SQL Anywhere data to CSV and HANA Studio to import the CSV into a table I defined during the import process. Here's a video in which I walk you through one possible path.
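For the export side, one possible approach is Interactive SQL's OUTPUT statement, which redirects the last result set to a file. This is a hedged sketch; the file path and table are illustrative:

```sql
-- In SQL Anywhere Interactive SQL (ISQL): run a query, then write its
-- result set out as comma-delimited text
SELECT * FROM SSSS.ORDER_NUMBERS;
OUTPUT TO 'c:\export\order_numbers.csv' FORMAT TEXT QUOTE '"' ALL;
```

The QUOTE and ALL options wrap every value in quotes, which matters later when text columns contain embedded commas.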
Comparing the methods
Looking at Figure 1, you can discern the icons that differentiate between row and column storage tables. A PowerBuilder Pipeline used to create a HANA table will always result in a row store structure. This is because the DBMS-agnostic generated CREATE TABLE syntax does not specify a value for <table_type>.
See the CREATE TABLE syntax in the online SAP HANA Reference for more details. HANA Studio (SQLScript) allows you to change a table to column store when editing its definition. Pipelines do support HANA's column storage mechanism when used to refresh data in a previously created HANA column store table.
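A pipeline-created row store table can also be converted with a single SQLScript statement, assuming HANA's table conversion clause of ALTER TABLE; a minimal sketch:

```sql
-- Convert a pipeline-created row-store table to column store
ALTER TABLE SSSS.MEATS COLUMN;
```

After conversion, refreshing the (now column store) table through the same pipeline works as before.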
There is a major time difference between the two mechanisms. I transferred a single dataset of 151,600 rows using both mechanisms. The CSV method took 4 minutes: export to CSV in the ISQL tool, 2 min; import to HANA from a local file using HANA Studio, 2 min (plus a few minutes to define the table in the wizard). The Pipeline method took over 40 minutes (not including the time to define the pipeline object and write the pipeline runtime code). That's at least a 10x slowdown.
Sometimes speed isn't everything. I recently needed to move a large volume of text-based message data up to HANA for analysis. The data was provided to me in a SQL Anywhere database. I exported over 1 million rows to a CSV file and attempted to use HANA Studio to import the CSV into a table. The CSV import wizard refused to process the CSV file. I selected the file, set up the table create parameters and clicked Next to go to the mapping wizard. The tool would think about it for a few moments (doing some disk I/O) and then return control to the same page. It would not take me to the mapping page! At first I thought it was the file size (3.2 GB), so I created a 1,000-row extract and tried again, with the same negative result. Once I examined the data, I understood why the wizard was failing: the text columns were loaded with diacritical marks and punctuation characters (including commas and quotes). Straight CSV would never work. Too bad the Studio wizard didn't have error messaging; it would have saved me lots of time and effort. I went to my trusty fallback position and created and used a PowerBuilder pipeline. After piping, I changed the table to a column store and added the indexes I needed. Job done.
Here's a screen shot of my in-app custom replication UI.
Although slower, pipelines do have one attractive feature: they can be built into a running application. A pipeline can be designed to silently run on a timer as a background process, periodically refreshing a centralized dataset. Figure 1 shows the UI of the custom in-app replication tool I built into the SSSS Reporting app.
In this video I'll show you my SSSS in-app pipeline in operation. I'll also walk you through the application code.
1) Watch this video in which I’ll walk you through the SSSS database schema
2) Connect the PowerBuilder IDE to HANA with the user SSSS
3) Create and execute pipeline objects to create HANA tables and replicate data for the MEATS and FAVORITES tables in the SSSS schema
4) Use SQL Anywhere 12’s ISQL tool to export the data in the ORDER_NUMBERS and SANDWICHES tables to CSV files
5) Use HANA Studio’s import tool to create the ORDER_NUMBERS and SANDWICHES tables in the SSSS schema and import data into them from your CSV files