The new 'Data Transfer Utility' in ASE 15.5

former_member182090 · ‎02-02-2010

(originally published at www.sybase.com in February 2010; moved to SCN in December 2012)

By now you've probably heard about the recently release version 15.5 of ASE. The main feature of ASE 15.5 is its In-Memory Database feature, for delivering 'Extreme Performance' (more information is here). However, there is another interesting feature in ASE 15.5 which is worth mentioning: the so-called 'Incremental Data Transfer Utility'. This feature offers a new way of extracting data from an ASE table, which is much faster than can be achieved by classic methods, like BCP-out. Moreover, it's not just faster than BCP, but also it offers more functionality.

The Data Transfer Utility comes in the form of a new T-SQL command transfer table. When you run this against an ASE table (which must reside in an IMDB or RDDB -- those two new database types introduced in ASE 15.5), it extracts newly inserted rows from the table and writes these into a file. Now, 'newly inserted' means: rows that were inserted since the last time transfer table was run against this same table. This is why it's called the 'incremental' Data Transfer Utility: every time you run transfer table, it will pick up only those latest inserted rows:

1> transfer table my_table to '/opt/mydir/mytab.1.out' for ase

2> go

(24509 rows affected)

1> transfer table my_table to '/opt/mydir/mytab.2.out' for ase

2> go

(831 rows affected)

Apart from inserted rows, transfer table also extracts rows that were updated since the previous invocation of transfer table (more on updated rows below). ASE guarantees that it will never extract the same row twice -- except when the row is updated after it has been previously extracted, in which case it will then be extracted again. Also, ASE will only extract rows from committed transactions.

On the other hand, there is no guarantee on the order in which the rows will come out. This is because transfer table does not use any indexes to find new rows, but instead relies on a hidden timestamp in every row. This allows transfer table to achieve pretty impressive throughput: internal tests at Sybase showed it was possible to extract data at 145 GByte/hour with transfer table while, at the same time, data was BCP'd into the same tables at 110 GByte/hour. That's much more than you'd ever be able to achieve by having your application maintain a column to indicate a 'new' row. Also, you wouldn't be able to get such throughput with BCP alone -- especially because BCP would not be able to extract only the newly inserted rows every time.

transfer table is most suitable for cases where data comes in at high rates into an IMDB, and needs to be extracted for persistence or processing into another system or database, for example, into Sybase IQ. In fact, transfer table can generate its output in two formats ('csv' and 'iq') which can be used directly with the IQ 'load table' command. This is where the 'for' clause, at the end of example above, comes in. This clause specifies how the data is formatted in the output file. There are 4 options here:

for ase: internal binary format, only to be used when data will be imported into ASE again (with transfer table my_table from 'filename...')

for bcp: equivalent to BCP's native (binary) format

for csv: equivalent to BCP's platform-independent ASCII format; can also be used with IQ's load table...format bcp command

for iq: can be used directly with IQ's load table...format binary command

In case you're thinking ahead, you can't use transfer table as a replacement for Replication Server. First, deleted rows are not picked up by transfer table; second, the order of the rows is not guaranteed by transfer table, unlike Replication Server, which maintains the original commit order. In addition, RepServer has all kinds of functionality --like function strings, autocorrection, selective replication-- that transfer table doesn't have. In general, transfer table

aims at much simpler data movement use cases than what Replication Server tends to be used for.

Note that, for transfer table to work incrementally, the source table must have been created (or altered) with a new property enabled:

create table my_table (...columns...) with transfer table on

or

alter table my_table set transfer table on

This property adds the hidden timestamp to each row that is used by transfer table. In fact, transfer table will work even without enabling this property -- but it will extract all rows from the table, and do so every time it is run (so it's not very incremental).

Lastly, on the issue of extracting updated rows: in the output file, you cannot actually tell whether a row was extracted because it was inserted or updated. When copying extracted data into ASE (with transfer table...from) or into IQ (with load table), a row be interpreted as an update only when the primary key for that row already exists; this requires specific indexes to exist to identify the columns to be used as primary key, like a unique clustered index in ASE.

There's a lot more to say about the Data Transfer Utility a.k.a.transfer table. Watch for a whitepaper to appear at www.sybase.com/ASEextreme soon.

The new 'Data Transfer Utility' in ASE 15.5

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win