From the Archives: Loading Data in More Flexible Ways – part deux
In this post, originally written by Glenn Paulley and posted to sybase.com in July 2009, Glenn discusses using the LOAD TABLE statement to load data into SQL Anywhere when the data resides on the client.
In my last post on this topic, I described the ability of SQL Anywhere to reference a CSV file directly from within a query using the
READ_CLIENT_FILE() function. In this post, I want to describe similar extensions to the
LOAD TABLE statement that are supported in Version 11 and up.
Loading Client-Resident Files With LOAD TABLE
The ability of the SQL Anywhere server to call back to a client to retrieve data with
READ_CLIENT_FILE() is also available with
LOAD TABLE. With previous SQL Anywhere releases,
LOAD TABLE could only access files directly accessible from the server machine. Now, with the appropriate authorization (the
READCLIENTFILE authority) and enablement (the
ALLOW_READ_CLIENT_FILE database option), one can load a file from a client machine directly into a base or temporary table on the server. Following part un, where I described processing a CSV file generated from my StatCounter account, here's an example loading that CSV file into a permanent table using LOAD TABLE:
CREATE TABLE visitor_summary (
    dayoftheweek        CHAR(9) NOT NULL,
    record_date         CHAR(30) NOT NULL,
    page_loads          INT NOT NULL,
    unique_visitors     INT NOT NULL,
    first_time_visitors INT NOT NULL,
    returning_visitors  INT NOT NULL );

LOAD TABLE visitor_summary
    USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1;
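Before a statement like this will succeed, the authority and database option mentioned above must both be in place. A minimal sketch, assuming a hypothetical user name:

```sql
-- Enable the server to read files that reside on client machines
-- (the ALLOW_READ_CLIENT_FILE database option, set here for all users).
SET OPTION PUBLIC.allow_read_client_file = 'On';

-- Grant the authority that permits a specific user to initiate
-- client-file reads; the user name 'gpaulley' is hypothetical.
GRANT READCLIENTFILE TO gpaulley;
```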
USING CLIENT FILE does not materialize the contents of the file on the server. Hence the client file can be of arbitrary size.
In Version 11,
LOAD TABLE supports more than the loading of files. The syntax for
LOAD TABLE includes the
USING VALUE clause, enabling one to load data into a table from any expression of
LONG VARCHAR, LONG NVARCHAR, or
LONG BINARY type, in the same manner as
OPENSTRING. Hence the above
LOAD TABLE statement could be written as:
LOAD TABLE visitor_summary
    USING VALUE READ_CLIENT_FILE( 'c:\gpaulley\blog\Summary-6July2009.csv' )
    DELIMITED BY ',' SKIP 1;
Loading Data From a Table Column
In Version 11, the
LOAD TABLE statement has explicit syntax for loading data from a column of another table, where the column contains one or more “rows” as a BLOB or CLOB (and hence each value is limited to 2GB in size). Here’s an example:
BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line     INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX )
        ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    LOAD TABLE visitor_summary
        USING COLUMN summary_contents FROM summary_temp ORDER BY summary_line
        SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END
The syntax is
LOAD TABLE ... USING COLUMN column_name FROM table_name ORDER BY key [ loading options ]
which causes the load to process the values from “table_name.column_name”. The
ORDER BY clause is not optional; one must specify a total ordering of the rows in “table_name” by referencing columns that cover a primary key, unique index, or unique constraint of “table_name”. The
LOAD processes all of the rows of “table_name” in this order. If we modify the above example to insert multiple rows into the temporary table (i.e., duplicating the
INSERT statement):
BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line     INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX )
        ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    INSERT INTO summary_temp VALUES ( 2, summary_data );
    INSERT INTO summary_temp VALUES ( 3, summary_data );
    LOAD TABLE visitor_summary
        USING COLUMN summary_contents FROM summary_temp ORDER BY summary_line
        SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END
then 3 duplicate copies of each row of StatCounter summary data will be loaded into “visitor_summary”.
Loading Data and the Transaction Log
One of the main benefits of using
LOAD TABLE over
INSERT statements is execution speed. One way that execution speed is improved is that triggers on the table being loaded do not fire. A second speedup is that, by default, the contents of the data being loaded are not written to the transaction log; only the text of the
LOAD TABLE statement itself is written to the log. This has several critical implications:
- If the database is being mirrored as part of a high-availability system, the newly-loaded data cannot be sent to the mirroring server.
- Similarly, rows loaded using LOAD TABLE are problematic for log-based synchronization (MobiLink or SQL Remote), since the rows themselves do not appear in the transaction log.
- If recovery of the database is required and the LOAD TABLE statement must be replayed, the original file that was loaded must be available for the server to replay the LOAD TABLE statement from the transaction log. If the file is unavailable, recovery will fail. If the file differs from the original, the database can become logically corrupt.
SQL Anywhere Version 11 offers additional mechanisms to circumvent these issues with
LOAD TABLE. The
LOAD TABLE statement offers a
WITH LOGGING clause to specify explicitly how the statement is to be logged. The possible options are:
- WITH FILE NAME LOGGING clause. This clause matches the server’s default behaviour when loading server-resident files, which is to record only the
LOAD TABLE statement itself in the transaction log. This level of logging cannot be used when loading from an expression or a client file. When you do not specify a logging level in the LOAD TABLE statement,
WITH FILE NAME LOGGING is the default level when specifying:

LOAD TABLE ... FROM filename-expression
LOAD TABLE ... USING FILE filename-expression
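As a sketch, the default can also be stated explicitly when loading a server-resident file; the server-side path below is hypothetical, and the table is the one from the earlier examples:

```sql
-- Only the text of this LOAD TABLE statement is recorded in the
-- transaction log; the server-resident path is hypothetical.
LOAD TABLE visitor_summary
    FROM 'c:\data\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1
    WITH FILE NAME LOGGING;
```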
- WITH ROW LOGGING clause. The
WITH ROW LOGGING clause causes each row that is loaded to be recorded in the transaction log as an
INSERT statement. This level of logging is recommended for databases involved in synchronization, and is supported with database mirroring. However, when loading large amounts of data, this logging type can impact performance and results in a much longer transaction log. This level is also ideal for databases where the table being loaded contains non-deterministic values, such as computed columns.
- WITH CONTENT LOGGING clause. The
WITH CONTENT LOGGING clause causes the database server to chunk together the content of the rows being loaded; the chunks can be reconstituted into rows later, for example during recovery from the transaction log. When loading large amounts of data, this logging type has a lower impact on performance than logging each individual row, and offers increased data protection. Nonetheless, using
WITH CONTENT LOGGING does result in a longer transaction log. This level of logging is recommended for databases involved in mirroring, or where it is desirable not to retain the original data files for later recovery. The
WITH CONTENT LOGGING clause cannot be used if the database is involved in synchronization.
When you do not specify a logging level in the LOAD TABLE statement,
WITH CONTENT LOGGING is the default level when specifying:

LOAD TABLE ... USING CLIENT FILE client-filename-expression
LOAD TABLE ... USING VALUE value-expression
LOAD TABLE ... USING COLUMN column-expression
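Putting this together: because WITH CONTENT LOGGING cannot be used with synchronization, a database participating in MobiLink or SQL Remote would override the client-file default and log individual rows. A sketch, reusing the earlier client-file example:

```sql
-- Override the WITH CONTENT LOGGING default for USING CLIENT FILE
-- so each loaded row is recorded in the log as an INSERT, making
-- the load visible to log-based synchronization.
LOAD TABLE visitor_summary
    USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1
    WITH ROW LOGGING;
```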