From the Archives: Loading Data in More Flexible Ways – part deux
In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2009, Glenn discusses using the LOAD TABLE statement to load data into SQL Anywhere when the data resides on the client machine.
In my last post on this topic, I described the ability of SQL Anywhere to reference a CSV file directly from within a query using the READ_CLIENT_FILE() function. In this post, I want to describe similar extensions to the LOAD TABLE statement that are supported in Version 11 and up.
Loading Client-Resident Files With LOAD TABLE
The ability of the SQL Anywhere server to call back to a client to retrieve data with READ_CLIENT_FILE() is also available with LOAD TABLE. With previous SQL Anywhere releases, LOAD TABLE could only access files directly accessible from the server machine. Now, with the appropriate authorization (READCLIENTFILE authority) and enablement (the ALLOW_READ_CLIENT_FILE database option), one can load a file from a client machine directly into a base or temporary table on the server.
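A minimal enablement sketch, assuming a hypothetical user gpaulley:

-- Permit the server to read files residing on client machines
SET OPTION PUBLIC.allow_read_client_file = 'On';
-- Grant the necessary authority to the user issuing the LOAD
GRANT READCLIENTFILE TO gpaulley;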
Following part un, where I described processing a CSV file generated from my StatCounter account, here's an example loading the CSV file into a permanent table using LOAD TABLE:
CREATE TABLE visitor_summary (
    dayoftheweek        CHAR(9)  NOT NULL,
    record_date         CHAR(30) NOT NULL,
    page_loads          INT      NOT NULL,
    unique_visitors     INT      NOT NULL,
    first_time_visitors INT      NOT NULL,
    returning_visitors  INT      NOT NULL );

LOAD TABLE visitor_summary
    USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1;
USING CLIENT FILE does not materialize the contents of the file on the server. Hence the client file can be of arbitrary size.
In Version 11, LOAD TABLE supports more than the loading of files. The syntax for LOAD TABLE includes the USING VALUE clause, enabling one to load data into a table from any expression of CHAR, NCHAR, BINARY, LONG VARCHAR, LONG NVARCHAR, or LONG BINARY type, in the identical manner to OPENSTRING. Hence the above LOAD TABLE statement could be written as:

LOAD TABLE visitor_summary
    USING VALUE READ_CLIENT_FILE( 'c:\gpaulley\blog\Summary-6July2009.csv' )
    DELIMITED BY ',' SKIP 1;
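For comparison, here is a sketch of the equivalent OPENSTRING formulation, assuming the visitor_summary schema above:

INSERT INTO visitor_summary
SELECT * FROM OPENSTRING( VALUE READ_CLIENT_FILE( 'c:\gpaulley\blog\Summary-6July2009.csv' ) )
    -- the WITH clause mirrors visitor_summary's schema; csv_rows is an arbitrary correlation name
    WITH ( dayoftheweek CHAR(9), record_date CHAR(30), page_loads INT,
           unique_visitors INT, first_time_visitors INT, returning_visitors INT )
    OPTION ( DELIMITED BY ',' SKIP 1 ) AS csv_rows;

Note that, unlike LOAD TABLE, the INSERT ... SELECT formulation fires triggers and logs each row in the ordinary way.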
Loading Data From a Table Column
In Version 11, the LOAD TABLE statement has explicit syntax for loading data from a column of another table, where the table column contains one or more “rows” as a BLOB or CLOB (and hence each value is limited to 2GB in size). Here’s an example:
BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line     INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX )
        ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    LOAD TABLE visitor_summary
        USING COLUMN summary_contents FROM summary_temp ORDER BY summary_line
        SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END
The syntax is:

LOAD TABLE ... USING COLUMN column_name FROM table_name ORDER BY key [ loading options ]

which causes the load to process the values from “table_name.column_name”. The ORDER BY clause is not optional; one must specify a total ordering of the rows in “table_name” by referencing columns that cover a primary key, unique index, or unique constraint of “table_name”. The LOAD processes all of the rows of “table_name” in this order. If we modify the above example to insert multiple rows into the temporary table (i.e., duplicating the INSERT of summary_data):
BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line     INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX )
        ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    INSERT INTO summary_temp VALUES ( 2, summary_data );
    INSERT INTO summary_temp VALUES ( 3, summary_data );
    LOAD TABLE visitor_summary
        USING COLUMN summary_contents FROM summary_temp ORDER BY summary_line
        SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END
then three copies of each row of StatCounter summary data will be loaded into “visitor_summary”.
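A quick way to confirm the duplication is a hypothetical verification query such as:

SELECT dayoftheweek, COUNT(*) AS copies
FROM visitor_summary
GROUP BY dayoftheweek;
-- assuming each dayoftheweek value appears once in the CSV,
-- each should now be counted three times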
Loading Data and the Transaction Log
One of the main benefits of using LOAD TABLE over INSERT statements is execution speed. One of the ways that execution speed is improved is that triggers on the table being loaded do not fire. A second speedup technique is that, by default, the contents of the data being loaded are not written to the transaction log; only the text of the LOAD TABLE statement itself is written to the log. This has several critical implications:
- If the database is being mirrored as part of a high-availability system, the newly-loaded data cannot be sent to the mirror server.
- Similarly, rows loaded using LOAD TABLE are problematic for log-based synchronization (MobiLink or SQL Remote), since the rows themselves do not appear in the transaction log.
- If recovery of the database is required and the LOAD TABLE statement must be replayed, the original file that was loaded must be available for the server to replay the LOAD TABLE statement from the transaction log. If the file is unavailable, recovery will fail. If the file is different from the original, it is possible for the database to become logically corrupt.

SQL Anywhere Version 11 offers additional mechanisms to circumvent these issues with LOAD TABLE. The LOAD TABLE statement offers a WITH LOGGING clause to explicitly specify how the statement is to be logged. The possible options are:

- The WITH FILE NAME LOGGING clause. This clause matches the server’s default behaviour when loading server-resident files, which is to cause only the LOAD TABLE statement to be recorded in the transaction log. This level of logging cannot be used when loading from an expression or a client file. When you do not specify a logging level in the LOAD TABLE statement, WITH FILE NAME LOGGING is the default level when specifying:

LOAD TABLE ... FROM filename-expression
LOAD TABLE ... USING FILE filename-expression

- The WITH ROW LOGGING clause. The WITH ROW LOGGING clause causes each row that is loaded to be recorded in the transaction log as an INSERT statement. This level of logging is recommended for databases involved in synchronization, and is supported in database mirroring. However, when loading large amounts of data, this logging type can impact performance, and results in a much longer transaction log. This level is also ideal for databases where the table being loaded into contains non-deterministic values, such as computed columns, or CURRENT TIMESTAMP defaults.

- The WITH CONTENT LOGGING clause. The WITH CONTENT LOGGING clause causes the database server to chunk together the content of the rows that are being loaded. These chunks can be reconstituted into rows later, for example during recovery from the transaction log. When loading large amounts of data, this logging type has a lower impact on performance compared to logging each individual row, and offers increased data protection. Nonetheless, using WITH CONTENT LOGGING does result in a longer transaction log. This level of logging is recommended for databases involved in mirroring, or where it is desirable to not maintain the original data files for later recovery. The WITH CONTENT LOGGING clause cannot be used if the database is involved in synchronization. When you do not specify a logging level in the LOAD TABLE statement, WITH CONTENT LOGGING is the default level when specifying:

LOAD TABLE ... USING CLIENT FILE client-filename-expression
LOAD TABLE ... USING VALUE value-expression
LOAD TABLE ... USING COLUMN column-expression
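For instance, to load the client-resident CSV file from earlier while keeping the loaded rows visible to log-based synchronization, one could override the client-file default of WITH CONTENT LOGGING explicitly; here is a sketch built from the statements above:

-- Row-level logging makes each loaded row appear in the transaction
-- log as an INSERT, at the cost of a longer log and a slower load
LOAD TABLE visitor_summary
    USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1
    WITH ROW LOGGING;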