From the Archives: Loading Data in More Flexible Ways – part deux
In this post, originally written by Glenn Paulley and posted to sybase.com in July 2009, Glenn discusses using the LOAD TABLE statement to load data into SQL Anywhere when the data resides on the client.
In my last post on this topic, I described the ability of SQL Anywhere to reference a CSV file directly from within a query using the
READ_CLIENT_FILE() function. In this post, I want to describe similar extensions to the
LOAD TABLE statement that are supported in Version 11 and up.
Loading Client-Resident Files With LOAD TABLE
The ability of the SQL Anywhere server to call back to a client to retrieve data with
READ_CLIENT_FILE() is also available with
LOAD TABLE. With previous SQL Anywhere releases,
LOAD TABLE could only access files directly accessible from the server machine. Now, with the appropriate authorization (the
READCLIENTFILE authority) and enablement (the
ALLOW_READ_CLIENT_FILE database option), one can load a file from a client machine directly into a base or temporary table on the server. Following part un, where I described processing a CSV file generated from my StatCounter account, here's an example loading that CSV file into a permanent table using LOAD TABLE:
CREATE TABLE visitor_summary (
    dayoftheweek        CHAR(9) NOT NULL,
    record_date         CHAR(30) NOT NULL,
    page_loads          INT NOT NULL,
    unique_visitors     INT NOT NULL,
    first_time_visitors INT NOT NULL,
    returning_visitors  INT NOT NULL );

LOAD TABLE visitor_summary
    USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1;
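Before a statement like this will succeed, the authority and database option mentioned above must both be in place. A minimal sketch, assuming a hypothetical user name:

```sql
-- Enable the server to read files that reside on client machines
-- (the ALLOW_READ_CLIENT_FILE database option, set here for all users).
SET OPTION PUBLIC.allow_read_client_file = 'On';

-- Grant the authority that permits a specific user to initiate
-- client-file reads; the user name 'gpaulley' is hypothetical.
GRANT READCLIENTFILE TO gpaulley;
```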
USING CLIENT FILE does not materialize the contents of the file on the server. Hence the client file can be of arbitrary size.
In Version 11,
LOAD TABLE supports more than the loading of files. The syntax for
LOAD TABLE includes the
USING VALUE clause, enabling one to load data into a table from any expression of
LONG VARCHAR, LONG NVARCHAR, or
LONG BINARY type, in the same manner as
OPENSTRING. Hence the above
LOAD TABLE statement could be written as:
LOAD TABLE visitor_summary
    USING VALUE READ_CLIENT_FILE( 'c:\gpaulley\blog\Summary-6July2009.csv' )
    DELIMITED BY ',' SKIP 1;
Loading Data From a Table Column
In Version 11, the
LOAD TABLE statement has explicit syntax for loading data from a column of another table, where the column contains one or more “rows” as a BLOB or CLOB (and hence each value is limited to 2GB in size). Here’s an example:
BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line     INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX )
        ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    LOAD TABLE visitor_summary
        USING COLUMN summary_contents FROM summary_temp ORDER BY summary_line
        SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END
The syntax is
LOAD TABLE ... USING COLUMN column_name FROM table_name ORDER BY key [ loading options ]
which causes the load to process the values from “table_name.column_name”. The
ORDER BY clause is not optional; one must specify a total ordering of the rows in “table_name” by referencing columns that cover a primary key, unique index, or unique constraint of “table_name”. The
LOAD processes all of the rows of “table_name” in this order. If we modify the above example to insert multiple rows into the temporary table (i.e., duplicating the
INSERT statement):
BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line     INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX )
        ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    INSERT INTO summary_temp VALUES ( 2, summary_data );
    INSERT INTO summary_temp VALUES ( 3, summary_data );
    LOAD TABLE visitor_summary
        USING COLUMN summary_contents FROM summary_temp ORDER BY summary_line
        SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END
then 3 duplicate copies of each row of StatCounter summary data will be loaded into “visitor_summary”.
Loading Data and the Transaction Log
One of the main benefits of using
LOAD TABLE over
INSERT statements is execution speed. One way that execution speed is improved is that triggers on the table being loaded do not fire. A second speedup is that, by default, the contents of the data being loaded are not written to the transaction log; only the text of the
LOAD TABLE statement itself is written to the log. This has several critical implications:
- If the database is being mirrored as part of a high-availability system, the newly-loaded data cannot be sent to the mirroring server.
- Similarly, rows loaded using LOAD TABLE are problematic for log-based synchronization (MobiLink or SQL Remote), since the rows themselves do not appear in the transaction log.
- If recovery of the database is required and the LOAD TABLE statement must be replayed, the original file that was loaded must be available for the server to replay the LOAD TABLE statement from the transaction log. If the file is unavailable, recovery will fail. If the file differs from the original, the database can become logically corrupt.
SQL Anywhere Version 11 offers additional mechanisms to circumvent these issues with
LOAD TABLE. The
LOAD TABLE statement offers a
WITH LOGGING clause to specify explicitly how the statement is to be logged. The possible options are:
- WITH FILE NAME LOGGING clause. This clause matches the server’s default behaviour when loading server-resident files, which is to record only the
LOAD TABLE statement itself in the transaction log. This level of logging cannot be used when loading from an expression or a client file. When you do not specify a logging level in the LOAD TABLE statement,
WITH FILE NAME LOGGING is the default level when specifying:

LOAD TABLE ... FROM filename-expression
LOAD TABLE ... USING FILE filename-expression
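As a sketch, the default can also be stated explicitly when loading a server-resident file; the server-side path below is hypothetical, and the table is the one from the earlier examples:

```sql
-- Only the text of this LOAD TABLE statement is recorded in the
-- transaction log; the server-resident path is hypothetical.
LOAD TABLE visitor_summary
    FROM 'c:\data\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1
    WITH FILE NAME LOGGING;
```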
- WITH ROW LOGGING clause. The
WITH ROW LOGGING clause causes each row that is loaded to be recorded in the transaction log as an
INSERT statement. This level of logging is recommended for databases involved in synchronization, and is supported with database mirroring. However, when loading large amounts of data, this logging type can impact performance and results in a much longer transaction log. This level is also ideal for databases where the table being loaded contains non-deterministic values, such as computed columns.
- WITH CONTENT LOGGING clause. The
WITH CONTENT LOGGING clause causes the database server to chunk together the content of the rows being loaded; the chunks can be reconstituted into rows later, for example during recovery from the transaction log. When loading large amounts of data, this logging type has a lower impact on performance than logging each individual row, and offers increased data protection. Nonetheless, using
WITH CONTENT LOGGING does result in a longer transaction log. This level of logging is recommended for databases involved in mirroring, or where it is desirable not to retain the original data files for later recovery. The
WITH CONTENT LOGGING clause cannot be used if the database is involved in synchronization.
When you do not specify a logging level in the LOAD TABLE statement,
WITH CONTENT LOGGING is the default level when specifying:

LOAD TABLE ... USING CLIENT FILE client-filename-expression
LOAD TABLE ... USING VALUE value-expression
LOAD TABLE ... USING COLUMN column-expression
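Putting this together: because WITH CONTENT LOGGING cannot be used with synchronization, a database participating in MobiLink or SQL Remote would override the client-file default and log individual rows. A sketch, reusing the earlier client-file example:

```sql
-- Override the WITH CONTENT LOGGING default for USING CLIENT FILE
-- so each loaded row is recorded in the log as an INSERT, making
-- the load visible to log-based synchronization.
LOAD TABLE visitor_summary
    USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1
    WITH ROW LOGGING;
```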