From the Archives: Loading Data in More Flexible Ways – Part Un
In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2009, Glenn discusses using the OPENSTRING clause in a select statement to load data into SQL Anywhere.
Invariably one needs to transfer the contents of a flat file into, or, sometimes, out of, a SQL Anywhere server. Prior to Version 11, importing a flat file into the server could be done in the following ways (without resorting to writing code to do so):
- using the
LOAD TABLE
statement; - using the
INPUT
statement from within the DBISQL tool; - loading using the xp_read_file() procedure; or
- establishing a proxy table to the flat file using Remote Data Access services.
Each of the above solutions have trade-offs. The first two options require the creation of a table in which to load the data. LOAD TABLE
is faster than INPUT
, but suffers from two disadvantages: firstly, that the file must be directly accessible from the server machine, and secondly, the individual rows being inserted are not recorded in the transaction log, complicating recovery. The third option also requires that the file be local to the server machine. Unfortunately xp_read_file() creates an opaque value that (still) requires parsing. The fourth option, a proxy table, is even more cumbersome to use, requiring the possible creation of an external server, an EXTERNLOGIN
object, etc.
With Version 11, SQL Anywhere handles interactions with flat files more flexibly. In this post, I’ll describe the functionality provided by OPENSTRING
, and I’ll follow this with a subsequent post on additional options.
OPENSTRING
To illustrate these new features, I wanted to upload into the server the CSV output file generated by StatCounter for my blog. Here are the first few lines of the file (I opened a StatCounter account on 20 January of this year):
Day,Date,Page Loads,Unique Visitors,First Time Visitors,Returning Visitors
Tuesday,20th January 2009,"83","48","46","2"
Wednesday,21st January 2009,"163","108","102","6"
Thursday,22nd January 2009,"127","105","99","6"
Friday,23rd January 2009,"126","91","87","4"
Saturday,24th January 2009,"42","37","35","2"
Sunday,25th January 2009,"52","38","36","2"
Monday,26th January 2009,"171","133","119","14"
Tuesday,27th January 2009,"157","110","101","9"
Rather than use a proxy table, LOAD TABLE
, or INPUT INTO
, I decided to avoid creating a table at all, and simply refer to the flat file directly in my SELECT
statement using OPENSTRING
.
OPENSTRING
is a table expression whose input parameter can be either a flat file or a (string) variable. When specified the server parses the file input and constructs rows of a virtual table matching the schema that one specifies in a clause of the OPENSTRING
expression. Here is the SQL grammar for an OPENSTRING
table expression.
<openstring-expression> ::= OPENSTRING ( { FILE | VALUE } <string-expression> ) WITH ( <rowset-schema> ) [ OPTION ( <scan-option> ... ) ]
<rowset-schema> ::= <column-schema-list> | TABLE [owner.]table-name [ ( column-list ) ]
<column-schema-list> ::= { <column-name> <user-or-base-type> | FILLER( ) } [ , ... ]
<column-list> ::= { <column-name> | FILLER( ) } [ , ... ]
<scan-option> ::= BYTE ORDER MARK { ON | OFF }
| COMMENTS INTRODUCED BY <comment-prefix>
| DELIMITED BY <string>
| ENCODING <encoding>
| ESCAPE CHARACTER <character>
| ESCAPES { ON | OFF }
| FORMAT { TEXT | BCP }
| HEXADECIMAL { ON | OFF }
| QUOTE <string>
| QUOTES { ON | OFF }
| ROW DELIMITED BY <string>
| SKIP integer
| STRIP { ON | OFF | LTRIM | RTRIM | BOTH }
Note that the options for OPENSTRING
match those available for LOAD TABLE
. In my case, the Excel-CSV file uses comma-delimited fields, and the first line contains the attribute names which must be ignored for processing the actual values. Here is a SQL statement that creates a result set from this input flat file directly:
SELECT stat_weekday,
CAST ( ( REGEXP_SUBSTR( str_stat_date, '[0-9]+(?=(st|nd|th|rd|ST|ND|TH|RD )\s.*)' )
|| REGEXP_SUBSTR( str_stat_date, '(?<=[0-9]+(st|nd|th|rd|ST|ND|TH|RD)).*' ) ) AS DATE ) as stat_date,
page_loads, unique_visitors, first_time_visitors, returning_visitors
FROM OPENSTRING( VALUE READ_CLIENT_FILE ('c:\gpaulley\blog\summary-6July2009.csv' ) )
WITH( stat_weekday char(10), str_stat_date char(30), page_loads int, unique_visitors int, first_time_visitors int, returning_visitors int)
OPTION( SKIP 1 DELIMITED BY ',' ) AS summary
Some points to make about the above statement:
- The parameter to
OPENSTRING
is eitherVALUE
orFILE
. IfFILE
is specified, the flat file must be local to the server machine; in this case, the file was on my own computer. Version 11 supports theREAD_CLIENT_FILE
function that uses SQL Anywhere’s CMDSEQ wire protocol to fetch a file’s contents from the client machine.READ_CLIENT_FILE
creates an internal string thatOPENSTRING
subsequently parses to create the rows. - The
WITH
clause specifies the schema of the file. Conversions to server data types from the strings in the file are performed automatically. However, because the CSV file generated by StatCounter contained ordinal date values (eg. ’21st January 2009′) the date values are parsed as strings. - To generate a modified string that SQL Anywhere can convert to a
DATE
, I used theREGEXP_SUBSTR
function, another new feature of Version 11 that accompanies SQL Anywhere’s regular expression support. Here, the first usage ofREGEXP_SUBSTR
returns the truncated, numeric, day of the month, using a positive lookahead zero-width assertion. The second instance ofREGEXP_SUBSTR
, which is similar, uses a positive look-behind zero-width assertion to return the rest of the string. When concatenated, the two functions convert ’21st January 2009′ to ’21 January 2009′ and the server can handle that conversion automatically, with an appropriate choice (‘DMY’) of theDATE_ORDER
connection option. - Using
READ_CLIENT_FILE
requires two things:- The database must be enabled for client file access by enabling the
ALLOW_READ_CLIENT_FILE
option. This option can be set only by a user with DBA authority. - A user invoking
READ_CLIENT_FILE
must have theREADCLIENTFILE
authority.
- The database must be enabled for client file access by enabling the
Here is a screen shot of the query and its result set:
OPENSTRING
is a table expression that can appear in any DML statement – including INSERT
and MERGE
– and can reference any string variable in the query’s scope. Moreover, one can utilize OPENSTRING
in a view. Once the view definition has been established, one can then create INSTEAD OF
triggers so that procedures or applications using the flat file from the view can seamlessly issue update DML statements against the view.