In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2009, Glenn discusses using the OPENSTRING clause in a select statement to load data into SQL Anywhere.

Invariably one needs to transfer the contents of a flat file into, or, sometimes, out of, a SQL Anywhere server. Prior to Version 11, importing a flat file into the server could be done in the following ways (without resorting to writing code to do so):

  • using the LOAD TABLE statement;
  • using the INPUT statement from within the DBISQL tool;
  • loading using the xp_read_file() procedure; or
  • establishing a proxy table to the flat file using Remote Data Access services.

Each of the above solutions have trade-offs. The first two options require the creation of a table in which to load the data. LOAD TABLE is faster than INPUT, but suffers from two disadvantages: firstly, that the file must be directly accessible from the server machine, and secondly, the individual rows being inserted are not recorded in the transaction log, complicating recovery. The third option also requires that the file be local to the server machine. Unfortunately xp_read_file() creates an opaque value that (still) requires parsing. The fourth option, a proxy table, is even more cumbersome to use, requiring the possible creation of an external server, an EXTERNLOGIN object, etc.

With Version 11, SQL Anywhere handles interactions with flat files more flexibly. In this post, I’ll describe the functionality provided by OPENSTRING, and I’ll follow this with a subsequent post on additional options.

OPENSTRING

To illustrate these new features, I wanted to upload into the server the CSV output file generated by StatCounter for my blog. Here are the first few lines of the file (I opened a StatCounter account on 20 January of this year):


Day,Date,Page Loads,Unique Visitors,First Time Visitors,Returning Visitors
Tuesday,20th January 2009,"83","48","46","2"
Wednesday,21st January 2009,"163","108","102","6"
Thursday,22nd January 2009,"127","105","99","6"
Friday,23rd January 2009,"126","91","87","4"
Saturday,24th January 2009,"42","37","35","2"
Sunday,25th January 2009,"52","38","36","2"
Monday,26th January 2009,"171","133","119","14"
Tuesday,27th January 2009,"157","110","101","9"

Rather than use a proxy table, LOAD TABLE, or INPUT INTO, I decided to avoid creating a table at all, and simply refer to the flat file directly in my SELECT statement using OPENSTRING.

OPENSTRING is a table expression whose input parameter can be either a flat file or a (string) variable. When specified the server parses the file input and constructs rows of a virtual table matching the schema that one specifies in a clause of the OPENSTRING expression. Here is the SQL grammar for an OPENSTRING table expression.


<openstring-expression> ::= OPENSTRING ( { FILE | VALUE } <string-expression> ) WITH ( <rowset-schema> ) [ OPTION ( <scan-option> ...  ) ]
<rowset-schema> ::= <column-schema-list> | TABLE [owner.]table-name [ ( column-list ) ]
<column-schema-list> ::= { <column-name> <user-or-base-type> | FILLER( ) } [ , ... ]
<column-list> ::= { <column-name> | FILLER( ) } [ , ... ]
<scan-option> ::= BYTE ORDER MARK { ON | OFF }
        | COMMENTS INTRODUCED BY <comment-prefix>
        | DELIMITED BY <string>  
        | ENCODING <encoding>
        | ESCAPE CHARACTER <character>
        | ESCAPES { ON | OFF }
        | FORMAT { TEXT | BCP }
        | HEXADECIMAL { ON | OFF }
        | QUOTE <string>
        | QUOTES { ON | OFF }
        | ROW DELIMITED BY <string>
        | SKIP integer
        | STRIP { ON | OFF | LTRIM | RTRIM | BOTH }

Note that the options for OPENSTRING match those available for LOAD TABLE. In my case, the Excel-CSV file uses comma-delimited fields, and the first line contains the attribute names which must be ignored for processing the actual values. Here is a SQL statement that creates a result set from this input flat file directly:


SELECT stat_weekday,
      CAST ( ( REGEXP_SUBSTR( str_stat_date, '[0-9]+(?=(st|nd|th|rd|ST|ND|TH|RD )\s.*)' )
                    || REGEXP_SUBSTR( str_stat_date, '(?<=[0-9]+(st|nd|th|rd|ST|ND|TH|RD)).*' ) ) AS DATE ) as stat_date,
      page_loads, unique_visitors, first_time_visitors, returning_visitors
FROM OPENSTRING( VALUE READ_CLIENT_FILE ('c:\gpaulley\blog\summary-6July2009.csv' ) )
    WITH( stat_weekday char(10), str_stat_date char(30), page_loads int, unique_visitors int, first_time_visitors int, returning_visitors int)
        OPTION( SKIP 1 DELIMITED BY ',' ) AS summary

Some points to make about the above statement:

  • The parameter to OPENSTRING is either VALUE or FILE. If FILE is specified, the flat file must be local to the server machine; in this case, the file was on my own computer. Version 11 supports the READ_CLIENT_FILE function that uses SQL Anywhere’s CMDSEQ wire protocol to fetch a file’s contents from the client machine. READ_CLIENT_FILE creates an internal string that OPENSTRINGsubsequently parses to create the rows.
  • The WITH clause specifies the schema of the file. Conversions to server data types from the strings in the file are performed automatically. However, because the CSV file generated by StatCounter contained ordinal date values (eg. ’21st January 2009′) the date values are parsed as strings.
  • To generate a modified string that SQL Anywhere can convert to a DATE, I used the REGEXP_SUBSTR function, another new feature of Version 11 that accompanies SQL Anywhere’s regular expression support. Here, the first usage of REGEXP_SUBSTR returns the truncated, numeric, day of the month, using a positive lookahead zero-width assertion. The second instance of REGEXP_SUBSTR, which is similar, uses a positive look-behind zero-width assertion to return the rest of the string. When concatenated, the two functions convert ’21st January 2009′ to ’21 January 2009′ and the server can handle that conversion automatically, with an appropriate choice (‘DMY’) of the DATE_ORDER connection option.
  • Using READ_CLIENT_FILE requires two things:
    • The database must be enabled for client file access by enabling the ALLOW_READ_CLIENT_FILE option. This option can be set only by a user with DBA authority.
    • A user invoking READ_CLIENT_FILE must have the READCLIENTFILE authority.

Here is a screen shot of the query and its result set:

/wp-content/uploads/2014/04/regexp_substr_407878.png

OPENSTRING is a table expression that can appear in any DML statement – including INSERT and MERGE – and can reference any string variable in the query’s scope. Moreover, one can utilize OPENSTRING in a view. Once the view definition has been established, one can then create INSTEAD OF triggers so that procedures or applications using the flat file from the view can seamlessly issue update DML statements against the view.

To report this post you need to login first.

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

Leave a Reply