From the Archives: Using the MERGE statement with SQL Anywhere
In this post, originally written by Glenn Paulley and posted to sybase.com in April of 2009, Glenn provides details on the MERGE statement in SQL Anywhere. The latest documentation on the MERGE statement can be found here.
The MERGE
statement first appeared as Feature F312 in the SQL:2003 ANSI/ISO standard. MERGE
is useful in situations where one has to insert a set of rows but when cleaning or other ETL (Extract-Transform-Load) processing is required. MERGE
permits one to implement quite sophisticated logic to process a set of tuples to be loaded in a single statement. In this article, I’d like to briefly outline the functionality of the MERGE
statement, how it has changed between the SQL:2003 and SQL:2008 standards, and the extensions to MERGE
that are supported in SQL Anywhere. To do this, I’ll extensively borrow from a presentation on the MERGE
statement and materialized views that my colleague Anil Goel presented at the Sybase Techwave conference in Las Vegas in August, 2008.
Example
When processing a set of input rows to be inserted into a base table, it is sometimes necessary to perform “INSERT
-else-UPDATE
” processing, which is precisely the main intent of the MERGE
statement. In addition, it would be beneficial to provide:
- Provide alternate actions for
INSERT
; - Conditionally skip the
INSERT
and/or theUPDATE
for specific rows; DELETE
an existing row; and- Permit arbitrary conditions to be specified for alternate actions.
Suppose we have an existing, non-empty table T and wish to process a set of input rows S as follows:
FOR each input row in S DO
IF EXISTING THEN // row of S matches an existing row in table T
IF condition-1 THEN SKIP
ELSE IF condition-2 THEN ERROR
ELSE IF condition-3 THEN DELETE FROM T
ELSE UPDATE T
ELSE
IF condition-4 THEN SKIP
ELSE IF condition-5 THEN ERROR
ELSE INSERT INTO T
END
This entire piece of logic can be executed as a single MERGE
statement as follows. The example below merges existing table rows from the base table T with rows from an input file on the client machine, which are accessed using Version 11’s OPENSTRING
construction:
MERGE INTO T
USING (SELECT * FROM OPENSTRING( FILE '/usr/paulley/techwave/2008/data1.txt')
WITH( id INT, qty INT, cmt LONG VARCHAR) S) S
ON PRIMARY KEY
WHEN NOT MATCHED AND S.cmt = 'ignore' THEN SKIP
WHEN NOT MATCHED AND S.qty < 0 THEN RAISERROR
WHEN NOT MATCHED AND S.qty = 0 THEN INSERT VALUES( S.id, 1 )
WHEN NOT MATCHED THEN INSERT VALUES( S.id, S.qty )
WHEN MATCHED AND S.cmt = 'ignore' THEN SKIP
WHEN MATCHED AND S.cmt = 'new' THEN RAISERROR
WHEN MATCHED AND T.qty + S.qty
MERGE semantics – briefly
The MERGE
statement defined in the SQL:2003 standard permitted at most one WHEN MATCHED
and at most one WHEN NOT MATCHED
clause.
The semantics of SQL:2003 MERGE
are as follows.
- For each row of target-object, determine if the
MERGE
search-condition evaluates to TRUE for each row of the source-object, to find ‘matching’ rows from the source and target. - Generate an error if more than one matching row is found.
- Add the single matching row, if found, to the set of rows to be processed by the
WHEN MATCHED
clause, if present; - The
WHEN NOT MATCHED
clause, if present, processes rows of source-object that fail to match any target-object row.
With Feature F313, SQL:2008 permits multiple WHEN [NOT] MATCHED
clauses with optional search conditions. In this case, the order of the WHEN
clauses matter: it is the merge action of the first WHEN
clause that satisfies its search condition that is executed. However, even with Feature F313, the standard MERGE
statement isn’t that flexible: the only operation a WHEN MATCHED
clause can perform is an UPDATE
, and the only operation a WHEN NOT MATCHED
clause can perform is an INSERT
. With SQL Anywhere 11, we extended MERGE
statement functionality in several ways, some of these inherited from the older INSERT ... ON EXISTING
statement that was introduced to SQL Anywhere Version 8 in 2002:
- One can use
MERGE
with an input dataset from a client machine using theOPENSTRING
construction; - use an explicit column list to permit column reordering from input to target table;
- use of
WITH AUTO NAME
syntax to do column matching by name; - the
MERGE
search-condition can specifyPRIMARY KEY
as a shorthand to generate a condition based on primary key columns; - several additional
WHEN MATCHED
actions:DELETE
the existing rowRAISERROR
[ error-number ]SKIP
the input rowUPDATE [ DEFAULTS { ON | OFF }
the existing row
- additional
WHEN NOT MATCHED
actions:INSERT
the input rowRAISERROR
[ error-number ]SKIP
the input row
A detailed description of the semantics of the MERGE
statement, along with the semantics of INSERT
, DELETE
, and UPDATE
row-level triggers on the target table, can be found in the SQL Anywhere documentation. Here is the BNF for the MERGE
statement syntax supported by SQL Anywhere 11:
merge ::= "MERGE" "INTO" merge_into_tabterm merge_using
simple_tabterm merge_on_clause merge_op_list
merge_using ::= "USING" | "USING WITH AUTO NAME"
merge_into_tabterm ::= ( tabref | "(" query_expr ")" as identifier
| "(" "WITH" with_list query_expr ")" as identifier ) derived_collist
merge_on_clause ::= "ON" merge_search_cond
merge_search_cond ::= searchcond
| "PRIMARY" "KEY"
| "INDEX" indexname
merge_op_list ::= merge_op { "," merge_op }
merge_op ::= "WHEN MATCHED" "THEN" merge_matched_op
| "WHEN MATCHED" "AND" searchcond "THEN"
merge_matched_op
| "WHEN NOT" "MATCHED" "THEN"
merge_not_matched_op
| "WHEN NOT" "MATCHED" "AND" searchcond
"THEN" merge_not_matched_op
merge_common_op ::= "SKIP"
| "RAISERROR"
| "RAISERROR" integer_or_var
merge_matched_op ::= merge_common_op
| "UPDATE SET" setlist
| "UPDATE"
| "UPDATE" "DEFAULTS" "ON"
| "UPDATE" "DEFAULTS" "OFF"
| "DELETE"
merge_not_matched_op ::= merge_common_op
| "INSERT" "VALUES" "(" ins_vallist ")"
| "INSERT" "(" inscollist ")" "VALUES" "(" ins_vallist ")"
| "INSERT"
ins_vallist ::= { ins_valitem "," } ins_valitem
ins_valitem ::= "DEFAULT" | expression
inscollist ::= { inscolname "," } inscolname
inscolname ::= identifier [ "." identifier ]
Another useful extension that we are considering for a future release is to support outer joins between the source- and target-objects, a construction already supported by Microsoft SQL Server 2008. My thanks to Anil Goel for putting together most of the above material for our presentation at Techwave 2008.