From the Archives: LIKE predicate semantics with blank-padded databases
In this post, originally written by Glenn Paulley and posted to sybase.com in August of 2009, Glenn talks about blank padded character strings and how they behave with respect to the LIKE predicate.
In the SQL:2008 standard, fixed-length character string values are blank padded. Blank padding occurs during storage of a fixed-length character string value when its original size is less than the declared width of the column. Blank padding also occurs when fixed-length character strings are compared using any string comparison predicate. For storage, here is the relevant quote from Section 9.2 (Store assignment), General rule 2(b)(iii), that defines the behaviour for storing value V in column T:
If the declared type of T is fixed-length character string with length in characters L and the length in characters M of V is less than L, then the first M characters of T are set to V and the last M characters of T are set to spaces.
SQL Anywhere, however, does not blank-pad fixed-length character strings. In a SQL Anywhere database, every string is stored as if the column is a
VARCHAR type. This means that all blanks in a string value (trailing or otherwise) are treated as significant characters; hence the value ‘a ‘ (the character ‘a’ followed by a blank) is not equivalent to the single-character string ‘a’. Inequality comparisons also treat a blank as any other character in the collation.
SQL Anywhere offers the ability to mimic ANSI SQL character-string comparison semantics with the “blank padding” option, which can be specified with either the dbinit utility or the
CREATE DATABASE statement. With the blank-padding option enabled, trailing blanks in a string are ignored when being compared. Ignoring trailing blanks has equivalent semantics to blank-padding for equality and inequality (“!=”) operations. However, this behaviour is not identical to blank-padding semantics for other comparison operators such as less than (“<“)
LIKE Semantics With and Without Blank Padding
The semantics of a
LIKE pattern in SQL Anywhere does not change if the database is blank-padded because matching the expression to the pattern involves a character-by-character (or code point by code point, in the case of UTF8 databases) comparison in a left-to-right fashion. No additional blank padding (or truncation) is performed on the value of either expression or pattern during the computation. Therefore, the expression “a1” matches the pattern “a1”, but not the patterns “a1 ” (“a1” with a trailing blank) or “a1_”. These semantic differences occur whenever the expression or the pattern contain trailing spaces, and, as we shall see, also illustrate differences in other relational DBMS products, in virtually all cases due to the retention of legacy behaviour.
To illustrate – my thanks to John Smirnios for the following analysis – we define a table T consisting of four string columns as follows:
CREATE TABLE T( a CHAR(1), b CHAR(2), c CHAR(3), d VARCHAR(10) );
and into table T we insert two rows, the first row with a single character ‘a’ in each column, and the second row with the value ‘a ‘ (an ‘a’ followed by a blank) in each column, as follows:
INSERT INTO T VALUES( 'a', 'a', 'a', 'a' ); INSERT INTO T VALUES( 'a ', 'a ', 'a ', 'a ' );
A test of 12 specific test cases across a variety of database systems yielded the following results:
|Rows returned by “Column LIKE Pattern”|
|Test case||Column||Pattern||SQL Anywhere||Oracle||DB2||ASE||SQL Server||IQ|
- Sybase ASE and Microsoft SQL Server 2005 always strip all trailing blanks from the end of
VARCHARvalues. In these two systems, it is impossible to store a blank at the end of a
- In Sybase IQ, the predicate “d
LIKE‘a[ ]'” returns row 2, even though the predicate “d
LIKE‘a ‘” returns both rows 1 and 2.
- In Sybase ASE, “d
LIKE‘a[ ]'” returns no rows even though the predicate “d
LIKE‘a ‘” returns both rows 1 and 2.
SQL Anywhere treats all strings as
VARCHAR, even in a blank-padded database. For
VARCHAR strings, SQL Anywhere’s behaviour matches DB2 and Oracle and the ANSI/ISO SQL standard (test cases 4, 8, and 12).
DB2 and Oracle have identical semantics. Fixed-width character columns are effectively always padded to their maximum length and the string’s length is always equal to the maximum width of the column. The end of the string must match the end of the pattern.
VARCHAR fields retain any trailing blanks that were inserted by the user; blanks are never added to or stripped from
Sybase ASE appears to strip trailing blanks from the pattern string, but it does not strip ‘equivalent to blank’ expressions (see note 3). However, ASE will retain a single trailing blank in the case of a pattern ending in ‘%’ followed by one or more blanks; this specific exception is documented in the ASE SQL User’s Guide. ASE also effectively strips trailing blanks from the match value and then re-pads
CHAR columns with enough blanks to match the pattern (but not enough to exceed the width of the column). For
VARCHAR match values, blanks are pre-stripped (see note 1) and blanks are never added to allow a match to occur. A pattern ending with an equivalent-to-blank (‘[ ]’) will never match a
Microsoft SQL Server 2005 does not strip trailing blanks from the pattern. However, like ASE, SQL Server appears to strip trailing blanks from the match value and then re-pad
CHAR columns with enough blanks to match the pattern but not enough to exceed the width of the column. Blanks are never appended to a
VARCHAR to allow a match to occur.
In another post, I’ll attempt to outline the differences in semantics with trailing blanks and empty strings with client-server protocols.
My sincere thanks to my colleague John Smirnios for his thorough analysis of
LIKE semantics with SQL Anywhere and these other database management systems.