From the Archives: Snapshot Isolation and Materialized Views
In this post, originally written by Glenn Paulley and posted to sybase.com in February of 2011, Glenn talks about the effect of using snapshot isolation in combination with materialized views.
Snapshot isolation and materialized views are two important features that have been part of the SQL Anywhere server since the Version 10 release, which first shipped in September 2006. In the past, I’ve written articles that explain the tradeoffs of using snapshot isolation and presented material on the space-time tradeoffs of materialized views.
In one article on snapshot isolation, I wrote:
Of course, snapshot isolation doesn’t come for free. It is necessary for the database system to construct archive copies of changed data in anticipation of new snapshot transactions. With SQL Anywhere, copies of snapshot rows are managed automatically, written to the temp file (which grows on demand) as necessary. However, though the management impact is near zero, query performance can suffer as snapshot rows may need to be fetched individually from the snapshot row store in the temp file, based on the snapshot semantics of the transaction. The degree of performance degradation depends entirely on the application and its workload, and will be worse with update-intensive workloads.
In this article, I’ll describe the interaction between snapshot isolation and materialized views, particularly as it pertains to the SQL Anywhere server’s transaction log, and discuss the inherent tradeoffs.
Refreshing materialized views
With deferred maintenance materialized views, modifications to the materialized view’s underlying base tables proceed without any additional locking or (immediate) maintenance overhead. Put another way, update transactions modify the values or rows of base tables, and upon
COMMIT these changes are made persistent. However, in this situation the materialized view contents then become stale; whether or not an SQL statement can exploit the stale contents of the materialized view is controlled by setting options for that query’s connection. To refresh the view’s contents, one issues the
REFRESH MATERIALIZED VIEW statement on that view, which effectively performs a
TRUNCATE TABLE on the base table containing the view, and then immediately performs an
INSERT ... FROM SELECT to re-populate the materialized view.
On the other hand, with immediately-maintained materialized views a
REFRESH MATERIALIZED VIEW statement is required only to initially populate the view. After that, any modifications to underlying base tables are correspondingly applied to the immediately-maintained materialized view(s) that refer to those base tables within the same transaction. When the transaction completes, either the transaction performs a
COMMIT to make the changes permanent, or issues a
ROLLBACK to undo them.
Consider the following simple, contrived example – we create a simple, single-table immediately-maintained materialized view over the Products table in the sample DEMO database:
CREATE MATERIALIZED VIEW groupo.shirt_products( prod_id, prod_name, prod_description, prod_size, prod_color, prod_quantity, prod_unit_price) AS SELECT "id", "name", "description", "size", color, quantity, unitprice FROM Products WHERE "name" LIKE '%shirt%'
To make the view immediately-maintained, we first define a unique index on the table instantiating the view:
CREATE UNIQUE INDEX products ON groupo.shirt_products (prod_id ASC);
and then specify that the view should be maintained immediately:
ALTER MATERIALIZED VIEW groupo.shirt_products IMMEDIATE REFRESH
and, finally, initialize the materialized view:
REFRESH MATERIALIZED VIEW groupo.shirt_products
With the creation of the materialized view now complete, we start a new transaction and modify some of the underlying rows of the Products base table:
UPDATE products SET description = ‘Modified’ WHERE “name” LIKE ‘%Tee Shirt%’; COMMIT
If we look at the contents of the materialized view, we can see that the modifications to the underlying Products table are now reflected in the view:
Materialized views and the transaction log
If we look at the contents of the transaction log for the DEMO database at this point – using the DBTRAN utility – we see the following:
--CONNECT-1010-0000975095-DBA-2011-02-02 11:38 --BEGIN TRANSACTION-1010-0000975106 BEGIN TRANSACTION go --SQL-1010-0000975109 begin set temporary option first_day_of_week = '7'; set temporary option date_order = 'YMD'; set temporary option nearest_century = '50'; set temporary option date_format = 'YYYY-MM-DD'; set temporary option timestamp_format = 'YYYY-MM-DD HH:NN:SS.SSS'; set temporary option time_format = 'HH:NN:SS.SSS'; set temporary option default_timestamp_increment = '1'; set temporary option timestamp_with_time_zone_format = 'YYYY-MM-DD HH:NN:SS.SSS+HH:NN'; create materialized view groupo.shirt_products( prod_id,prod_name,prod_description,prod_size,prod_color,prod_quantity,prod_unit_price ) as select products.id,products.name,products.description,products.size,products.color,products.quantity,products.unitprice from GROUPO.products where products.name like '%shirt%'; set temporary option first_day_of_week = ; set temporary option date_order = ; set temporary option nearest_century = ; set temporary option date_format = ; set temporary option timestamp_format = ; set temporary option time_format = ; set temporary option default_timestamp_increment = ; set temporary option timestamp_with_time_zone_format = ; end go --COMMIT-1010-0000976304 COMMIT WORK go --BEGIN TRANSACTION-1010-0000976307 BEGIN TRANSACTION go --SQL-1010-0000976310 comment to preserve format on view groupo.shirt_products is 'create materialized view groupo.shirt_products( prod_id, prod_name, prod_description, prod_size, prod_color, prod_quantity, prod_unit_price) as select "id", "name", "description", "size", color, quantity, unitprice from products where "name" like ''%shirt%''' go --COMMIT-1010-0000976661 COMMIT WORK go --BEGIN TRANSACTION-1010-0000976664 BEGIN TRANSACTION go --SQL-1010-0000976667 create unique index products on groupo.shirt_products(prod_id asc) go --COMMIT-1010-0000976745 COMMIT WORK go --BEGIN TRANSACTION-1010-0000976748 BEGIN TRANSACTION go --SQL-1010-0000976751 alter materialized view groupo.shirt_products immediate refresh go --COMMIT-1010-0000976826 COMMIT WORK go --CHECKPOINT-0000-0000976829-2011-02-02 11:39 --BEGIN TRANSACTION-1010-0000976859 BEGIN TRANSACTION go --SQL-1010-0000976862 refresh materialized view groupo.shirt_products go --COMMIT-1010-0000976921 COMMIT WORK go --BEGIN TRANSACTION-1010-0000976924 BEGIN TRANSACTION go --UPDATE-1010-0000977130 UPDATE GROUPO.Products SET Description='Modified' WHERE ID=300 go --UPDATE-1010-0000977152 UPDATE GROUPO.Products SET Description='Modified' WHERE ID=301 go --UPDATE-1010-0000977174 UPDATE GROUPO.Products SET Description='Modified' WHERE ID=302 go --COMMIT-1010-0000977494 COMMIT WORK go
There are a few things I’d like to point out regarding the transaction log contents above:
- Note that the
CREATE MATERIALIZED VIEWstatement appears as part of a batch (lines 18-40) that includes statements to re-establish pertinent connection option settings that were in effect at the time the
CREATE MATERIALIZED VIEWstatement was issued.
REFRESH MATERIALIZED VIEWstatement is logged as a separate statement in the transaction log (line 85). By default, the
REFRESH MATERIALIZED VIEWstatement uses
WITH SHARE MODElocking if the
WITHclause is not specified. With immediately-maintained views, only
WITH SHARE MODE,
WITH EXCLUSIVE MODE, and
WITH ISOLATION LEVEL SERIALIZABLEare permitted lock modes.
- Row modifications to the shirt_products materialized view do not appear in the transaction log. Since the view is an immediately-updateable view, it is sufficient for the transaction log to contain only the base table modifications caused by the
UPDATEstatement (lines 94-105). If the log is replayed during recovery, the
UPDATEstatements in the transaction log will automatically invoke the corresponding modifications to the materialized view.
Impact of snapshot isolation
Enabling a SQL Anywhere database for snapshot isolation is done by setting the option allow_snapshot_isolation as follows:
SET OPTION PUBLIC.allow_snapshot_isolation = 'On';
When snapshot isolation is enabled, the default isolation setting for the
REFRESH MATERIALIZED VIEW statement is
WITH ISOLATION LEVEL SNAPSHOT, so that the connection executing the
REFRESH statement is not blocked by other concurrent update transactions that are simultaneously modifying the underlying base tables referenced in the materialized view.
While this is advantageous, there is a tradeoff. The transaction log must contain enough detail about the modifications to the database that we can be confident that applying the log(s) to an arbitrary backup during recovery will re-establish as precisely as we can the contents of the database. With a
REFRESH MATERIALIZED VIEW WITH ISOLATION LEVEL SNAPSHOT statement, the statement will “see” the rows of the underlying base tables according to snapshot semantics – that is, the
REFRESH statement will fail to “see” any modified rows by committed transactions that took place after the
REFRESH statement was started. To properly replay that
REFRESH statement from a transaction log during recovery, the server would need to retain the state of every row of every underlying base table so that the
REFRESH statement can compute the identical contents of the materialized view.
Similar phenomena can occur if the
REFRESH MATERIALIZED VIEW statement is run at any isolation level other than
SHARE MODE, or
EXCLUSIVE MODE. Rather than attempt to provide all of this necessary context in the transaction log, when a weaker isolation level is used with a
REFRESH MATERIALIZED VIEW statement the server logs both the
REFRESH statement and each individual row inserted to the materialized view, rather than log the
REFRESH statement alone.
In this case, if one looks at the SQL statements generated by the transaction log utility DBTRAN, the DBTRAN output will only include the
REFRESH MATERIALIZED VIEW statement, since the
REFRESH statement is all that is necessary when translating a log to SQL to apply it to a different database. On recovery, however, the server ignores the
REFRESH MATERIALIZED VIEW statement, and utilizes the individual
INSERT statements in the log to recover the materialized view contents.
We are considering enhancements to DBTRAN so that the
INSERT statements logged in these cases appear in the output as comments, so that their existence can be verified.