From the Archives: Schema Design Tools are a Tough Challenge
This was originally written by Glenn Paulley and posted to his blog on sybase.com in May of 2008. It is interesting to note that the Hana approach to the problem Glenn talks about in this article is to remove the need to create materialized views at all. By using the pure power/performance of Hana, intermediate results can be generated on the fly without penalty, avoiding the concurrency issues Glenn mentions below.
Materialized views represent an important addition to a DBA’s toolkit. A materialized view is a precomputed view, stored in the database as an ordinary table, and kept up-to-date using either a batch refresh mechanism, or immediately within the same transaction that modifies the view’s underlying base tables. Whether or not immediate modifications to the view can be done depends on the view definition – given an UPDATE, INSERT or DELETE to any particular base table row, the system must be able to compute precisely those rows in the view that are affected by the modification.
The magic behind materialized views is that the query optimizer can substitute one or more materialized views for base tables referenced by an SQL query and do so <em>automatically</em>. View substitution is performed on a cost basis by the query optimizer, a process called view matching. With views that are manually refreshed, the possibility of querying stale data from the view is possible: in such cases, if desired, the relative staleness of the query results can be specified as an option setting or in the query text itself. The option setting to control this in SQL Anywhere is the materialized_view_optimization option.
The problem for a DBA to solve with materialized views is which materialized views to create, a problem that the academic literature calls the view selection problem. In many ways, determining which materialized views to create is similar to deciding which indexes to create. Several of the major commercial systems that support materialized views have included view selection algorithms in their index selection administrative tools. With SQL Anywhere, we have yet to include view selection with the SQL Anywhere Index Consultant, though we have plans to do so in a future release.
What I’d like to illustrate here is a problem with view selection that none of the existing administrative tools consider. Suppose we have transactions A and B that modify tables R and S respectively. It may be, in fact, that A and B do not refer to the same tables at all, and hence there is little or no contention, and no lock contention, between A and B when executing concurrently.
However, suppose the DBA create a materialized view V that joins R and S and is to be modified immediately upon the update of any of its base tables:
CREATE MATERIALIZED VIEW DBA.V ( C1, C2, C3 )
SELECT R.X, R.Y, S.Y
FROM R JOIN S ON ( R.X = S.X );
REFRESH MATERIALIZED VIEW DBA.V WITH ISOLATION LEVEL READ COMMITTED;
ALTER MATERIALIZED VIEW DBA.V IMMEDIATE REFRESH
Let’s again consider the transactions A and B with this newly-created materialized view V. Now, it is possible for A and B to conflict: transaction A can modify a row in R, transaction B can modify a row in S, but if those rows of R and S together constitute a row in V, then one of A or B will <strong>block</strong> on a row write lock when the server attempts to update the view.
This row lock contention on immediate materialized views can be a significant problem in a database with any significant update activity. However, it is extremely difficult to assess the potential concurrency problems due to this phenomena in an administrative tool, because all of the commercially-available materialized view and index selection tools rely on (1) simplifying the workload to eliminate duplicated statements and (2) process and re-optimize each of the statements in a workload multiple times, losing the timing dependencies (and hence any measure of concurrency) that were extant when the workload was originally captured.
Hence the commercial view selection tools are optimistically biased towards view creation because update contention is not taken into account, a situation that can catch the DBA off-guard when the view is put into production without adequate performance analysis beforehand.