My experience with materialized views in Sybase ASE
In this post I’d like to focus on precomputed result sets in SAP Sybase ASE, which are also called materialized views (MVs). Basically, an MV is a calculated result set of a query based on one or more tables. Unlike regular views, the result of the materialized view is stored in a database much like a regular table. MVs can be used to (semi)automatic pre-calculation of result sets of relatively costly queries, thus allowing applications to perform much simpler and faster queries from MVs than going to regular tables again and again. Usually, MVs are used for pre-joining and pre-aggregation of the data for data warehouse applications, but it can be useful in OLTP as well, as will be shown later.
The concept of MV is anything but new in the database world; Oracle has implemented it first in version 8.1.5, back in 2000. In ASE, precomputed result sets first appeared in ASE 15.7 ESD#2. For some obscure reason, SAP/Sybase doesn’t advertise this very useful and long awaited feature too much. To my surprise, this feature was not mentioned at all in the last SAP Teched 2013 conference in Las Vegas, at least not in presentations known to me. You are welcome to read the documentation here . I’d like very much to see MVs also in SAP Sybase IQ, but it is a completely different story.
MVs can be queried by applications just like regular tables, therefore we may want to create appropriate indexes on MVs and also update statistics on relevant columns. While it is certainly possible for applications to query MVs directly, the most powerful use of MVs is via automatic query rewriting. “Query rewrite” means that the optimizer is able to automatically substitute parts of queries from base tables with queries from suitable MVs, when applicable. It is a great way to dramatically improve performance of existing applications without touching the application code. When the optimizer “feels like” rewriting of a query (or some parts of the query) using relevant MVs, which possibly contains pre-joined or pre-aggregated data, it will just do that. The result set of the query, of course, remains the same with and without MVs and it is optimizer’s responsibility to care about it.
A change in the data of a table that the MV is based on may cause the data of the MV to become stale (not up-to-date). The MV can be refreshed automatically, as a part of the transaction that updates the base table, or manually. If an MV is refreshed automatically, the data in the MV doesn’t in fact become stale, it always stays in-tact with the base table data, or in other words, fresh. In the case of the manual refresh of an MV, using an explicit REFRESH command, the data in the MV becomes stale immediately after the refresh. It is up to the application to decide using “set materialized_view_optimization” command if stale MVs are acceptable for query rewriting. Sometimes, the data returned by a query doesn’t have to be 100% up-to-date and lagging by a number of minutes/hours, for example, may be bearable.
Now, the background is over and we are ready to move to the actual use case. One of my customers has an application that presents a list of events of some kind, with pagination. In addition, the total number of events of different kinds must be displayed on the screen. New events flow to the system all the time and there is a requirement to present the exact number of events each time. The query executed by the customer’s application to get the count of relevant events looks like this:
select isnull(count(*), 0) from events_big l where ISNULL(l.event_type1, '?') NOT IN ('A','G','P','L') and l.event_is_display = 'Y'
This query is certainly not ideal, but it is next to impossible to change it in a foreseeable future. It performs full table scan of quite a fat table (many tens of columns), because some query arguments are not very SARGable and also because about 80-90% of events answer the conditions of the query. The query performance is OK with tens of thousands of rows in the table. However, becomes considerably slow when the number of rows in the table reaches millions (1.5 million in my test case). Even when the whole table is in cache, the query returns in 1-2 seconds, which is a lot for a frequently executed OLTP query. Reads from the table clearly outnumber writes, so I decided to try to rewrite this query using a materialized view without changing the application. I was quite optimistic about this, but unfortunately I have hit a number of unexpected limitations of MVs in ASE:
1. It appears that MVs with “count(*)” are not allowed, you just cannot create an MV with “count(*)” inside. Instead, MVs should use count_big(*). I must admit that I was unaware of the very existence of count_big(*) before I hit this problem. This limitation virtually eliminates the possibility to utilize MVs for existing applications, that don’t use count_big(*), without changes in the application, which usually implies QA etc., which is in turn not cheap at all and not always possible.
2. Using of ISNULL function in an MV is not allowed.
3. Setting of this set of options is mandatory to create and use MVs:
set ansinull on set arithabort on set arithignore off set string_rtruncation on
These settings are anything but default ones. While it may be implemented quite easily in a login trigger, possible functionality changes in a big legacy application will require a thorough QA.
4. Just for the test purpose, let’s assume that we can work around all limitations mentioned above and change the application. If so, we can use an MV like the shown below to rewrite the query and in this case the automatic query rewrite indeed works:
-- the changed query, which is logically the same as the original one -- and gets rewritten using an MV automatically: select count_big(*) fcount from events_big l where l.event_type1 NOT IN ('A','G','P','L') and l.event_is_display = 'Y' and l.event_type1 IS NOT NULL -- The materialized view definition create precomputed result set prs_events_big_tailored immediate refresh as select count_big(*) fcount from events_big l where l.event_type1 NOT IN ('A','G','P','L') and l.event_is_display = 'Y' and l.event_type1 IS NOT NULL
Using the automatic query rewrite, the number of logical reads in my test case drop from 502317 to 1 and the response time from 1-2 second to milliseconds – which is quite impressive. However, what happens if we want to be more generic and we want to hold a single MV to serve a number of similar queries (an actual requirement in my customer’s case)? I tried to use the following, more generic materialized view:
create precomputed result set prs_events_big_generic unique(l.event_type1, l.event_is_display) immediate refresh as select l.event_type1, l.event_is_display, count_big(*) fcount from events_big l group by l.event_type1, l.event_is_display
Unfortunately, the query from the previous example doesn’t get rewritten and it is a serious limitation. It is not feasible to maintain an MV for each query that should be rewritten. Another important limitation I have found is that it is very difficult to know why the query rewrite didn’t occur, in fact we have nothing but guess here.
Just to be sure that I’m not asking ASE to do something obviously impossible, I performed the same test in Oracle 220.127.116.11 with exactly the same data. It appeared that none of the MV limitations explained above exist in Oracle. Oracle was able to use prs_events_big_generic MV to rewrite the original query; I mean the original query without my fixes. In addition, Oracle supplies DBMS_MVIEW.EXPLAIN_REWRITE procedure, which allows to see if the query rewrite of a specified query using the specified MV is possible, and if it isn’t – shows the reason why. I have found DBMS_MVIEW.EXPLAIN_REWRITE procedure very useful.
My conclusions: SAP has finally implemented precomputed result sets or materialized views in ASE and it is very good. Unfortunately, this new great feature still suffers from some significant limitations, which are not always properly documented. If you have experience with MVs in databases from other vendors and plan to implement MVs in ASE – perform thorough functionality tests first, you may hit additional limitations that I didn’t find during my research.
P.S. All my tests with materialized views are fully reproducible. I can send my data set to anyone who wishes to repeat my tests.
This post has been originally published at http://leonid-gvirtz.typepad.com/blog/2013/11/my-experience-with-materialized-views-in-sybase-ase.html
What version of ASE did you experimented with "precomputed result set" or "materialized view"? Can you send me your repro data set to take a look when I have a moment?
I performed my tests on ASE 15.7 SP102 on Linux x64.
I will try my best to prepare the full test case and send it to you in next 1-2 days. The data set size after compression is 33.6M, I will find a way to publish it.
I have prepared the detailed test case, see attached to the blog post. The test data set can be downloaded at https://www.dropbox.com/s/a5zkelk1qqgkfn9/events_big.bcp.bz2. Let me know if you have any questions.
It is being downloaded, thanks. When I have time, I shall take a look.