From The Archives: The DBA Will Be Obsolete
This post was originally written by Glenn Paulley and posted to his blog on Sybase.com in April of 2008. While old, it highlights the focus of the SQL Anywhere engineering team, which has not changed, and is more relevant than ever to anyone building applications that need to run outside the traditional data center.
Now that I have your attention…..
The holy grail, to use a cliche, of self-managing, self-tuning systems (termed autonomic computing by IBM) is completely independent, self-managing operation: no performance tuning, system configuration, or manual intervention is required; the system runs itself.
Self-managing database system technology is the topic of a recently-created IEEE Workgroup on Self-managing Database Systems (IEEE Workgroup on Self-managing Database Systems), which now has had three workshops held in conjunction with the annual IEEE Data Engineering conference, and is about to hold a fourth workshop this week at this year’s Data Engineering conference in Cancun, Mexico.
Any system administrator or DBA knows that this goal of zero-administration has yet to be realized in any product available today, though some products, such as SQL Anywhere, come closer to this ideal than others. In database systems, the state-of-the-art today can be characterized by two classes of “self-managing” initiatives:
a) The first class of initiatives is the development of (sometimes sophisticated) tools to help the database administrator configure the database server’s operational environment, including specification of a wide range of tuning parameters. Physical database design tools, such as index and materialized view selection tools offered by most of the major commercial DBMS systems (SQL Anywhere’s Index Consultant, IBM’s DB2 Index Advisor, Microsoft’s Autoadmin tools) are good examples of these kinds of tools. Physical database design continues to be an area where the complexities of automatic index or materialized view creation (and alteration or deletion) are such that doing so automatically with guaranteed results is significantly hard. However, tools such as the ones listed above can assist the DBA in making good physical design decisions. Solving the index selection problem optimially is itself NP-hard, but the tools use approximations that still provide useful results.
b) The second class of initiatives is the development of technologies, embedded within the database server itself, that do in fact result in self-managing operation. Examples of this technology include dynamic cache sizing and automatic statistics gathering and maintenance. Both of these were pioneered by Sybase iAnywhere in the SQL Anywhere product, the former with Version 7 and the latter with Version 3.2 in 1992. Other technologies include automatic, dynamic working memory allocation, dynamic multiprogramming level adjustment, automatic task prioritization, dynamic selection of alternative query processing strategies, automatic I/O cost model adjustment, and improvements to histogram maintenance to both detect and correct histogram inaccuracies found at either optimization time, or at execution time.
Once perfected, self-managing technologies such as the ones listed above will change DBA tasklists to eliminate tedious, workload-dependent performance tuning. As an example: does it really make sense for the DBA to specify the number of buckets for a column histogram in some commercial DBMS? Shouldn’t the server automatically grow or shrink the number of buckets based on the actual distribution of values stored in the database over time? Or as another example: why is it necessary in some systems that the DBA specify the (fixed) size of the buffer pool devoted to sorting? What happens if the workload changes from hour to hour? Shouldn’t the server decide on sorting requirements based on the current operating workload?
But just to be clear: what I am not saying is that the performance of a self-managed system will be equal, at all times, to a system that is hand-tuned by an expert, with or without the help of configuration tools such as the recently-released DB2 buffer pool configuration tool. The points I’m trying to make are:
(a) by and large, configuration tools are useful only with a known, constant workload. In dynamic environments where the workload changes over time, it makes sense for the database system to figure out what needs to be done, at run time. For a wide variety of scenarios, SQL Anywhere can do exactly that, today, without manual intervention. This makes SQL Anywhere ideal for limited-administration environments where there may not be a DBA to perform “care and feeding” of the system.
(b) self-management is a holistic problem: self-management technologies are fundamental components of the server, not merely administrative add-ons that assist a database administrator in configuring the server’s operation. In particular, it is important that these technologies work in concert to offer the level of self-management and adaptiveness that embedded application software requires. It is, in our view, impossible to achieve effective self-management by considering these technologies in isolation.
Self-management is a constant design criteria when my Engineering team develops addtional features in SQL Anywhere. We continue to develop additional self-management techniques and include them with the product. But because perfection in self-management has yet to be attained, we (still) have to provide performance diagnostic tools such as graphical query execution plans and application profiling.
You can read more about the self-management technologies in SQL Anywhere by reading this SQL Anywhere: A Holistic Approach to Database Self-managementIEEE SMDB Workshop paper from 2007.