Over many years of using ASE as a power user, a lot of that time has been spent on statistics in one form or another. That led me to writing an article for the journal of the International Sybase User Group (as it used to be known) in 2006 – along with a system procedure to help in knowing when statistics for a column were created, and another system procedure for modifying or setting the step count for a column.
After being made redundant by JP Morgan Chase (JPMC) in October 2016, I decided to have some down time using the redundancy money. To keep my hand in with ASE, I started work on bringing Kevin Sherlock’s sp__optdiag up-to-date for use with ASEs 15.5+ almost immediately after stopping work. I then moved on to bringing the two procedures I’d made available in conjunction with the above article up-to-date – extending the first to summarise more information, and the second to also change the tuning factor, sampling percentage and hashing type where appropriate.
Whilst working at JPMC, a goal that was quite often mentioned was to be able to have up-to-date(ish) statistics for tables that contained data for rolling dates – where a certain number of days of data are kept, so that a new date’s data is added and the oldest date’s data is removed each day – without having to update statistics each day after the addition and removal was complete. Needing to stretch my ASE procedure coding muscles further, I embarked on writing a new procedure to do this shuffling of statistics.
When I was writing the original two system procedures, I had seen that ASE is quite resilient to having its statistics data mucked about with. Which was just as well as it allowed me to do some quite horrific things to the statistics data whilst writing the shuffling procedure and the two others I embarked upon after I’d written it. If things went awry, all that had to be done was to delete the statistics for the column or table, update them, and then try not to break them again. Obviously, I never went near an ASE server that anyone else was using during the development of the system procedures.
One aspect of the rolling date scenario is that it requires that the table already has the full set of data, for example for thirty days. That got me to thinking about the scenario where a table is intended to have rolling dates but it isn’t yet fully populated, so that new data is added but nothing is removed; or where the number of days’ worth of data is increased, for example from thirty days to forty days. Which led to me writing a procedure to append statistics for the new date’s data to the statistics for the existing data.
Between the writing of the shuffling and appending system procedures, I embarked on what is probably my most ambitious project concerning statistics – the ability to copy statistics between tables during a schema change process. In those situations it is usual for the existing data to be mostly unchanged in the new version of the table. There might be additional columns with new data, or a column in both the original and new versions of the table might change type (for example from smalldatetime to datetime). But the original data – and hence its statistics – remains largely unchanged. I could simply have written a system procedure using some stringent constraints for successful copying, but I decided that I’d handle as many different types of schema change as I could think of. So I’ve ended up with a procedure that can copy statistics and which can handle columns changing names, position and / or datatypes, partitions changing names, and functional indexes changing their index ID (and hence name). If you can think of any others, please get in contact.
ASE power users may be asking themselves why would a procedure to copy statistics be needed when optdiag is available? The optdiag utility can be used in situations like this, but it needs to be run by someone with sa_role – even to create the statistics file for the original version of the table as the first step. In these days of tighter and tighter security, access to an ASE server by an sa_role user should be kept to a minimum. The maximum level of access needed during a schema change is that of database owner. The other aspect is that some manual (or scripted) editing of the statistics file would probably be required to handle any changes to an existing column along the lines already mentioned. Something to consider is that optdiag doesn’t always work as expected. Prior to ASE 16.0, it cannot handle bigtime, bigdate and bigdatetime (I have tested on Windows in ASE 15.5 EBF 18158 SMP ESD#2 and ASE 15.7 EBF 21338 SMP SP101). Also, it appears that optdiag for ASE 16.0 SP03 PL02 on RHEL 7.4 is broken. for the default bash shell language of en_US.UTF-8 in the LANG variable.
Because the original customising procedure and these three new procedures allow statistics to be manipulated without running “update statistics” (to a greater or lesser extent), I have grouped them under the umbrella of “cheating with statistics in ASE”. The new procedures won’t be a replacement for running “update statistics” but they will allow for a better set of statistics data to exist between statistics updates if it doesn’t happen every day for rolling date tables; and for testing to occur sooner during schema changes.