Maddening name changes for the same statistics bet...

Technology Blogs by Members

Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!

Just ranting here. In Adaptive Server Enterprise (ASE) version 16.0 SP03, the names of the measurements for index derived statistics are different in the optdiag and sp_showoptstats output, but the documentation only uses to the optdiag names.

In the optdiag output, derived statistics for a non-clustered index look like this (for example):

  Derived statistics:                                                           

     Data page cluster ratio:    0.0000000000000000

     Index page cluster ratio:   0.9988941688270365

     Data row cluster ratio:     0.9995511552728523

     Space utilization:          0.9984938994744493

     Large I/O efficiency:       0.9923186417867956

But in the sp_showoptstats stored proc output has ambiguously renamed the columns like so:

<derivedStats>

  <clusterRatio>0.0000000000000000</clusterRatio>

  <indClusterRatio>0.9988941688270366</indClusterRatio>

  <dataClusterRatio>0.9995511552728524</dataClusterRatio>

  <spaceUtil>0.9984938994744494</spaceUtil>

  <IOEfficiency>0.4961593208933978</IOEfficiency>

</derivedStats>

Here's a table to make mapping optdiag measurement names to sp_showoptstats variable names:

optdiag name	sp_showoptstats name	clustered index all pages locked tables	clustered index data pages/row only locked (essentially a non-clustered index)	non-clustered index
Data page cluster ratio	clusterRatio	valid value	valid value	valid value
Index page cluster ratio	indClusterRatio	zero (optdiag) or null (sp_showoptstats)	valid value	valid value
Data row cluster ratio	dataClusterRatio	zero (optdiag) or null (sp_showoptstats)	valid value	valid value

It's maddening to have the names slightly changed, and in ways that make it really hard to map back to the documentation (the documentation uses the original optdiag names). Also, the changes make the names sound like they conceptually overlap.

It's basically the same problem for all-pages-locked tables with clustered indexes, except that Data page cluster ratio is the only cluster ratio that's filled in (non-zero)

Also the new definition for "Data row cluster ratio" is:

The data row cluster ratio is used to estimate the number of pages that need to be read while using the index to access the data pages.

But they meant to say "the number of data pages that need to be read". Ie., they're not talking about index fragmentation. The old help page for this was clearer. I'm including the old help page definitions, in case that page goes away. Square brackets are my clarifications:

For a table, the data page cluster ratio measures the packing and sequencing [ordering by the index] of pages on extents.

For an index, the data page cluster ratio measures the effectiveness of large I/O for accessing [data pages for] the table using this index.

The index page cluster ratio measures the packing and sequencing of leaf-level index pages on index extents.

The data row cluster ratio, another cluster ratio used by query optimization, is used to cost the number of data pages that need to be accessed during scans using a particular index. It is not used in large I/O costing.

But even so, I'm not clear on what the difference is between Data page cluster ratio and Data row cluster ratio. From the docs:

Data page cluster ratio - The data page cluster ratio is used to compute the effectiveness of large I/O when an index is used to access the data pages

Data row cluster ratio - The data row cluster ratio is used to estimate the number of (data?) pages that need to be read while using the index to access the data pages.

So #2 is really a measure of page fullness? (independent of whether the page are stored on disk in clustered index order)

And so it goes...

(this posting was updated Dec 13 2022 for clarity)