Hello database experts,

More and more people are interested in Sybase ASE as a new database solution for their SAP systems, because of the uncertain future of MaxDB and the high prices of databases and features of RDBMS like Oracle and DB2.

But how performant is Sybase ASE?

How big is the migration effort?

How good is the compression?

How big is the administration effort?

I will clarify some aspects and shed a little light on the dark, or rather poorly documented, areas.

Migration

Prerequisites

Test scenario:


Source System NW731 SPS10 BW (distributed)

MaxDB 7.9

Windows 2012

Physical Size 2TB

Logical Size 1.7TB

40GB RAM

8CPU

Target System NW731 SPS10 BW (distributed)

Sybase ASE SP110

Windows 2012

40GB RAM

8CPU

Export/Import

  1. First, apply all relevant SAP Notes and update the kernel/migration tools on the source system to ensure a clean export
  2. Run SMIGR_CREATE_DDL to create the SQL files needed for the BW tables
  3. Check the dictionary and perform the other steps described in the system copy guide
  4. Run the export with migration-optimized settings (these depend on the source database type)
  5. Create the target DB (set up parameters, configure CPU, caches and pools, and update to SP110 => Best Practice note 1680803)
  6. Start the import manually! This is important in order to control and adjust the configuration and, if required, run index creation separately with its own parameter setup
  7. Perform the standard post-processing described in the system copy guide

Issues during Sybase migration

1. R3load issue: transaction log full


[ASE Error SQL1105]Can't allocate space for object 'syslogs' in database '<SID>' because 'logsegment' segment is full/has no free extents.


Solution


It is mandatory to use at least the following component versions to run a migration:

– ODBC 15.7.0.508

– DBSL 720_EXT UC 64, pl 431

– R3load 720_EXT UC 64, pl 402

Note: the log segment must be large enough to absorb the write load until the next checkpoint is triggered and the log can be truncated.

=> increase the size of the log segment or decrease the degree of parallelism (= number of import jobs)
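As a sketch of the first option (device and database names are placeholders, not taken from this migration), the log segment can be extended like this:

```sql
-- Enlarge the log device and grow the log segment of the <SID> database.
-- Device name and sizes are examples only.
disk resize name = "SID_log_001", size = "20G"
go
alter database SID log on SID_log_001 = "20G"
go
-- During the import it can also help to truncate the log at each checkpoint:
sp_dboption SID, 'trunc log on chkpt', true
go
```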

2. Partitioning issue


The cube had more than 8,000 partitions, and there is no official statement on how many partitions are supported! My test results show that about 1,500 partitions still work reliably.

Report RS_BW_PRE_MIGRATION displays the size incorrectly => it is only 6GB, not 6TB!

So we have two options, because importing this table with its 8,172 partitions fails with the following error:

(SYB_IMP) ERROR: DbSlPrepare/BegModify  

failed rc = 99, table "/BIC/FZSDM1CS3"

SQL error 30046)

error message returned by DbSl:

[ASE Error SQL21][SAP][ASE ODBC Driver][Adaptive Server Enterprise]WARNING - Fatal Error 702 occurred


Solution


Option 1

Collapse (compress) the cube down to at most 1,500 requests (=> 1 request = 1 partition)


Option 2

Create an entry in table RSADMIN on the source system (note 1691300 – SYB: Unpartitioned F fact tables for InfoCubes) with:

Object: SYBASE_UNPARTITIONED_F_FACT_01
Value:  ZSDM1CS3

Note:

=> The table must be exported again after a new run of SMIGR_CREATE_DDL, because the new table creation statement without partitioning must be written to the SQL files.

=> So please analyze this before you start the export: either collapse the cube or export with the RSADMIN option!
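To support that analysis, the partition count of a fact table can also be checked directly in ASE. A sketch (the table name is the one from the error above; the query assumes the standard ASE 15.7 system tables):

```sql
-- Number of data partitions of the F fact table:
select count(*)
  from syspartitions
 where id = object_id('/BIC/FZSDM1CS3')
   and indid in (0, 1)   -- data partitions of a heap or clustered table
go
-- Detailed partition overview via the system procedure:
sp_helpartition '/BIC/FZSDM1CS3'
go
```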

3. Index creation issue (sorted creation)

Index creation fails with:

SQL error 1530: Create index with sorted_data was aborted because of a row out of order.

If index creation fails due to data being out of order, the DDL template file mapping is wrong for the respective package.

Solution

Map the failing package to DDLSYB_LRG.TPL to omit the sorted_data option for the creation of the SAP primary keys. If the R3load task files have already been generated, modify the corresponding command file (<package name>__DT.cmd).

Sybase documentation on this option:

The sorted_data option can reduce the time needed to create an index by skipping the sort step and by eliminating the need to copy the data rows to new pages in certain cases. The speed increase becomes significant on large tables and increases to several times faster in tables larger than 1GB.

If sorted_data is specified, but data is not in sorted order, Adaptive Server displays an error message, and the command fails.

Creating a nonunique, nonclustered index succeeds, unless there are rows with duplicate keys. If there are rows with duplicate keys, Adaptive Server displays an error message, and the command fails.
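A minimal sketch with a hypothetical table and index illustrates the difference between the two template variants:

```sql
-- Standard template behavior: sorted_data skips the sort phase, but the
-- statement aborts with error 1530 if the rows are not already in key order.
create unique index orders_pk on orders (order_id)
    with sorted_data
go
-- DDLSYB_LRG.TPL behavior: without sorted_data, ASE sorts the data itself
-- and the index creation succeeds regardless of the physical row order.
create unique index orders_pk on orders (order_id)
go
```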

Tips for migration

Increase migration performance by separating the data import from the index creation, each with its own DB settings. All relevant settings are described in the Best Practice migration guide.

Summary

Compression

The compression ratio is similar to MSSQL (no wonder, since MSSQL is based on Sybase source code), but not as good as e.g. Oracle 11g OLTP compression. However, this is only my impression, because I have no 1:1 comparison on Oracle.

In Oracle the block size is normally 8k, and compression takes place at block level; index compression and SecureFiles (LOB) compression are also included.

With Sybase we have row compression, which compresses away empty space/zeros in fixed-length columns.

In addition, both page dictionary and page index compression strategies are applied at the page/block level, and last but not least there is LOB compression for large objects.

All of this happens with the SAP standard block/page size of 16k.

Compared to MaxDB this is a quantum leap: not only do you save disk space, you also increase the efficiency of your data cache.
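For reference, a sketch of the corresponding ASE 15.7 DDL (the table name is hypothetical):

```sql
-- Row compression only (compresses padding in fixed-length columns):
alter table sales_history set compression = row
go
-- Page-level compression (dictionary + page index, includes row compression):
alter table sales_history set compression = page
go
-- Existing rows are only compressed when the table is rebuilt:
reorg rebuild sales_history
go
```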

Performance

A few more tests are required here before making a definitive statement. SAP's tests show a performance boost of 35-40% compared to MaxDB.

Measurement                | MaxDB                              | Sybase ASE
Startup time of DB         | 12 min                             | < 30 sec
Data compression           | no – still 1,700 GB                | yes – 770 GB
Backup compression         | 176 GB                             | 219 GB (only SID DB, not master or others)
Partitioning               | no                                 | yes (up to about 1,500 partitions)
GUI administration         | yes (Database Studio / DBACOCKPIT) | yes (SCC / DBACOCKPIT)
Shadow DB solution         | yes                                | yes
Autom. configuration check | yes                                | yes (depends on the SAP release*)
In-memory capable          | no                                 | yes (currently not supported by SAP BS)
DB verify                  | yes                                | yes, with backup or dbcc
Shrinking data files       | yes                                | yes (note 1881347, since SP100)
Huge pages                 | no                                 | yes (note 1805750 and this blog)

*The configuration requirements and recommendations specified in these SAP Notes can be compared with the configuration of your SAP ASE database using the DBA Cockpit. This feature is available as of the following SAP_BASIS support package stacks:

7.02 SP17
7.30 SP13
7.31 SP14
7.40 SP09

1581695 – SYB: Configuration Guide for SAP ASE 16.0

1749935 – SYB: Configuration Guide for SAP ASE 15.7

Migration effort

There is no more effort than with any other DB. You just have to read SAP's migration Best Practice and take note of the known errors mentioned there.

Thanks to its compression and partitioning features it is definitely a lot faster than MaxDB and a good alternative to the other, more expensive DBs. It fits into SAP's concept for the next few years.

Not all functions and features are as well documented/integrated as customers would wish, but SAP keeps working to improve these things.

Administration effort

The integration of the tools in the DBACOCKPIT is pretty good, but a nice, colorful, clickable interface like Database Studio is not available for Sybase ASE. People who are already familiar with command-line-based administration, as in Oracle or DB2, will quickly learn the new commands/stored procedures. The procedures are nearly the same as in MSSQL – no surprise, right?

An automated configuration check or script for the whole configuration would be really helpful. I have written my own scripts because I don't want to waste time copy-pasting over 100 parameters for every DB I install.
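Such a script is essentially a batch of sp_configure/sp_cacheconfig calls. A minimal sketch (the values are made-up examples for a small box, not recommendations):

```sql
sp_configure 'max memory', 0, '36G'
go
sp_configure 'statement cache size', 262144   -- in 2K pages
go
sp_configure 'number of user connections', 500
go
sp_cacheconfig 'default data cache', '24G'
go
```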

Updating Sybase ASE with the GUI wizard is also easy, but setting all the additional performance parameters that are not described in the configuration note takes some time to find the right values.

I hope I could show you some interesting new facts about Sybase ASE. It is definitely a good alternative to MaxDB and must now take up the challenge against Oracle, DB2, MSSQL, etc. SAP still has some work to do to fully integrate all functions.

If you have any further questions, don't hesitate to comment on the blog or contact me or one of my colleagues at Q-Partners ( info_at_qpcm_dot_de )

Best Regards,

Jens Gleichmann

Technology Consultant at Q-Partners (www.qpcm.eu)

######

Edit:

added configuration check

1581695 – SYB: Configuration Guide for SAP ASE 16.0

1749935 – SYB: Configuration Guide for SAP ASE 15.7

######


15 Comments


  1. Jeff Tallman

    I think the real value in DBACockpit comes once SAP DBA’s start having to look at performance.   In my experience with SAP, nearly all the performance problems were due to either 1) too small of a statement cache; 2) insufficient indexing on SAP tables.  #1 is easily solved and DBACockpit does have a graph to show this – but I prefer my own custom graph.   #2 is where DBACockpit excels – but you need to be a DBA who understands query performance tuning and indexing in general vs. an operational DBA more concerned with nightly backups.   I have found the most common performance issue with SAP apps is that the pre-defined indexes are insufficient for the way users are using the application – and as a result, a ton of CPU time and IOs are done unnecessarily.   To resolve, in DBACockpit, simply look at the cached statements and focus on the ones with high CPU time or high TotalLIO – look at the query and then look at the indexing on the table.  Most times you will find that there isn’t a very good index….if there is one that covers the SARGs, it will include a few other columns in-between that are unknown and very distinct such as TIMESTAMP, etc….which often makes the index unusable.   But I think it is DBACockpit’s ability to show the most expensive statements easily that is one of its strong points.

    1. Jens Gleichmann Post author

      Thanks for your comment Jeff!

      I think for performance tuning it is a good starting point to look at the top consumers and the hit ratios of the caches, but I think there is a big gap compared to, for example, the DBA Cockpit for Oracle with its integrated table/index fragmentation and ASH/AWR scenarios.

      If you want to see the CPU or I/O peaks I prefer to look into the graphs of the SCC (Sybase Control Center).

      SAP also should implement tools like sp_sysmon for session sampling.

      For most DBAs this tool is sufficient! So I’m totally with you in the cases which you have described, but there are a lot of other scenarios which currently can’t be completely covered by the dbacockpit, and that’s the point where the work of the SAP Sybase team starts 😉

      Maybe they should also implement something like a cache advice tool.

      Regards,

      Jens

      1. Jeff Tallman

        Actually, that is the problem.  The way SAP works, the cache hit ratios are often artificially distorted extremely high on bad queries.  For example, I often see table scans (or partial scans including index leaf scans) in memory – and tons of them – all driving supposedly phenomenal 99%-100% cache hit ratios.     Hence my push for DBA’s to focus on the top N bad queries and overall CPU usage vs. data cache hit ratios.   Even for statement cache, I like to see that the number of statements removed is <10% of the total statements vs. looking at the cache hit ratio…..although there, it is a much better indicator than generic data cache hit ratios. 

        Graphing CPU is nice….the early versions of DBACockpit didn’t have graphs….not sure if the current one does or not – my last cust was using DBACockpit v1.0 (eeeyiyyyeyiiiiyiii)….however, rather than graphing IO rate as SCC does, I would prefer a graph of IO response times (with separate lines for read vs. write speed).

        Table fragmentation is in the ATM jobs under reorgs (or at least the job to reorg when index cluster ratio drops by 10%) – and personally I don’t think it has the impact that many focus on it for.  The table got fragmented somehow – if via deletes – then likely the deletes will continue and constantly reorg’ing the table to eliminate that fragmentation is just silly as you are constantly fighting a losing battle – best to instead use partitions and truncate the partition if fragmentation driven by large deletes.  But then you were approaching this from a BW standpoint and almost all of my experience has been with ERP.  For ERP, the bigger issue related to fragmentation is forwarded rows in ASE – and that is an area that DBACockpit doesn’t display (AFAIK…could be wrong – the place I used DBCockpit the most used the version shipped with sp42 which I think was v16….current version is something like 23)…..and this is where you need to (perhaps) go manual/command line and look at setting exp_row_size and inrow lob size) before you reorg the table.  

        …and sorry – haven’t seen sp_sysmon help with SAP performance problems yet.

        My biggest complaint about DBACockpit is there are no classes taught for it – users are just given it and told “go forth and be successful” ….which rarely happens.

        Yes, a cache advisor would be handy – that is one area (along with setting up separate engine groups/thread pools for Batch vs. Dialog) I do spend some time.   Invariably, I find some goofy scheduled task hitting some whopping table flushing more critical data out of cache and I end up putting it in a restriction cache – as well as binding a bunch of small constantly hit tables to a reference cache.   The problem with this is the techniques for setting up caches is completely orthogonal to decades of ingrained DBA behavior, so it is often a bit of a discussion…..ditto on the thread pools for BATCH…..

  2. Marc Geller

    Hi Jens,

    Just a comment: it’s true many things are similar to MSSQL, but that can be explained by the fact that MSSQL was, at its beginning – when I was working with Sybase 4.9 – Sybase SQL Server 4.2 code for OS/2, and later for the Windows platform. Of course MSSQL evolved differently but some basics are still the same.

    It was Sybase which had to change its name — toward adaptive server — not to be confused with MSSQL.

    Regards

    Marc

  3. Mike Willett

    After benchmarking compression we found the performance of SAN reads did improve; we’d see about 20% improvement in real-world tests – but we’d also see a drop in performance of cache reads, which reduced our interest in the option.

    And when you look at the actual costs it was cheaper to buy a new machine with 512Gb of RAM to keep more data in cache – which was our final choice.

    Shouldn’t ASE have the option to keep the data in cache uncompressed so there is no downside to cached reads?

    1. Jeff Tallman

      Keep in mind that the most common form of compression used by ALL database vendors is ‘dictionary’ compression – in which values are substituted by low byte length tokens.  So….for example, a data page containing less than 255 distinct values for a given column could have the column stored as a single byte.  Unfortunately, that column might be something like LastName or City or ProductName – e.g. something we normally associate with medium width varchars.    If we uncompressed the values back when reading into memory, it is unlikely it would fit on the pagesize in cache – result would be  logical page split – or row forwarding – take your pick – both have very nasty side effects.

      My advice is pretty simple – don’t compress anything ~<50MB in size – and don’t compress objects which are constantly in cache and don’t have effective indexing strategies.   The cost of compression is largely CPU (including spinlock if compression pool size isn’t sized correctly)…..and if the table is in cache all the time (or fairly small) or you simply can’t live with a reasonable index and you have to endure constant high LIO on a cached table….then, possibly uncompress the table.    As the LIO decreases, so does CPU…..so, if you can put better indexes in place, it might be the cheapest option.   Of course, an extra index (or fixing the ones already there) could increase the footprint – but I would argue it almost always decreases response time and CPU consumption enormously and is worth the price.

      1. Mike Willett

        > My advice is pretty simple – don’t compress anything ~<50MB in size


        Agreed – we only benchmarked compressing tables over 50Gb in size.


        I don’t see how this comment works in a real world situation


        > don’t compress objects which are constantly in cache


        Assuming that your database is larger than your available RAM, which I’d guess it is in most places, then tables are not “in cache” or “out of cache”. Objects have some pages in cache and some pages not in cache. (With the exception of some small very hot tables.)


        Here’s a real-life case:

        eg consider a customer order table – 200m rows – some customers will be reading their rows frequently and will be held in cache and some won’t.  The option is therefore to compress all the customers and slow down the frequent customers or not to compress and have slower SAN and faster cache. We’ve looked at partitioning and compression on some partitions but that was an awful solution.


        Also consider historic data – we have 2Tb of it – we partition the data each year and would like to compress old years – you can’t easily compress the data. You need to update it, and at over 200m rows per year it’s a long process.


        >  If we uncompressed the values back when reading into memory, it is unlikely it would fit on the

        > pagesize in cache – result would be  logical page split – or row forwarding – take your pick – both

        > have very nasty side effects.


        Well I can appreciate how difficult it would be achieve this – but as it stands I don’t think the advantages with the compression are there yet. And when you factor in the costs – we decided against. Maybe with the type and amount of data we store compression isn’t aimed at customers like us.


        > but I would argue it almost always decreases response time and CPU consumption enormously and is worth the price.


        I would agree that indexes are sometimes worth the extra footprint. However, you need to appreciate the total cost of ownership of extra indexes. If we add one extra index to 10% of our data then our footprint grows by about 0.5Tb. Each insert into an extra index will cost about 8ms per page to read from SAN (if it’s not in cache). Dumping an extra 0.5Tb will mean more time to dump and load.

        1. Jeff Tallman

          Unfortunately, applying such generalizations often leads to such conclusions.   Based on experience with a large number of customer systems, I will simply state that:

          1 – you would be surprised at the number of large tables that are mostly in cache….but you missed the full conditions – large tables mainly in cache AND do not have an effective index strategy.  Implication – frequent table scans – or a gazillion data page LIO’s based on index scans.     Compressing these tables is not necessarily useful….unless you are trying to constrain the amount of cache consumed.   However, take your customer order example – since SAP systems are generally heap tables (no clustered indexes) but do strive to have an effective index strategy – all of the more recent orders will be at the tail-end of the table and likely cached while older orders are more towards the top of the table and not in cache.  Customer lookups on recent orders will (or should) only affect a few rows of data per query and therefore the overhead of compression is minimal – especially if the index keys are not compressed and the materialized columns are index keys.  More importantly, after a short period of time (e.g. 60-90 days for most), the order record is typically static and not likely modified ever again….but perhaps hit for reports – e.g. End of Quarter/End of FY reporting – which likely will do more PIO….and that is where compression helps as the PIO cost might be 30-40% less….albeit the CPU cost is likely 2x higher……but overall elapsed time is less.

          2 – I can’t comment on your issues with partitioning and compression – I do know that it works – and you don’t have to update the records (a reorg would have compressed them – and reorg-ing a partition is an order of magnitude or 3 faster than the table – but then one has to use a decent number of partitions – e.g. anything less than 30 is just kidding yourself – whereas a sweet spot of 50-100 partitions does wonders (range partitioning) – especially if non-unique indexes are local vs. global).

          3 – Adding an index to 10% of the data is just the type of generalization that is misleading.   Commonly, you have – perhaps – 10 or so key transaction tables.   Of those 10, quite often adding an index to a few of them (e.g. 3-4) is all that is necessary based on the query profiles.   For the typical SAP ERP system with 60,000 tables or so, commonly, I have only had to add 5-6 indexes across the entire schema to get performance where desired.  Yes, adding an index *does* add overhead to inserts (and some updates) – but if a query is using it, then it will be in cache and the expense of reading it from SAN – and speaking of just reading the portions necessary for the query – is negligible compared to the alternative.   Since most writes are cached (except the log), then modifying the extra index is mainly an overhead in logical page modifications – and predominantly concurrency related issues.  In addition, on those large tables, I generally find 2 or more indexes that are never used….    What costs more – adding an index that would be used – or supporting 2+ indexes that are never used??  A dumb question…..but…..reality….but also unfortunately one that often is answered wrong as the 2 unused indexes are kept due to fear of the unknown of removing them and the new useful index is disallowed due to fears over extra IO.    One also often needs to put things in a business perspective (and they are paying the bills)….e.g. if I go to the business and state “I can speed up that report by running it in 5 minutes instead of 10 hours” ….or “increase the capacity of the system to support 100,000 sales per day instead of only 10,000 – but it will cost an extra $$ for the increase in disk space and DBA time”…..most of the time that extra 0.5TB of backup space becomes a non-issue.

          1. Mike Willett

            Thanks for the reply. Answering a few of your points….

            > but you missed the full conditions – large tables mainly in cache AND do not have an effective index strategy.

            So you’re saying — “only compress tables which aren’t mainly in cache but do have an effective index strategy”.

            I’d say all our tables have an effective index strategy (if they didn’t they’d take hours to scan)

            Our key tables are read off SAN and then repeatedly reported on many, many times. So the first read is very slow but subsequent reads are fast. From my tests I get a 20% gain in PIO using compression but a 20% cost in LIO, so in our case slowing down the LIO to speed up the PIO isn’t really much of a gain.

            Of course difference systems will work in different ways and will experience difference performance gains and loses.

            > However, take your customer order example – since SAP systems are generally heap tables (no clustered indexes) but do

            > strive to have an effective index strategy – all of the more recent orders will be at the tail-end of the table and

            > likely cached while older orders are more towards the top of the table and not in cache.

            > Customer lookups on recent orders will (or should) only affect a few rows of data per query and therefore the overhead of

            > compression is minimal – especially if the index keys are not compressed and the materialized columns are index keys.

            Unfortunately, that’s not how our system works – we’re essentially allowing various teams to do analysis across all 200m of the customer orders – generally producing a few thousand reports each day. It’s not an order processing system – more of an analysis system. Some might argue that IQ might be more appropriate – but an IQ migration would be out of the question, and considering that some of the system is more OLTP we’d probably lose as much as we gained (which is what we found when we worked with the Sybase team to port the last system I worked on to IQ).

            > One also often needs to put things in a business perspective (and they are paying the bills)….e.g. if I go to the business

            > and state “I can speed up that report by running it in 5 minutes instead of 10 hours” ….or “increase the capacity of the

            > system to support 100,000 sales per day instead of only 10,000 – but it will cost an extra $$ for the increase in disk space

            > and DBA time”…..most of the time that extra 0.5TB of backup space becomes a non-issue.

            Agreed. But compressing will not make reports go from 10 hours to 5 mins. It gives slightly faster SAN reads and slightly slower cache reads. And when you consider that the cost of ASE compression is about 20x the price of a new machine which quadruples the CPU performance and quadruples the cache size – it was easy to decide what to do.

            1. Jeff Tallman

              First of all – I am not sure if your intent here is to complain about pricing – in which case it is the wrong place – or if generally just trying to argue …

              However, please don’t misconstrue what I have said.  For example, I *never* stated that adding compression make a report go from 10 hours to 5 minutes – that point was about indexing – which you then jumped on with generalizations about why proper indexing was impractical.     Also, table scans – while a common side effect of improper indexing  – is not the only impact/indication of improper indexing.    A much, much, much more common effect is leaf scanning index pages of not-so-great indexes and then doing a data page fetch for every leaf node.   This is much harder to spot than with table scans and is most often missed as many DBA’s simply see that the query is using an index and don’t check to see how well the index really meets the criteria.

              Don’t also assume that negating a comment is always correct either.   If I say “Don’t eat poisoned apples”, stating “you are saying I should eat only unpoisoned apples” is not necessarily true….we all know that a balanced diet is best vs. just eating apples….so twisting my statements wrt to caching and indexing doesn’t mean you have the correct interpretation.

              My points:

              1) Don’t bother compressing small tables nor tables that are frequently (mostly) in cache (with today’s large memory systems, the size of tables that can be cached sometimes is definitely in the large range).

              2) Compression could have significant negative effect if proper indexing is not available.

              3) The negative effect of compression significantly amplifies performance degradation as a result of improper indexing which in and of itself is a different issue that should be resolved.

              4) The cost of adding an index is not as drastic as it sounds – AND – if there is concern about the overhead of index maintenance, in most cases, there are most commonly bigger existing issues with even more unused indexes than the overhead of adding  a new index.   If there really were concerns, those unused indexes wouldn’t exist.

              It sounds like your system is more like point #1….in which case data compression may not be an option for at least the current data.    WRT your experience with partitioning, etc.  – again, I can only state that I have been to many places where it worked.   It didn’t for you – that is your experience…..  which differs from mine.   Doesn’t mean one of us is wrong….just means we have had different experiences. 

              We also all need to be aware that most features/options have trade-offs…..sometimes the trade is very advantageous……and other times the trade simply isn’t worth it.   Doesn’t mean the option is worthless nor does it mean that option has a bad design. You asked a legitimate question to which I explained why dictionary compression has an overhead – whether IBM, Oracle or ASE – doesn’t matter – and why it is somewhat impractical (without an even bigger memory hit) to uncompress data in memory.   Having understood that, one can then make better judgment about whether compression will work in their situation.

              1. Mike Willett

                > However, please don’t misconstrue what I have said.  For example, I *never* stated that adding

                > compression make a report go from 10 hours to 5 minutes – that point was about indexing


                Apologies – my mistake


                > “twisting my statements wrt to caching and indexing doesn’t mean you have the correct interpretation.”


                Apologies again – No deliberate attempt to do that


                > Doesn’t mean one of us is wrong….just means we have had different experiences.


                Agreed – I thought I said that with “Of course difference systems will work in different ways and will experience difference performance gains and loses.”


                > Doesn’t mean the option is worthless nor does it mean that option has a bad design.


                I don’t think compression is worthless – in my tests it gives a 20% gain in Physical IO performance and in the world of over-contended SANs that’s a worthwhile gain.

                I merely would like the option not to compress once in the cache – but I do appreciate that this would be much harder to implement (and possibly create other overheads)


                The area I’m having problems understanding is when you say

                2) Compression could have significant negative effect if proper indexing is not available.

                I don’t want to twist your words so correct me if I’m wrong, but I think you’re saying that if you have the correct indexes, compression won’t have a negative effect ?

                We found selecting data from the cache was 20% slower with compression turned on – hence all our reports were 20% slower (assuming all data was in the cache). (ie A 20s query would take 24s). Perhaps my benchmarking wasn’t right but the Sybase person I spoke to in Switzerland said these stats were about right. If they’re wrong I could retry them.


                I think this case is what you’re implying with

                “A much, much, much more common effect is leaf scanning index pages of not-so-great indexes and then doing a data page fetch for every leaf node.”


                How do we get around this ? I can’t create a covered index across the whole table to prevent a leaf node read. We have a table of 200m rows and reports will pull back many different slices of this data for analysis.


                1. Jeff Tallman

                  Actually, 20% is believable – but I wouldn’t assume this is a standard/consistent in all cases….  Keep in mind, that it is the number of rows and the number of columns needing to be uncompressed that have an effect…..this was made a bit better in 16.0 – forget the CR number – in which only the columns necessary for query processing were uncompressed vs. all the columns in the row being read.

                  However, what I was referring to is worse than you think. For example, if you do an in-memory tablescan with compression, it could be up to 2.5x (250%) slower than an in-memory tablescan without compression – which is much worse than 20% – and that is what I was referring to.

                  In your case, with repeated LIOs against the same pages, it likely doesn’t help – but consider an annual report where the data is likely read from disk once (for that report) and then not re-read much. There, the fact that I can get more rows per PIO and reduce the PIO time makes compression a win.

                  The next question is whether the amount of data that can be retained in cache makes it worthwhile. For example, assume I have a table of 1.3 million rows and I can hold 1GB of data in cache. In that 1GB I have a choice: either 1 million uncompressed rows – plus physical reads for the remaining 300K rows – or all 1.3 million rows fitting when compressed. Yes, the (in your case) 20% LIO overhead is there – but does it overcome the PIO cost of reading those 300K rows from disk? Likely.

                  However, can’t I just increase memory by 30% (as you did) and get the same effect? Absolutely – as you have shown. That assumes that a) you can increase the memory in the box, or b) you can buy new boxes with more memory but the same number of CPU cores (to avoid additional licensing aspects). If (a) or (b), then compression may not be cost effective and may have too many negative impacts. However, say you are running on Linux with an app that must run on the same box as ASE (for whatever reason) and cannot run with huge pages – or you are avoiding hugepages due to OS bugs, of which there have been some in the past, fixed now. Then you are limited to 256GB of memory for ASE due to OS constraints when hugepages are not in use, and getting that extra 30% of data into cache thanks to compression is worth something.
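                  The cache trade-off above can be sketched as a back-of-envelope model. All the numbers here are illustrative assumptions (the 1.3M-row table and 20% overhead come from the discussion; the per-row LIO/PIO costs are made up for the sketch), not measurements:

```python
# Rough model of the cache trade-off: cached rows cost a logical read,
# uncached rows additionally cost a physical read. Costs are assumed.
TABLE_ROWS = 1_300_000        # rows in the table (from the example above)
CACHE_ROWS_PLAIN = 1_000_000  # rows that fit in 1GB cache uncompressed
LIO_US = 1.0                  # assumed cost of a logical read per row (us)
PIO_US = 100.0                # assumed cost of a physical read per row (us)
COMPRESS_OVERHEAD = 0.20      # 20% CPU overhead per logical read

def scan_cost(rows, cache_rows, lio_us, pio_penalty=0.0):
    """Cost in microseconds of reading every row once: all rows pay LIO
    (inflated by any decompression penalty), cache misses also pay PIO."""
    lio = rows * lio_us * (1 + pio_penalty)
    pio = max(rows - cache_rows, 0) * PIO_US
    return lio + pio

uncompressed = scan_cost(TABLE_ROWS, CACHE_ROWS_PLAIN, LIO_US)
# with compression, all 1.3M rows fit in cache, but every LIO pays 20% extra
compressed = scan_cost(TABLE_ROWS, TABLE_ROWS, LIO_US, COMPRESS_OVERHEAD)
print(f"uncompressed: {uncompressed/1e6:.2f}s  compressed: {compressed/1e6:.2f}s")
```

                  With these assumed costs the compressed scan wins easily, because avoiding 300K physical reads dwarfs the 20% logical-read overhead. Shrink the PIO cost (or make the whole table fit in cache uncompressed) and the sign flips – which is exactly the "just add 30% more memory" counter-argument.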

                  As far as the indexing goes – what I am referring to is not that all index leaf scans are bad. But if I have an index on {a,b,c,d} and my search arguments are {a,b,e,f}, then, lacking an index on {a,b,e,f}, I will pick the index on {a,b,c,d} but simply do a leaf scan of all the leaf pages matching {a,b}. This is worse with compression, as each leaf node will need a data-page fetch and a decompression of the values for {e,f} at a minimum, as well as of any columns being materialized – for a lot of rows/columns (see the tablescan example earlier). An index on {a,b,e,f}, as we both know, would be a lot better – yes, I may still do a leaf scan (assuming a non-unique index), but over a much more limited set. This drastically reduces the amount of data being uncompressed. In addition, since {e,f} are index keys, I don’t need to uncompress those values (assuming an uncompressed index), so I only need to uncompress the columns being materialized.
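                  A tiny counting sketch of that point, on made-up data (the column domains and predicate values are arbitrary; what matters is the ratio of fetches):

```python
# With an index on (a,b,c,d) but predicates on (a,b,e,f), only (a,b) is
# usable for positioning: every (a,b) match forces a data-page fetch (and,
# with compression, a decompression) just to test e and f. An index on
# (a,b,e,f) resolves all four predicates in the leaf before any fetch.
rows = [(a, b, c, d, e, f)
        for a in range(2) for b in range(2)
        for c in range(5) for d in range(5)
        for e in range(5) for f in range(5)]

# predicates: a=1, b=1, e=3, f=4
fetches_abcd = sum(1 for r in rows if (r[0], r[1]) == (1, 1))
fetches_abef = sum(1 for r in rows if (r[0], r[1], r[4], r[5]) == (1, 1, 3, 4))

print(fetches_abcd, fetches_abef)   # 625 data-page fetches vs 25
```

                  Every one of those extra 600 fetches is a row that gets decompressed only to be thrown away, which is why the mismatch hurts so much more on a compressed table.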

                  Believe it or not, I have seen a TON of systems with that exact situation, in which the query on {a,b,e,f} was actually run many, many times per minute – and no one wanted to add an index, for the exact arguments you made, on the assumption that the index on {a,b,c,d} was adequate (wrong!).

                  We also need to keep in mind that compression is never really about performance – it is about TCO, reducing storage costs specifically – and whenever you try to do something cheaply, performance often takes a hit somewhere. However, you can minimize this to something more acceptable with appropriate indexing and by not compressing tables that shouldn’t be – often without measurably impacting the overall TCO. In addition, the LIO costs are sometimes offset by the PIO gains; in your case not. But then when you add in dev, test, QA etc. (I have been at places where there were 10+ copies for dev), compression might pay off again, as the aggregated storage cost becomes more burdensome than 20% slower reports.

                  1. Mike Willett

                    Thanks – I agree with your comments.

                    > For example, if you do a in-memory tablescan with compression, it could be up to 2.5x (250%) slower than an in-memory tablescan without compression – which is much worse than 20%…..and that is what I was referring to.

                    Wow! I hadn’t experienced this in my tests – but then we don’t do many table scans and obviously never on large tables.

                    > Believe it or not – I have seen a TON of systems with that situation in which the query on {a,b,e,f} was actually run many many times per minute…..and no one wanted to add an index for the exact arguments you made – and assumption that the index on {a,b,c,d} was adequate (wrong!).

                    I believe it – I’ve spent 15 years fixing these sorts of issues on various systems. It’s amazing how people misunderstand indexing. I think my understanding is OK on this front – I’ve had my indexes checked by the Sybase performance team in the past.

                    > Then getting that extra 30% of data into cache thanks to compression is worth something.

                    Absolutely – Definitely worth it.

                    > We also need to keep in mind that compression never is really about performance – it is about TCO – reducing storage costs specifically


                    Agreed. When I benchmarked compression I also calculated the reduction in storage costs.


                    However (and I know you don’t want me to complain about Sybase prices, but…) compression is charged per cpu-thread, so on a 48 cpu-thread machine the costs are much, much higher than simply buying a new machine and reducing SAN costs.


                    Have you thought of other licensing models ?

                    1. Jeff Tallman

                      ASE and options are not priced per CPU THREAD – pricing is per CPU CORE – and a core can have 2, 4 or 8 threads (depending on the CPU vendor). I also think the pricing is based on a core factor, but I am not an expert in this area, as it is more a Solution Management function. Your best bet to get the straight story would be to shoot an email to Rudi Leibbrandt <rudi.leibbrandt@sap.com> if you are really curious. There are also differences between the old SY pricing and SAP pricing – one of which (I believe) is that SAP doesn’t charge for dev/test licenses – but check with Rudi to be sure.
