
One of the most important considerations when writing a SELECT statement against a large table is the effective use of an index. However, this is sometimes easier said than done. Have you ever found that your WHERE clause is missing just one field of an index, and that field is not at the end of the index?

There are some situations where you can make effective use of the entire index even if you are missing a field. Here is a simple trick to help you do just that. If the missing field can hold only a limited number of values, you can build a range table containing all possible values and add that range table to your WHERE clause; this can dramatically speed up the SELECT statement. Even when you take into account the extra time needed to retrieve the possible values, the results are worth it. This may seem a bit counter-intuitive, but the example code shows what I’m doing (but be careful: if you run this code in a QA environment, it may take a while):
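A minimal sketch of the check-table variant, not the full test program: the posting date is known, the company code is not, and the useful index on BKPF is assumed here to contain BUKRS and BUDAT (the exact index fields vary by release). T001, the check table for BUKRS, supplies every possible company code:

REPORT z_index_trick.

TABLES: bkpf.
SELECT-OPTIONS: s_budat FOR bkpf-budat.

RANGES: r_bukrs FOR bkpf-bukrs.
DATA:   t_bkpf TYPE STANDARD TABLE OF bkpf.

* Fill the range table with every possible company code from the
* check table T001 - a small table, so this costs almost nothing.
SELECT bukrs FROM t001 INTO r_bukrs-low.
  r_bukrs-sign   = 'I'.
  r_bukrs-option = 'EQ'.
  APPEND r_bukrs.
ENDSELECT.

* Every index field is now supplied in the WHERE clause, so the
* database can use the full index even though we never actually
* knew the company code.
SELECT * FROM bkpf INTO TABLE t_bkpf
         WHERE bukrs IN r_bukrs
           AND budat IN s_budat.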

I ran the test program in QA instances on a DB2 environment in both 4.6C and 4.7. There are more indexes on BKPF in 4.7, but I tried to use one that exists in both versions. I also ran a similar program on Oracle with comparable results. But I really don’t know whether it will work with other databases – please let me know!

I ran this many times in both active and quiet systems. Here are some typical results:


Time for first select (fully qualified): 148 microseconds
Time for second select (unindexed): 1,873,906 microseconds
Time for third select (indexed by selecting from the check table): 455 microseconds
Time for fourth select (partially indexed): 816,253 microseconds
Time for fifth select (indexed by hardcoding the domain values): 43,259 microseconds
Time for sixth select (indexed by selecting the domain values): 43,332 microseconds

Some things to note:

In the above times, the first select shows what happens in the ideal world. We are comparing select 2 against select 3, and select 4 against selects 5 and 6. Selects 2 and 3 should return the same results, as should selects 4, 5 and 6.

But the point is this: even though we start out knowing nothing about (and presumably not caring about) the company code in selects 2 and 3, or the document status in selects 4, 5 and 6, putting all possible values of these fields into the SELECT statement produces dramatic results. The sketch below shows the document status variant.
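For the document status, select 5 hardcodes the fixed values of the domain, while select 6 reads them from the Dictionary table DD07L. A sketch of the Dictionary variant (the domain name BSTAT is an assumption here):

RANGES: r_bstat FOR bkpf-bstat.

* Read the fixed values of the domain from the Dictionary
* (active versions only).
SELECT domvalue_l FROM dd07l INTO r_bstat-low
       WHERE domname  = 'BSTAT'
         AND as4local = 'A'.
  r_bstat-sign   = 'I'.
  r_bstat-option = 'EQ'.
  APPEND r_bstat.
ENDSELECT.

* Add the initial value as well, in case it is not listed among
* the fixed values of the domain.
CLEAR r_bstat.
r_bstat-sign   = 'I'.
r_bstat-option = 'EQ'.
APPEND r_bstat.

The filled r_bstat then goes into the WHERE clause exactly like r_bukrs above.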

If you try to combine both tricks, you will probably find that they don’t work very well together. Once seems to be enough.


12 Comments


  1. sunil ojha
    Hi Rob,
    It’s a really fine technique that we generally ignore, but it would work even better if you use SELECT DISTINCT when filling the ranges table instead of SELECT … ENDSELECT.

    Let me know if I am correct.

    Regards,
    Sunil

    1. Rob Burbank Post author
      I’m actually not sure if you’re right or not. But I wasn’t really worrying about that portion. I’ll give it a try and let you know.

      Thanks very much for your comments.

      Rob

  2. Rashid Javed
    Hi
    Nice information. But I think buffering at the database and application server level affects the runtimes of SELECT statements. For example, once some data has been read from BKPF, subsequent select times will be shorter.
    To overcome this buffering effect, I think you could change the order of the ‘performs’ in your program and then average out the runtimes for each version of the select.
    Just a thought….
    cheers!
    1. Community User
      Hi,

      Simply changing the order of the performs should not affect the runtime as long as the database/app server buffer is large enough to hold the results. You would have to make sure the result set is substantial enough to displace previously buffered entries. On the app server you can easily bypass the buffer explicitly in the SELECT statement itself; the database side of the buffer is trickier to bypass unless you have an administrative account and your own sandbox in which to reset the buffers.
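      (On the ABAP side this is the BYPASSING BUFFER addition; a minimal sketch, with the table and WHERE clause purely illustrative:)

      DATA: t_t001 TYPE STANDARD TABLE OF t001.

      * Force the read past the application server's table buffer.
      SELECT * FROM t001 BYPASSING BUFFER
               INTO TABLE t_t001
               WHERE land1 = 'US'.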

      Regards

    2. Rob Burbank Post author
      I agree with Shehryar that the order shouldn’t affect performance. But buffering was a big concern of mine when I was developing the program. That’s why for each select for which I measure a run time, I do a “preliminary” select to try to eliminate any of these effects.

      Another way (which I also tried but didn’t mention in the blog) is to run each select on subsequent nights. The run times were longer in every case, but the idea of getting the key field values from the database still worked.
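      The measurement pattern looks roughly like this (a simplified sketch, not the actual program; variables as in the sketch in the blog):

      DATA: t1 TYPE i,
            t2 TYPE i.

      * Preliminary select - the result is discarded; it just warms
      * the buffers so every variant is measured under comparable
      * conditions.
      SELECT * FROM bkpf INTO TABLE t_bkpf
               WHERE bukrs IN r_bukrs
                 AND budat IN s_budat.

      * The measured select, timed in microseconds.
      GET RUN TIME FIELD t1.
      SELECT * FROM bkpf INTO TABLE t_bkpf
               WHERE bukrs IN r_bukrs
                 AND budat IN s_budat.
      GET RUN TIME FIELD t2.

      t2 = t2 - t1.
      WRITE: / 'Run time in microseconds:', t2.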

      Rob

      1. Rashid Javed
        Actually, the main point of my post was:
        >>change the order<< of ‘performs’ in your program and then >>average out the runtimes<< for each version of select.

        Of course the order will affect the runtime; if the data is already buffered, subsequent selects of similar data will show a lower runtime than the real one.
        To overcome this I was suggesting multiple runs of the program with different ‘select’ orders. That is, if a program has two selects on the same tables (let’s call them select A and select B), first run the program without taking any time measurements so that it accounts for buffering. After that, make two runs of the program with different select orders, taking runtime measurements for both, and calculate the average runtime for each select afterwards.
        But again, it was just a thought, as you have already mentioned two approaches in your reply:
        1: “That’s why for each select for which I measure a run time, I do a ‘preliminary’ select to try to eliminate any of these effects.”
        2: “run each select on subsequent nights”
        There can be many ways to address the same problem.
        Cheers!

        1. Rob Burbank Post author
          In my testing I created a job and executed this program a number of times. So this, in a way, altered the select order, but within a job, not the same program.

          But the point really is – make sure you use an index, even if you have to go out of your way to do it.

          Rob

  3. T4 Yang
    Hi Rob,

    Thank you for providing such useful information.

    I wonder whether it would be faster to add a condition on BELNR (between ‘0’ and ‘z’).

    Another question: when getting BSEG data from BKPF, which is better, a database join (BSIS/BSAS/BSID/BSAD/BSIK/BSAK) or FOR ALL ENTRIES (BSEG)?
    FOR ALL ENTRIES seems very slow when IT_BKPF (the internal table) contains a huge number of records.

    regards,
    T4

    1. Rob Burbank Post author
      Thanks for the input T4.

      I’m not exactly sure what you’re asking. But I think the answer is “It depends on what fields you have”. If you have the customer, vendor or GL account number, then start with the appropriate secondary index table; otherwise, see if you can force an index by using other key fields.

      My experience is that JOINs are faster than FOR ALL ENTRIES (I’ve written another blog on just that subject), but it’s not always true. Sometimes, FOR ALL ENTRIES is quicker. And of course, BSEG is a cluster table and cannot be used in JOINs anyway.

      But the purpose of this blog isn’t about JOINs or FOR ALL ENTRIES. It’s about how to go about getting a SELECT to use an index effectively.

      Rob

      1. T4 Yang
        I’ve found your blog about JOIN and FOR ALL ENTRIES.
        Furthermore, your blog “Quickly Retrieving FI document Data from BSEG” is excellent.

        Thanks for sharing.

        Best regards,
        T4

