
Last week a colleague from the WHM (data warehouse management) support team called me to get some assistance with a customer message he was processing.

The customer complained about short dumps with the infamous “TSV_TNEW_BLOCKS_NO_ROLL_MEMORY” error, which is basically the way an ABAP work process reports that it ran out of memory.

Checking the short dump

Inspecting the short dump a little further gave some hints on what happened:

[...]
|Error analysis
|    The internal table "\\PROGRAM=GP4MHJ3Z311GXR3HBZEIH3F768L\\DATA=IT_AZSDSOO0600"
|     could not be enlarged further.
|
|    Memory location: "Session memory"
|
|    You attempted to extend the data structure for the management of the
|    memory blocks for table "\\PROGRAM=GP4MHJ3Z311GXR3HBZEIH3F768L\\DATA=IT_AZSDSOO06
|    00". However, the 2696208 bytes required for
|    this were no longer available in the specified memory area.
|
|    The amount of memory requested is no longer available.
[...]

OK, apparently the program had tried to load a lot of data into an internal table (IT_AZSDSOO0600), but there hadn’t been enough free memory available for that. The error message states that approx. 2.5 MB had been requested, which is not really much, but it doesn’t tell us how much space was already allocated.

We also found information about what program caused this issue:

[...]
|Information on where terminated
|    Termination occurred in the ABAP program "GP4MHJ3Z311GXR3HBZEIH3F768L" - in
|     "EXPERT_ROUTINE".
|    The main program was "RSBATCH_EXECUTE_PROZESS ".
[...]

The BW-savvy folks will already have noticed that this is a generated ABAP program.
In SAP BW, many activities that are specific to a certain InfoSource/Provider (e.g. loading data, transferring data…) are realized by generated ABAP programs.
These generated programs are usually based on a generic template and can often be extended with so-called expert routines to implement customer-specific requirements.
As we see in the error message above, this had been done here as well.
The short dump occurred right in the expert routine of the report.

This is also confirmed by the stack backtrace, i.e. the list of active ABAP calls:

[...]
|Active Calls/Events
-------------------------------------------------------------------------------------------------
|No.   Ty.          Program                             Include                             Line
|      Name
-------------------------------------------------------------------------------------------------
|    7 METHOD       GP4MHJ3Z311GXR3HBZEIH3F768L         GP4MHJ3Z311GXR3HBZEIH3F768L           557
|      LCL_TRANSFORM=>EXPERT_ROUTINE
|    6 METHOD       GP4MHJ3Z311GXR3HBZEIH3F768L         GP4MHJ3Z311GXR3HBZEIH3F768L           504
|      LCL_TRANSFORM=>EXECUTE
[...]

The dump also nicely includes a snippet of the ABAP source where the error occurred (line 557):

[...]
|Source Code Extract
---------------------------------------------------------------------------
|Line |SourceCde
---------------------------------------------------------------------------
|  545|  METHOD expert_routine.
|  546|*=== Segments ===
|  547|
|  548|    FIELD-SYMBOLS:
|  549|      <SOURCE_FIELDS>    TYPE _ty_s_SC_1.
|  550|
|  551|    DATA:
|  552|      RESULT_FIELDS      TYPE _ty_s_TG_1.
|  553|    ... "insert your code here
|  554|
|  555|    if not SOURCE_PACKAGE is initial.
|  556|
|>>>>>|      select DOC_NUMBER DOC_CATEG REASON_REJ /BIC/ZCPP_NO  GROSS_WGT
|  558|        BASE_UOM NET_VALUE ITEM_CATEG UNIT_OF_WT /BIC/ZCASE_DAT
|  559|      from /BIC/AZSDSOO0600
|  560|              into corresponding fields of table it_AZSDSOO0600
|  561|        for all entries in SOURCE_PACKAGE
|  562|               where /BIC/ZCPP_NO = SOURCE_PACKAGE-/BIC/ZCPP_NO .
[...]

A hot lead

There we have it: a simple SELECT statement should have loaded data from the active
data table of a PSA/ODS into the internal table it_AZSDSOO0600.
If you compare this with the information a few paragraphs above, you’ll find that this is exactly the internal table for which not enough memory was available.

So, we have a pretty hot lead here!

The next thing to check is, of course, how many rows we expect to be returned from the source table /BIC/AZSDSOO0600.
The table itself isn’t that big: a quick look at the CBO statistics reveals that the table contains around 268.006 rows.

Could this already be too much?

By reproducing the issue we were able to get some more details about the size of the internal table just before the dump occurred:

[...]
IT_AZSDSOO0600
Table IT_24841[51189146x104]
[...]

WOW!

For some reason we managed to select 51.189.146 rows from a table that only contains 268.006 rows.
How is that possible?
Besides that, there is also a WHERE condition with a restriction in place, so we would have expected fewer rows.
Checking the WHERE condition is in fact key to the solution of this problem.

The customer used the FOR ALL ENTRIES clause here, which is a SQL feature specific to ABAP.
It allows you to fill an internal table, like SOURCE_PACKAGE in this case, with the values you want to select and hand it to the database without having to build an IN LIST yourself.
This is a nice convenience for the ABAP developer and a good thing for the database, because the database interface layer plays a nice trick with this statement.

Instead of just creating a possibly super large IN LIST, the db interface automatically loops over the internal table and creates many SQL statements with small IN LISTS, like this:

SELECT
[...]
FROM
"/BIC/AZSDSOO0600"
WHERE
"/BIC/ZCPP_NO" IN ( :A0 , :A1 , :A2 , :A3 , :A4 )

During the execution of the ABAP SQL statement, the database interface grabs chunks of 5 values from the internal table (SOURCE_PACKAGE) and executes the same statement over and over again, until all values have been used.
Since the same statement is reused very often, the parsing effort on database level is minimized, and thanks to the short IN LIST the memory consumption for parsing is also kept very small.

As the documentation for the FOR ALL ENTRIES clause explains, the database interface also takes care of making the lines of the total result unique.

So far so good – but still we don’t know why there was a multiplication of result rows in our case.

Maybe checking the SOURCE_PACKAGE list gives additional insight:


[...]
|SOURCE_PACKAGE
|    Table IT_1638[954x1260]
[...]

Obviously there had been quite a number of IN LIST items to be used here.
Divided by our 5 IN LIST items per statement, we can approximate that the SELECT needs to be executed 954/5 -> 190 times (+ 1 time with the remaining 4 items).
Let’s do a quick cross-check and divide our unexpected super large result set by this number:
51.189.146 / 190 => 269.416

Hmm… pretty close to our total number of rows in the table!

At this point it looks like the whole table was read into the internal table with every new chunk of five IN LIST items.
In fact this is exactly what happened here!

To be able to de-duplicate the result set before returning it to the ABAP program, the database interface first has to temporarily store all returned rows. Since there wasn’t enough memory to do this, the de-duplication wasn’t done yet, when the short dump occurred.

But why did we read the whole table over and over in the first place?
This was due to the contents of the SOURCE_PACKAGE table.
In there, the field used to create the IN LIST items (/BIC/ZCPP_NO) was far from unique.
For the vast majority of rows, the field was just the initial value (' ').
Therefore, the IN LISTs created effectively looked like this:

1st chunk (IN LIST values 1 - 5)
---------------------------------
"/BIC/ZCPP_NO" IN ( ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  )
2nd chunk (IN LIST values 6 - 10)
---------------------------------
"/BIC/ZCPP_NO" IN ( ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  )
3rd chunk (IN LIST values 11 - 15)
---------------------------------
"/BIC/ZCPP_NO" IN ( ' '  ,  ' '  ,  ' '  ,  ' '  ,  ' '  )

For each single chunk the repetition of an IN LIST entry does not change the result set at all. The duplicate conditions are simply ignored once a row matches the first entry of the IN LIST (remember: an IN LIST is just a special case of OR-combined equality conditions).
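
Spelled out, each of the chunks above is logically nothing more than

"/BIC/ZCPP_NO" = ' ' OR "/BIC/ZCPP_NO" = ' ' OR "/BIC/ZCPP_NO" = ' ' OR "/BIC/ZCPP_NO" = ' ' OR "/BIC/ZCPP_NO" = ' '

which collapses to the single condition "/BIC/ZCPP_NO" = ' ' – so every execution fetches the same (nearly complete) set of rows again.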

Checking the source data in this case proved it: most of the rows in the source table would be selected by this condition, so running just one of these statements would already produce a result set of around 260.000 rows.

And there is our memory “eater”!

Due to the duplicate initial values in the field of the SOURCE_PACKAGE table, the database interface executed statements over and over again which effectively selected the whole table.
In addition to that it had to store all the single result sets into the internal table before removing the duplicates from it.

The solution to this problem is obvious and simple:
Before calling the SELECT … FOR ALL ENTRIES statement, the list of values fed into the FOR ALL ENTRIES clause should be made unique.

In this specific case, this would have made the list a lot shorter (down to ca. 20 entries).
That would not only have prevented the short dump, but also reduced the number of SQL executions on the database from 191 to about 5, and it would have saved the time and memory spent getting rid of the unwanted duplicate rows.
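
A minimal sketch of what that could look like in the expert routine – the helper table lt_keys and its type are purely illustrative, not the customer’s actual code:

types: begin of ty_key,
         /bic/zcpp_no type /bic/azsdsoo0600-/bic/zcpp_no,
       end of ty_key.
data: lt_keys type standard table of ty_key,
      ls_key  type ty_key.

" copy only the field that is needed for the FOR ALL ENTRIES condition
loop at SOURCE_PACKAGE assigning <SOURCE_FIELDS>.
  ls_key-/bic/zcpp_no = <SOURCE_FIELDS>-/BIC/ZCPP_NO.
  append ls_key to lt_keys.
endloop.

" make the list of values unique before handing it to the database interface
sort lt_keys by /bic/zcpp_no.
delete adjacent duplicates from lt_keys comparing /bic/zcpp_no.

" guard against an empty driver table - FOR ALL ENTRIES would ignore the WHERE clause
if not lt_keys is initial.
  select DOC_NUMBER DOC_CATEG REASON_REJ /BIC/ZCPP_NO GROSS_WGT
         BASE_UOM NET_VALUE ITEM_CATEG UNIT_OF_WT /BIC/ZCASE_DAT
    from /BIC/AZSDSOO0600
    into corresponding fields of table it_AZSDSOO0600
    for all entries in lt_keys
    where /BIC/ZCPP_NO = lt_keys-/bic/zcpp_no.
endif.

The check on lt_keys before the SELECT matters as well: a FOR ALL ENTRIES with an empty driver table ignores the WHERE clause and would return the whole table (see the discussion in the comments).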

As usual, once you know the dependencies, the effect is easy to understand – but at first sight I was pretty surprised myself by a SELECT statement that multiplies the result set.


53 Comments


  1. Raquel Pereira da Cunha
    Hi Lars,
    as an experienced ABAP developer – to avoid saying “old”… 🙂 – I really enjoyed the way you wrote this blog and how clearly you explained the technical effects of a “FOR ALL ENTRIES” statement. After many years I still find this kind of problem when doing support or QA at customers (in most cases the internal tables used are not even checked for being initial, which is worse).

    Nice job!

    Regards,
    Raquel

    1. Lars Breddemann Post author
      Hi Raquel,

      thanks for your nice and positive feedback.
      I agree with you. Basic topics like the correct usage of OpenSQL often appear to be an underestimated skill for ABAP developers.
      In a recent poll I read, SQL knowledge was amongst the top three wanted topics from supporters and consultants.

      I’d be glad if this blog would help to improve this a bit.

      Best regards,
      Lars

  2. Pramod Teewaree
    Put the contents of the source_package in a temp, sort by the key that is going to be used in the where clause… and delete adjacent duplicates from the temp comparing that key. 😉
  3. Julius von dem Bussche
    Hi Lars,

    The blog reads like a mystery novel or watching a game of cluedo unfold.

    So it was the ABAPer, in a customer exit, with a report source template. 😉

    Anyway, still much better and safer than how the BO- DataServices “solved” the problem. They copied function group SUTL and removed the checks… 🙁

    Cheers,
    Julius

    1. Lars Breddemann Post author
      Hi Julius,

      yes, this case actually got me puzzled the first time I looked at the “evidence” 🙂

      Happy to have you among my readers – hope to be able to entertain you (and of course all of the other nice commentators) again with my next blog posts.

      Cheers and best regards from Vienna to Switzerland,
      Lars

  4. Michelle Crapo
    It’s so true.  For all entries can be a hard coding technique to use.  It can also be something that really helps your code.  (Less memory, less time…  Etc.)  See the other comments.  Without checking for initial – oh boy – an even worse eater!

    Loved the way this was written,

    Michelle

    Another “old” ABAP programmer

    1. Lars Breddemann Post author
      Thanks to you as well Sandra.

      Personally I find it quite difficult to improve on the writing part as usually feedback focusses on the technical core points and less on the way the topic is presented.

      Support cases like this fortunately offer a way to tell a “story”, which makes it a lot easier to come up with a blog post that is nice to read.

      best regards,
      Lars

  5. Clemens Li
    This is really great – I did not expect that the SAP wisdom “For all entries does not return duplicates” is true only if the for all entries table itself does not contain duplicates.

    Well, I had a bad experience the other way round: I used FOR ALL ENTRIES and it failed on me. I selected open item amounts for customers: a customer had two open items of 100,00 each. I got 100, not 200, because duplicates were deleted.

    That time I learned that with FOR ALL ENTRIES you should always select all key fields of the table.

    Use INTO CORRESPONDING FIELDS OF and you do not even need any target fields for the keys.

    This would have helped in this case also.

    So rules for FOR ALL ENTRIES:

    1. Make sure FOR ALL ENTRIES table is not empty. If it is empty, the complete WHERE clause including all other conditions is ignored!

    2. SORT/DELETE ADJACENT DUPLICATES from FOR ALL ENTRIES table.

    3. SELECT all key fields regardless if you need them or not.

    Follow these rules and get results as expected.

    BTW: Is there any rule or block size used for the various IN clauses?

    Comments welcome!

    And thanks a lot Lars for sharing this experience, Hope this encourages others!

    Clemens

    1. Lars Breddemann Post author
      Hi Clemens,

      the “SAP Wisdom” in fact is true – but in the case I described, the database interface did not even get to the point of removing the duplicates.

      Concerning your rules –
      1) I didn’t observe this yet. I’ve to try it out but I would be surprised if this would be correct.

      2) I agree 🙂

      3) NO! Totally not. Why would you select columns you’re not going to use? That would be one of the common major mistakes in database programming.

      I’d say: learn how the tools really work and adapt your expectations 😉

      If you’re referring to the blocking factors the database interface applies, then you most often will be OK with the defaults (5 values per IN LIST).
      Everything else would require testing and benchmarking – but could be worth the effort.

      regards,
      Lars

      1. Clemens Li
        Hi Lars,

        1) just try – or believe me
        3) by selecting all key fields, also without target fields, you will get all records – see my example where I did not get two items with 100,- but only one, because they were aggregated as duplicates. This does not happen if you select all key fields of the DB table. Obviously the aggregation takes place before the data are transferred to the target area. It applies only to FOR ALL ENTRIES!!! Usually you are right – here I may be right. Try!
        Clemens

        1. Lars Breddemann Post author
          Hello Clemens,

          1) I tried it out and you’re right – in case the internal table is empty, then the other WHERE conditions aren’t put into the SQL. Puh – I’ve to think about that and check whether this is designed behavior or a bug.

          3) Well, this is definitively no bug, but just a problem of how the word “DISTINCT” is understood 🙂
          Seriously, when you’re about to do a summation – why don’t you just do it via SUM( ) and GROUP BY? This would not only make the overall processing faster but also wouldn’t lead to unexpected results.
          As a rule of thumb: when data processing on DB level is possible, have it done there. Don’t send data back and forth for simple operations like SUM, AVG etc.
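
          A minimal sketch of such a database-side aggregation in classic Open SQL – the SFLIGHT demo table is used here purely for illustration:

            types: begin of lty_total,
                     carrid type sflight-carrid,
                     price  type sflight-price,
                   end of lty_total.
            data: lt_totals type standard table of lty_total.

            " let the database compute the sums instead of fetching every row into ABAP
            select carrid sum( price )
              from sflight
              into table lt_totals
              group by carrid.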

          As I wrote: the de-duplication of course happens before the user code gets back the result table.

          best regards,
          Lars

          1. Michelle Crapo
            I would use this where I already have a master data table in my programs. Limit the data in the master data table by some logic – something where you didn’t want to do it in the select. Then I would use FOR ALL ENTRIES to only pull the limited amount of records.

            “Distinct” – would still return – or even sum up the wrong group of records.  (If I only needed some of them summed and not all.)

            Does that make sense?

            Michelle

            1. Lars Breddemann Post author
              Hi Michelle,

              hmm… sure you _can_ do such stuff and I can imagine requirements that would request a kind of conditional aggregation.

              However, I wouldn’t consider these as ‘usual’ and also not trivial. In such cases, one has to be very careful to fully understand the desired logic of the program and the functions/methods used upon her data structures.

              Of course, all this is just my personal opinion, but from what I’ve seen yet, most often the problematic part of such reports was that the desired application logic wasn’t quite clear.
              And such a situation invariably will always lead to bugs and mistakes.

              best regards,
              Lars

          2. Clemens Li
            as far as I know SUM, AVG etc. do not work with for all entries, wrong?

            Apart from that, SUM, AVG etc. are always slower when done by the database; ABAP is faster.

            My Point was to select all key fields to avoid unwanted deletion of duplicates, and – not to have target fields for them – if you do not need them, just as you recommend.

            Regards Clemens

            1. Lars Breddemann Post author
              Aggregate functions (except COUNT(*)) don’t work with FAE – true.

              But your claim that these db-functions are usually slower than doing the work manually in ABAP definitively needs to be proven.
              Why should moving the records from DB to AS, looping over them and performing the aggregation to finally come up with the result be faster than asking the database to do that?

              Have you tried this?
              Why do you think, we use db-aggregation whenever possible in BW? To sell more hardware?

              regards,
              Lars

              1. Carsten Kasper
                > Why should moving the records from DB to AS, looping over them and performing the aggregation to finally come up with the result be faster than asking the database to do that?

                Just as a small note: We got the ABAP buffer. A lot of database entries are buffered in it. Access to this buffer is a lot faster (if I remember correctly about 10 times) than to the database. Aggregate functions do not work on this buffer.

                As for these cases when the data is there in the ABAP buffer, selecting the entries and doing the calculations in ABAP is faster.

                Reference: “ABAP performance tuning” – Hermann Gahm

                Really enjoy the discussion following this post! There is good learning in here 🙂

                1. Lars Breddemann Post author
                  Of course, when the data is in fact already buffered in the application server memory, then you’re better off.
                  But usually aggregates are taken from lots of data – mass data which you’d rather not keep in the application server memory all the time.

                  So, there is of course the option to be quicker than the database call.

                  As usual, one has to know the data and how it is used to come up with the optimal solution.

                  regards,
                  Lars

      2. Suhas Saha
        Hello Lars,

        A very interesting read 🙂

        Regarding Clemens’ points 1 & 3 they are mentioned in the SAP documentation for FAE.

        Actually point 2 was part of the QA checklist in my previous project. For the very specific reason of selecting the data from the table for the same fields recursively! Never thought that it might lead to memory issues.

        Thanks & Regards,
        Suhas

        1. Lars Breddemann Post author
          Hi Suhas,

          glad you liked the read.
          Hmm… would you mind posting the link to the documentation you’re referring to?
          I did not find any documentation that would cover point 1) or would support point 3).

          thanks and regards,
          Lars

            1. Lars Breddemann Post author
              Hi Suhas,

              thanks a lot! Looks like I’ve been checking a version of the documentation that was too old, or simply overlooked the explanation of point 1.

              Concerning point 3: yes, the documentation mentions the de-duplication, just as I explained in the blog.
              But Clemens’ point was to recommend fetching columns that aren’t required for the processing.
              That is something I disagree with.

              Again, thanks a lot for digging up the docu.

              Cheers,
              Lars

              1. Suhas Saha
                Hello Lars,

                But it seems you have overlooked the deletion of duplicate rows from the “result set” i.e., the resulting internal table.

                There is also a mention of the same in the 3rd bulleted point of the Notes section.

                In fact point#2 is not explicitly mentioned in the documentation, but is rather implied!

                Excuse me if i sound a lil’ blunt!

                Cheers,
                Suhas

                1. Lars Breddemann Post author
                  🙂

                  I don’t think so.
                  Read again: “To be able to de-duplicate the result set before returning it to the ABAP program, the database interface first has to temporarily store all returned rows. Since there wasn’t enough memory to do this, the de-duplication wasn’t done yet, when the short dump occurred”

                  The de-duplication feature – or rather, the de-duplication that had not yet happened – is the very core point of the whole post…

                  BTW: I really like how many people discuss this topic so closely and focussed on all the little details.

                  Cheers,
                  Lars

                  1. Suhas Saha
                    Hey Lars,
                    I get your point & it is in sync with what is mentioned in the SAP documentation!

                    But as Clemens has mentioned we should select “key” fields of the DB table as a precaution to prevent the duplication of records.

                    BR,
                    Suhas

                    1. Lars Breddemann Post author
                      And this is precisely the point I don’t agree upon.
                      Don’t just select fields out of ‘precaution’.
                      Understand the requirement and understand how the commands actually work and then select only the data you need.

                      There are TONs of cases where SELECT * FROM was used just to ‘be on the safe side’.
                      But such coding disables database optimisation, increases network time, requires more memory and CPU on the application server…

                      As I wrote also to Michelle: sure there can be requirements of conditional/non-linear aggregation. But they are rather seldom and need to be handled with much care anyhow.
                      In the vast amount of all cases, a correctly designed SQL statement will deliver the aggregates just fine.

                      cheers,
                      Lars

                      1. Matthew Billingham
                        Such coding also is NOT SAFE! A principle of safe programming is to specify exactly what you want. Let’s say there are two fields in a database table, one named hfwzt, the other hfzwt. You’re only interested in hfwzt. You’ve used SELECT *, and you’ve got some data in a work area. You do some calculations –

                        val1 = wa-hfwzt / e.
                        val2 = wa-hfwzt + e ^ ( pi * i * wa-hfzwt ).

                        oops. You’ve used the wrong field. You won’t notice it until you run the program and get the wrong results. Trying to find a typo consisting of two characters is not going to be easy.

                        If you’d defined your work area to have exactly the fields you want, and selected exactly those fields, then you wouldn’t have had the problem. This may seem extreme, and perhaps it is, but the principle remains, and strange short field names are all over SAP.

                        matt

  6. Stefan Koehler
    Hey Lars,
    in my opinion and experience, ABAP programmers seem to have less SQL / database knowledge from year to year. That’s pretty scary – it’s the same as a PL/SQL programmer who has no idea about SQL, but for an ABAP programmer it seems to be the “standard” right now.

    In my experience the FOR ALL ENTRIES clause works pretty well in general – the only thing that could be optimized beyond the default is the max number of IN-LIST parameters (rsdb/max_blocking_factor). The SAP default value (5, as far as I can remember) is too low in my experience, but that can differ from environment to environment of course.

    Best Regards
    Stefan

    1. Lars Breddemann Post author
      Hi Stefan,

      as you already know me – I’d answer this with yes/no 🙂

      I agree with you on the impression that fundamental technology and techniques related to it seem to get pushed into the background, while other aspects like new fancy tools’n’languages gain a lot of attention.
      Maybe it’s because this ‘old’ stuff usually isn’t very sexy or sometimes even plain boring.
      On the other hand being able to sell yourself as an ABAP developer usually makes up a nice pay check.
      So, yes, there are a lot of people building code with tools they don’t understand properly.

      But then again, you also find well-designed coding correctly implemented, and support requests that are impressively well pre-clarified.
      There definitively IS a market for those ‘deep-dive’ people. And luckily many of them seem to be active here on SDN (just e.g. like you, Michelle, Julius, Jim Spath and the others I always forget to mention).

      About the blocking factor optimization – yep, can be worth it. But to me, this really is one of the knobs I would think of at a rather late point of performance analysis.
      It’s a system wide setting and I really don’t like to fiddle with those as their total effect is most often not easy to grasp.

      Happy to keep you amongst my regular readers, although my posting rate is rather irregular…

      Cheers,
      Lars

  7. Kumud Singh
    Hi,
    This situation might not be met by all of us in common circumstances, at least not until similar data in the database is encountered. Proactively, we can now prevent this.

    Certainly this is a good learning for ABAP developers.

    Thanks,
    Kumud

    1. Lars Breddemann Post author
      Kumud – you hit a very important nail right on the head.

      Database programming always depends on the data you are working on. Very often we have assumptions about the data in mind, maybe not even explicitly, but they are there nevertheless.
      And based on these assumptions, programs are written and application logic codified.

      In fact, my first posts (Starting off with a classic…) and (A classic in disguise…) many years ago were just about such assumptions.

      The important bit here is to make sure that your application code will run correctly regardless of the input data.

      best regards,
      Lars

  8. Suhas Saha
    Hello Lars,

    As I understand we all love this beautiful language ABAP, I thought the best way to convince you on the debatable point#3 is by citing an example 🙂

    DATA: gt_airlines    TYPE SORTED TABLE OF scarr WITH UNIQUE KEY carrid,
          gs_airline     TYPE scarr,
          gt_all_key     TYPE SORTED TABLE OF sflight WITH NON-UNIQUE KEY carrid,
          gt_not_all_key TYPE SORTED TABLE OF sflight WITH NON-UNIQUE KEY carrid.

    gs_airline-carrid = 'AA'. INSERT gs_airline INTO TABLE gt_airlines.

    * Case#1: Select all the key fields
    SELECT carrid connid fldate price currency "Key fields-CARRID,CONNID,FLDATE
    FROM sflight
    INTO CORRESPONDING FIELDS OF TABLE gt_all_key
    FOR ALL ENTRIES IN gt_airlines
    WHERE carrid = gt_airlines-carrid.
    IF sy-subrc = 0.
      WRITE: / 'Case#1:', 20 sy-dbcnt.
    ENDIF.

    * Case#2: Select only one key field
    SELECT carrid price currency
    FROM sflight
    INTO CORRESPONDING FIELDS OF TABLE gt_not_all_key
    FOR ALL ENTRIES IN gt_airlines
    WHERE carrid = gt_airlines-carrid.
    IF sy-subrc = 0.
      WRITE: / 'Case#2:', 20 sy-dbcnt.
    ENDIF.

    CLEAR gt_not_all_key.

    * Case#3: Select only two key fields
    SELECT carrid connid price currency
    FROM sflight
    INTO CORRESPONDING FIELDS OF TABLE gt_not_all_key
    FOR ALL ENTRIES IN gt_airlines
    WHERE carrid = gt_airlines-carrid.
    IF sy-subrc = 0.
      WRITE: / 'Case#3:', 20 sy-dbcnt.
    ENDIF.

    TOP-OF-PAGE.
      WRITE: 'Database Entries Selected'.

    You can clearly see the inconsistency if you decide not to choose the key fields. The amount of data returned is different in all the 3 cases. This is why Clemens & I have been suggesting to select all “key” fields when using FAE!

    Cheers,
    Suhas

    1. Lars Breddemann Post author
      Hi Suhas,

      thanks for the example. Though I did of course understand what you and Clemens meant before.

      But my point still is a different one.
      What is the actual use case of just selecting let’s say the price/currency information and getting duplicates for that?
      See, Clemens’ point was to select the KEY FIELDS although you don’t want to do anything with them.
      Like you would, if you would do a home-grown aggregation.

      I completely agree with you: if you need to have the fields in there (because you couldn’t make any sense of the price information without the corresponding key fields), then you must select them.
      If you just want to know, say, the average price,
      then you would do SELECT avg(price) instead 🙂

      Remember: SQL is about manipulating SETs of data – not LISTs!
      And in proper SETs there are no duplicates – so FAE works totally correct 😉

      1. Suhas Saha
        As a matter of fact I get your point now! Darn!!

        Didn’t I mention we all “love” ABAP? See, a small code snippet always helps.

        You just made my day, Lars!

        Cheers,
        Suhas

        1. Lars Breddemann Post author
          Hey Suhas,

          I knew we would come to common ground on this 🙂

          Glad that this worked out so well.

          Appreciate your comments and the discussion.
          Cheers,
          Lars

    2. Kumud Singh
      Hey Suhas,
      I now realize that many of us have been using this FAE statement with so much pleasure and ease without knowing the points highlighted in the SAP doc., except point 1 (Clemens Li).

      AWESOME LEARNING…thanks for the code snippet.
      One question: in a general SELECT statement as well, is it better to select all primary key fields?

      Regards,
      Kumud

  9. Matthew Billingham
    In general, FAE isn’t a good statement. I’ve often seen a “select into a table”, then a “select FAE” in that table. An inner join is a better solution. FAE of course was around for a long time before INNER JOIN was available, so it’s still quite prevelant.

    For BW, (where I spend most of my programming time), you’re supplied with the Package data, so you can’t use an INNER JOIN. Even so, I very rarely use it – because it is so easy to use it incorrectly and end up with an initially baffling problem.

    A couple of off topic points:

    I smile that the DSO starts with a Z. In BW, the names of customer objects are, in their technical implementation, prefixed with /BIC/.

    Small amounts of code in an expert/start/end routine are fine. However, for anything remotely complex, I encapsulate the coding in an external class – the routine just calls the appropriate method. In this way, I get the benefits of version management, and I can change the ABAP without having to regenerate the transformation.

    matt

    1. Lars Breddemann Post author
      Hi Matt,

      thanks for sharing your experiences.
      You’re right with the remark that an INNER JOIN can be used in many cases where traditionally a FAE had been employed. In fact, this is something we do ourself in BW queries when materializing SIDs into temporary tables (you know, those /BI0/02… tables).

      However, I wouldn’t go as far to say it isn’t a ‘good’ statement.  It’s neither good or bad – it’s just one of many ABAP command features that need to be understood to be used correctly.

      And yes, the naming of the DSO was like that in the customer system 🙂

      About the performance problems – there’s a difference between running into problems and not providing optimal performance.
      Very often, this difference is not much for one single statement, but can add up to much for a whole system.
      Just selecting all columns of a table instead of just the required ones is a nice example for that.
      Usually you wouldn’t ‘feel’ a big difference during development of code. And using the * is so much more comfortable than writing down all the columns manually.
      But given productive workload this can lead to way less performance than possible due to the overhead produced on db, network and application server resource side.

      When it comes to performance it is always about measuring the actual effects. Without that, all we can do is to guess.

      best regards,
      Lars

  10. Siegfried Boes
    Hallo Lars,

    interesting discussion.

    Unfortunately I find the blog not 100% clear, and I think there is a lot of misunderstanding in
    the discussion.

    Lets start with your numbers (51.189.146 / 190 => 269.416):
    + How many records of the db table /BIC/AZSDSOO0600 have the initial value (' ') for /BIC/ZCPP_NO?
    + How many records are in the driver table SOURCE_PACKAGE?
    + How many of these are initial values for /BIC/ZCPP_NO?

    Actually 51.189.146 / 191 is exactly 268.006!
    So every block, even the incomplete one, returns the whole DB table!

    >For the vast majority of rows, the field was just the initial value (' ').
    I do not really understand where the other values are; your calculation implies that there are only initial ones.

    Why is the initial value at all included in the selection, if it is fulfilled by nearly all records?
    This is very special here and, from a logical point of view, a questionable condition.

    kind regards     Siegfried

    1. Lars Breddemann Post author
      Hello Siegfried,

      the numbers should be mentioned in the blog where necessary… let’s see:
      – total number of rows in the table: 268.006 rows
      -> how many of them are initial? I haven’t noted that down, but _nearly_ all of the rows were initial, as I mentioned.

      – how many records in the driver table?
      -> it’s there: Table IT_1638[954×1260]

      – how many rows of that table INITAL?
      -> again, I don’t have the exact number, but again, the majority of the rows where initial.

      And yes, you got it right: most of the rows in fact were initial for the respective column.

      Concerning the flaw in logic: I really cannot judge the business logic of the customer here.
      It might well be perfectly OK to have the column initial for most cases.
      The point of my blog however was to show, that relying on the deduplication of the FAE statement can come with high costs.

      best regards,
      Lars

      1. Siegfried Boes
        Still not clear:
        Actually 51.189.146 / 191 is exactly 268.006!

        So if these numbers are correct and not deduced afterwards, then all records on the DB are initial and all in the selection are identical without a single exception!

        >It might well be perfectly OK to have the column initial for most cases.
        Of course this is o.k. on the db-table, but the selection is nonsense!

        1. Lars Breddemann Post author
          OK, I see what you’re aiming at.
          The point here is: the short dump occurred BEFORE all data was read, because no more memory was available.
          So for the rows that had been fetched up to that point, only the INITIAL values had been read.
          Also, the 268.006 is the value from the CBO statistics – which in this case wasn’t the CURRENT value.

          As for the selection criteria: yes, from a common-sense perspective it’s nonsense to put in the same selection criterion thousands of times.
          But from a formal point of view this is just as correct as using the criterion only once, since we have an OR concatenation of the selection criteria here.

  11. Siegfried Boes

    The FAE would have no problem if you change the code into:

      if not SOURCE_PACKAGE is initial.

        loop at SOURCE_PACKAGE ASSIGNING <fs>.
          if ( <fs>-/BIC/ZCPP_NO = ' ' ).
            case = 1.
          else.
            collect <fs>-/BIC/ZCPP_NO into SOURCE_TEMP.
          endif.
        endloop.

        if not SOURCE_TEMP is initial.
          select DOC_NUMBER DOC_CATEG REASON_REJ /BIC/ZCPP_NO GROSS_WGT
                 BASE_UOM NET_VALUE ITEM_CATEG UNIT_OF_WT /BIC/ZCASE_DAT
            from /BIC/AZSDSOO0600
            into corresponding fields of table it_AZSDSOO0600
            for all entries in SOURCE_TEMP
            where /BIC/ZCPP_NO = SOURCE_TEMP-/BIC/ZCPP_NO.
        endif.

        if ( case = 1 ).
          select DOC_NUMBER DOC_CATEG REASON_REJ /BIC/ZCPP_NO GROSS_WGT
                 BASE_UOM NET_VALUE ITEM_CATEG UNIT_OF_WT /BIC/ZCASE_DAT
            from /BIC/AZSDSOO0600
            into corresponding fields of table it_AZSDSOO0600_temp
            where /BIC/ZCPP_NO = ' '.
          append lines of it_AZSDSOO0600_temp to it_AZSDSOO0600.
        endif.
      endif.

    As said, I have doubts that the case=1 part is actually necessary. Maybe these rows are not needed at all; then a WHERE clause ( <> ' ' ) could be added to the loop.

    The COLLECT into a hashed table with the FAE variables as unique key is even faster than a SORT and DELETE ADJACENT DUPLICATES.

    Siegfried

    1. Lars Breddemann Post author
      Hi again Siegfried,
      thanks for the coding snippet.

      This is surely a good example for the customer who wrote the coding discussed in my blog of how the FAE can be used in a safer and more performant way.

      Just a side-comment on your solution:
      By doubling the code you increased the amount of necessary maintenance if the customer decides to change the underlying DSO.

      Also, in case that the initial value is actually required, two SQL statements will run slower in total than just one SQL statement. In the worst case, you end up with two full table scans…

      So, maybe it would still be more efficient to sort and remove duplicates on the rather small SOURCE_PACKAGE internal table than to access the large DSO database table twice.

      regards,
      Lars

      1. Siegfried Boes
        the FAE accesses the db several times anyway, so my split is no actual overhead.

        And it is actually only meant for discussion – whether the lower part is necessary or not. If yes, then you can add it to the other branch.

        However, you should be aware that your reported problems come from this case=1 branch. And a Selection with that logic is very very rare. I have seen such a problem … and I have seen a lot of performance problems.

        Siegfried

  12. HS Kok

    Hi Lars,

    What you wrote in this blog was an interesting read, though it was not at all surprising to me. I have seen many instances of BW developers writing this sort of code in their routines and lookups, with severe impact on the performance of data loading/extraction.

    What I usually do is:
    – Declare a temporary internal table and assign the source package to it.
    – Sort, and remove duplicates (according to the field that I need to lookup with) from the temp. internal table, and use that in the FAE.

    I don’t operate directly on the source_package itself, as changing the order of records returned in the source package can sometimes have dire consequences… 🙂

    Immediately after the select statement is complete, if the unique list of field values is no longer required, FREE the internal table. Otherwise, I will keep it for the next part of my routine/lookup.

    1. Lars Breddemann Post author

      Hi HS Kok,

      Well, it was surprising to me as well to see this issue and its effect – that’s why I wrote the blog post 😉.

      regards,

      Lars

