Why “INTO CORRESPONDING” is much better than its reputation
Purpose
In the SCN ABAP discussions I often read that using the “INTO CORRESPONDING” construct in ABAP SQL statements would degrade performance. With the example described in this document I would like to prove this “urban legend” wrong.
Please note that screenshots and results are based on an Oracle DB and might look different for other databases.
Example case
In this example, I want to read three columns from table DD03L for a specific TABNAME = “BUS_DI” (I am choosing these objects because they should be available in any ABAP stack environment, so you can hopefully run the attached little program).
EDIT: following subsequent comments by proven experts I would like to stress that reading DD03L in actual requirements is not recommended, and that the messages here apply to non-buffered tables.
The proper way is to declare a target structure with just the required columns.
TYPES: BEGIN OF ty_dd03l_required_columns,
tabname LIKE dd03l–tabname,
fieldname LIKE dd03l–fieldname,
reftable LIKE dd03l–reftable,
END OF ty_dd03l_required_columns.
DATA: t_dd03l_req TYPE STANDARD TABLE OF ty_dd03l_required_columns.
Often times though, the target structure is simply defined based on the table definition.
DATA: t_dd03l_full TYPE STANDARD TABLE OF dd03l.
CONSTANTS: c_tabname LIKE dd03l–tabname VALUE ‘BUS_DI’.
Now let us compare six different ways to code the ABAP SQL statement for the selection.
(1) The standard and of course correct way is to list the required columns.
SELECT tabname
fieldname
reftable
INTO TABLE t_dd03l_req
FROM dd03l
WHERE tabname = c_tabname.
(2) Another way to do it is using “*” in conjunction with INTO CORRESPONDING. This is where the “urban legend” has it that this would affect performance.
SELECT * INTO CORRESPONDING FIELDS OF TABLE t_dd03l_req
FROM dd03l
WHERE tabname = c_tabname.
(3) Just for comparison, let us also include this combination of field list and INTO CORRESPONDING.
SELECT tabname
fieldname
reftable
INTO CORRESPONDING FIELDS OF TABLE t_dd03l_req
FROM dd03l
WHERE tabname = c_tabname.
(4) Now switching to the full target structure, a combination of field list and INTO CORRESPONDING will likely not affect the run time, however more memory will be occupied this way.
SELECT tabname
fieldname
reftable
INTO CORRESPONDING FIELDS OF TABLE t_dd03l_full
FROM dd03l
WHERE tabname = c_tabname.
(5) The combination of “*”, INTO CORRESPONDING and the full target structure must be avoided. This looks very similar to number 2 and probably helped leading to the bad image of this construct.
SELECT * INTO CORRESPONDING FIELDS OF TABLE t_dd03l_full
FROM dd03l
WHERE tabname = c_tabname.
(6) Of course there might be cases where you really need (almost) all of the columns of a table.
SELECT * INTO TABLE t_dd03l_full
FROM dd03l
WHERE tabname = c_tabname.
The attached test program will run all these statements three times, activating the SQL Trace only for the last loop pass (in order to avoid buffering effects on the measurement) and calling the SQL Trace results for immediate investigation. You should create it in your sandbox as temporary object with name Z_CORRESPONDING (or any name, for that matter).
After running the program, you will arrive at the SQL Trace “Detailed Trace List”. This is also what you see when you start transaction ST05 manually. From the detailed list, please go straight to the “Combined Table Accesses” overview.
Let us focus on column „Access Time“.
You can see that our first four SQL statements took roundabout the same time to complete, whereas the last two took more than three times as much time. You might be surprised that statements 1 and 2 show no significant difference.
In order to understand why, please return from the summary to the detailed list. Now put the cursor on the row with the second REOPEN operation and click the “Explain” button.
This will show the SQL statement as it arrives on the database.
As you can see, despite the „*“ in our ABAP SQL statement there is now a list of the columns as they were declared for the target structure. Obviously the interface between ABAP and the underlying database (a.k.a. “DBI”) is smart enough to pass only the required columns, so that the statements 1 and 2 are identical.
Conclusion
The standard way to implement database selections from ABAP is to provide the field list together with “INTO TABLE”. However, do not hesitate to also use “* INTO CORRESPONDING FIELDS OF TABLE” when the context justifies it, just because it allegedly affects performance. You have seen that it doesn’t.
Quite to the contrary, INTO CORRESPONDING allows for some dynamic programming techniques. For example, think of an ALV list using CL_SALV_TABLE with the field catalogue defined as a DDIC structure. If you later need to add fields, in most cases it is sufficient to just add them to the DDIC structure, leaving the actual program code unchanged.
You can even use it with JOINs, however then you must be aware of columns with identical names but different values (e.g. EKKO-AEDAT and EKPO-AEDAT), possibly leading to unwanted results. As per my observation, the values from the table appearing later in the JOIN statement will “win” in this case.
In any case, I hope I have convinced you that INTO CORRESPONDING in select statements is not such a bad thing at all. Just make sure that your target structure contains the required columns only.
Please let me know in case you arrive at different results when you run the program in your sandbox.
Hopefully this will finally clear this everlasting myth 😎
Good job Thomas!
Good document Thomas Zloch . Finally this will put some sense into the ignorant QA reviewers 🙂
Cheers,
Suhas
Good job Thomas, the same is for move corresponding it is also thought of as a performance reducing statement but i checked its runtime and it was faster then moving single items to workarea.....
Excellent document Thomas Zloch. A recommended one for ABAP Development "Featured Content".
Regards, Vinod
Thanks guys, you are too kind. All of you know the real deal anyway, let's hope that we can convert a few non-believers. Would also like to hear in case there is different results on other databases.
Cheers
Thomas
thanks Thomas... your document will finally help remove some of the bottlenecks from our ABAP Development Guidelines because its a pain writing all the field names exclusively when sap can do it for us.
Great document Thomas, I really enjoy topics like this that show what's happening under the hood. I wonder if maybe it was true at some point in the past (before release 4 for example) and the rule-of-thumb simply persisted without ever being reexamined.
Cheers.
Thanks Amy, yes is this quite possible. Somebody here with access to an R/3 3.1 for double checking? 😉
Thomas
I also think so (without being able to check it), because I just noticed that in the SAP Press Book on Performance Tuning by Herrmann Gahm (German edition, 2009, page 201), it was also stated that the MOVE-CORRESPONDING should be avoided because it means effort for the application server.
Well done Thomas. I especially liked the part where you showed that the database interface is smart enough to 'switch' a "SELECT *" into a "SELECT fieldlist" if appropriate.
I have one comment and one side remark:
Your result is valid for tables that are not buffered. For accesses to buffered tables the overhead of the INTO CORRESPONDING FIELDS is not longer neglectable and the recommendation would rather be to omit this option at least in performance-critical code.
(remark: you chose table DD03L for demo reasons. In production code DD* tables should never be accessed directly; instead the nametab function should be used, for example DDIF_NAMETAB_GET).
Best Regards,
Randolf
Thanks for your additional input, I will do some tests with buffered tables as well. Yes, DD03L was chosen because it is available in SAP_BASIS, so all ABAP systems, and also has a lot of entries. I actually "borrowed" this idea from Siegfried Boes book, if I remember correctly.
Thomas
Hallo Thomas,
thanks for taking care of this issue.
There was a similar testcase provided with my book, which can be downloaded from the the publisher:
http://www.dpunkt.de/buecher/3096/performance-optimierung-von-abap%26reg%3B-programmen.html
See Examples
--- all in German only due to lack of English demand! ---
Buffered tables: Use SELECT * always! There is no need for fieldslists, all fields are anyway in the buffer, i.e. on the application server.
Nametab buffer: The DD03L is in the nametab buffer, i.e. never write SELECTs to DD03L in your program, use the interface DDIF_NAMETAB_GET. However, the DD03L is perfect for tests as it is available in all systems with lots of data. (But for Hana it should not be used, because the DD03L is in the rowstore.)
regards Siegfried
Thomas, i learnt some thing new today, appreciate your work. its excellent work.
Thank you for putting this document.
thanks & regards
satish chandra
Hi Thomas, very good job!
I agree with Randolf about the part of the database interface, I also liked it. But I would like to ask you a question. Some people say that the field list in a SELECT statement must match the order of the fields in the table, if not it will degrade perfomance. But I remember that I read once (but I don't remember exactly where, I guess it was about table conversions) that there is no guarantee that the order of the fields in the database is the same we see in the abap dictionary. Does it make any sense for you?
Regards,
Christian
Hi,
field order in DDIC may indeed not match the one on the DB. This is because an ADD-FIELD structure change does no longer result in a real conversion, but is done via ALTER TABLE ADD COLUMN on the DB layer. Now, esp. tables with append structures may look different on the DDIC. Think of a development sequence like this:
SAP delivers table as (K1,F1,F2)
Customer creates Append with two fields (K1,F1,F2,C1,C2)
SAP delivers additional field (K1,F1,F2,C1,C2,F3)
Customer changes Append to include third field (K1,F1,F2,C1,C2,F3,C3)
If you do a transport of this table to another system, that does not has this version history, the table will get created as (K1,F1,F2,F3,C1,C2,C3)
I think for this reason the address of a field is always calculated as a memory structure offset, I think you can see that, when you check the runtime object of a table in SE14.
Volker
Christian, to my knowledge this is another "urban legend" that can be disproven with a similar test program and SQL trace. Differences of a few percent or less mostly do not matter in practice.
Same thing for the sequence of fields in WHERE-clause and in the primary or secondary index, it does not need to be the same, as long as the required fields for efficient access are present.
Thomas
I also think so. It is clear that an intelligent compiler (and I am sure the ABAP compiler is, after many years of widespread usage and optimization) can evaluate the source and target fields during compile time, and create a mapping table from that. During runtime, moving the data around using that mapping table should take only very little time (except maybe for Large Object fields like strings - but that is a different topic, independent from sort order etc.)
Good one.
Thanks Thomas,
It is an excellant document.
Regards,
Debopriya Ghosh
Nice one Thomas. It's good to see this sort of stuff being posted.
The sad thing is, so many people will still persist the urban legend and even worse still, promote it and teach it to others.
I'd love to know where these myths originate from.
Gareth.
Thanks Gareth.
I assume that some of these things might have been true (to some extent at least) a decade or more ago when technology (cost based optimizers, interface between ABAP and DB, etc.) wasn't as advanced. Sometimes wrong test setups might also lead people in the wrong direction (e.g. when ignoring buffering effects, making subsequent selects so much faster than the first one).
However I will also never understand the motivation of copy/pasting such information without any critical examination. Luckily we have Matthew Billingham giving them a hard time lately, and I'll help when I can 😉
Thomas
Nice article Thomas, Thank u for sharing this.
I was actually one of them who deeply believed this myth to be true 😕 After I ran the performance analysis, I was surprised to see the result prove that wrong.
Thank u for clearing my doubts 😎
Its good document for us.
What if we check not only access time, but total performance?
What do you mean, the runtime by comparing timestamps on ABAP level?
You are welcome to enhance my code snippet accordingly and publish the results.
Thomas
Yes, I suppose that the difference in performance comes when abap finds corresponding fields during table filling.
I just quickly tried it and did not find a significant difference. To be honest, I would be quite surprised if this field mapping would take more than a few microseconds and if it wasn't almost independent of the actual number of returned rows (pointers, references, memory mapping, I don't know what...).
Thomas
I mean that the difference in performance would be like between move and move-corresponding.
Servus from Austria!
The most important point regarding runtime is that 1-4 request only three columns from the DBMS and 5-6 pull the complete record.
This difference in the SQL request can lead to a completely different access plan.
For example: If an index covering the selected columns exists, the DBMS does not need to read the "record" to provide the result. This can drastically reduce the LIOs needed.
And if you consider that a LIO costs about 50 to 100 microseconds whenf the database page is already in the cache, this is a non neglectable amount of time to save.
When it comes to PIOs which are wasted for reading the data that are not needed, the performance penalty is counted in MILLIseconds ...
Kind regards, Rudi
very helpful..cleared my doubts...thanks
useful information!
Thanks a lot 🙂
My results:
SELECT 1.849 4.600
SELECT 1.849 4.661
SELECT 1.849 4.494
SELECT 1.849 4.448
SELECT 1.849 21.744
SELECT 1.849 22.260
Useful Information!
Thanks a lot 🙂
Dear Thomas,
Its a Nice document.
I have Tested your program in our Sandbox but Switched the first two Select statements, ie., I placed first "into corresponding" statement and then "into table" statement and I found the results different. That is "into table" is performance optimized.
Attached is the code and Test results. Correct me if I am wrong.
Regards
Sajid
Sajid,
thanks for participating.
Please switch to "combined table accesses" view for easier evaluation, as described in the document.
In your result, you can see the runtimes of single chunks of 189 records that are returned. If you add them up, you get 3361 microseconds for the first and 3271 microseconds for the second statement. This is a variance in the range of 100 microseconds (millionth of a second!) that can occur anytime due to changing network load etc.
If you run the test multiple times, you will see similar variance, sometimes the first and sometimes the second statement might be slightly faster.
The conclusion for me has not changed, there is no significant difference between the two statements.
Thomas
Hallo,
As written above there is a framework program which you can use for runtime tests. Traces have an overhead and not ideal for runtime measurements.
Get the program here:
dpunkt.verlag :: Performance-Optimierung von ABAP®-Programmen - Wie Sie Ihre SAP®-Anwendung schneller machen
Everything is German, but usage is simple just numbers.
Start the report, it asks for a number: Use 511 is for INTO CORRESPONDING. run it
The results pdf shows all test which are predefined. There are lots.
The report contains also the source, always in testxxx, where xxx is visible in the Results pdf. The framework takes care for the averaging, parameters are also set.
Unfortunately I will have the time to invest more in that topic. English translation was planned, but never done.
best regards Siegfried
*-----------------------------------------------------------------------
* MI : SELECT range INTO CORRESPONDING FIELDS (244)
** MI1: ... alphabetic
** MI2: ... random order
** MI3: ... INTO TABLE (MA3)
** MI4: ... more fields, random
*-----------------------------------------------------------------------
FORM testmi1.
DATA:
lt_dd03l TYPE tab_244_dd03l.
xx = 'N'.
CLEAR tt.
*-------------------------------------
GET RUN TIME FIELD start.
SELECT adminfield as4local as4vers checktable conrout
datatype fieldname intlen inttype keyflag
mandatory notnull position precfield reffield
reftable rollname tabname
* LENG DECIMALS
* DOMNAME SHLPORIGIN TABLETYPE DEPTH COMPTYPE
* REFTYPE LANGUFLAG
INTO CORRESPONDING FIELDS OF TABLE lt_dd03l
FROM dd03l
UP TO n ROWS
WHERE tabname LIKE 'DD0%'
AND as4local = 'A'
AND as4vers = ' '.
GET RUN TIME FIELD stop.
*-------------------------------------
t = stop - start.
IF ( sy-subrc EQ 0 ).
DESCRIBE TABLE lt_dd03l LINES info1.
xx = 'Y'.
tt = t.
ENDIF.
ENDFORM. "testmi1
...
Hi Siegfried,
how are you doing?
I ran the report for 511, but I am confused about the red line, it does not seem to show average runtimes, but rather the fastest measured time:
Thomas
Hallo Thomas,
you are right is probably the minimum, I use very often the minimum to get with less statistics good results. Especially in the case of internal tables (not published report)
you need the minimum to get the logarithm of the binary search or the constant curve of the hash read.
if you really prefer the average then change in the FORM define_testcase the value of
min_avg = 'min'. to avg
Either globally or in the test case itself.
best regards Siegfried
Thanks for "punch" that "Urban Legend" Thomas 😉
From now on, lets do not hesitate to use:
Curios with SQL statement number six:
Then I looked into SQL statement as it arrives on the database, tadaaa:
the star (*) become all the fields on the table:
Unfortunately, this urban legend is based on what SAP says in help (the following text comes from 7.02) for SELECT ... INTO [CORRESPONDING FIELDS OF] wa:
If the CORRESPONDING FIELDS addition is specified, only those contents of columns for which there are identically-named components in wa are assigned to it.
I can hear '... while the others are not' at the end of that sentence, don't you? That would actually mean that all the requested columns of the result structure are populated with data and only those having the counterparts in wa are transferred there while the others are not (and this implicitly indicates the performance leak as there is no reason to populate these non-transferred fields at all then). Even though I am quite sure that the database interface was not that smart in older releases as it's today and therefore there was a reason to recommend using MOVE rather then MOVE-CORRESPONDING, this blog shows that this SAP help text is clearly obsolete and misleading.
Actually, there is one thing more for me (as it is probably for Sergey S as well) to consider when deciding which form to use (MOVE / MOVE-CORRESPONDING): the small overhead caused by column names comparison. Yes, we are talking the microseconds here but sometimes (e.g. in BI) the mass data transfer could take place for some reason (I would like not to get into discussion about field symbols vs. data references vs. copying between structures) or in another scenario there could be some nested loops etc. MOVE always beats MOVE-CORRESPONDING here. That's why I prefer the MOVE - at least in programs where there is no need for building structures dynamically.
Michal
There is no significant proof that INTO CORRESPONDING FIELDS OF TABLE has much lower performance than INTO TABLE,
And I don't think it's the point when doing performance tuning.
Actually, INTO CORRESPONDING FIELDS is more flexible, when you use iNTO table, the type structure and selected columns must be matched and this is not convenient in sometimes like the structure needs to be changed.
>Yes, we are talking the microseconds here but sometimes
You are right, microseconds .... in total, it is done only once at initialization and not once per row.
Another Glorified Myth busted !!!!
Thank you Thomas, very good job!,
My results on SAP ECC EHP6 (DB : MS SQL SERVER) :
8.182
Regards
hi thomas,
excellent document ,
really an eye opener for ignorant QA Reviewers who are too much obsessed with
performance issues rather than need of the hour...
i think same thing should apply for MOVE-CORRESPONDING,
MOVE to WorkArea instead of WA_ITAB1 = WA_ITAB2.
i think this statement is more optimal
in case of where the internal table field names are different from actual table field names
and if you writing the field names in jig jag manner not in the order they appear in Table...
Hi Thomas,
You wrote this in 2012. Interesting. I really like using into corresponding fields in case I need a table with an extra field(2) that I want to fill at a later date. I loved reading this because I was one of the brain-washed myth people that always said into corresponding fields shouldn't be used.
Ha! Now it's 2015 and I'm finally learning it. Of course the next step maybe SAP on the cloud where I can no longer touch the ABAP code. But for now this justifies what I'm doing.
Thank you, thank you, thank you... You've my programs shorter, and easier to maintain.
Michelle
Hi Michelle,
I'm glad it helped you, and good to see you here (albeit with new user ID, as it seems).
Cheers
Thomas
Thank you!!!! I think I have about 5 or 6 IDs. It's crazy. I gave up when I was using the work computer and it kept wanting me to get another ID and just said yes. 😆 No big deal, I can still do what I want.
As of ABAP 740 SP05, the evaluation of INTO CORRESPONDING has changed. Please refer to - ABAP Keyword Documentation
Blog amended to highlight its applicability.
Hi guys,
not sure what that change exactly means for the applicability of the blog. Before and after 740 SP05 the field list was determined by the declaration of the target structure.
Cheers
Thomas
Fair enough. I’ve taken my edit out. It seems better to me that this is simply discussed in the comments, as clearly I acted hastily. My apologies.
Hi Thomas Zloch,
As of ABAP 740 SP05 , 'into corresponding fields' is not preferred over giving the field list in select query. Even in the SAP ABAP for HANA Course book it is clearly mentioned that using field list in select query has got performance improvement over into corresponding fields'.
B.R.
I did not say that it should be preferred in terms of performance, rather that contrary to popular belief there is no performance disadvantage as long as the target structure is declared for the required field list only.
My own tests on a 740 SP12 HDB still show comparable results with both column store (SBOOK with “monster” data set) and row store (DD03L).
There might be other relevant aspects for newer releases or HDB based systems that I am not aware of yet, so please help me understand.
Thomas
It says in the documentation " If all required components are statically identifiable, the assignment of the fields in the addition CORRESPONDING after INTO is now determined when the program is generated and is not delayed until runtime." which backs up what you say.
I am not sure why you would not recommend "The combination of “*”, INTO CORRESPONDING and the full target structure must be avoided" assuming that you are requiring all the fields. I do not see any difference in the performance in your screenshots and I assume that the resulting binary code will be the same as in case 6.
If I see such a construct, I would assume incorrectly that it extracts only a few columns and I wouldn't pay attention to it if I would quickly read the code for performance issues. So it's better to remove CORRESPONDING FIELDS to avoid confusion if the intention is to extract all the columns.