Utility class for internal table manipulations (get min/max from any column in a table) rev. 3
Hi SCN community!
Ever wanted to get the line in a table where the max value for a certain field was? I had to last week, and I remembered just how awful it was to do it. Declare an auxiliary table of the same structure, copy the entire contents, sort for the right field, get the first line, correct?
I thought that there had to be a better way to do this, and I searched for it, and couldn’t find it. And I was actually challenged by Manish Kumar to create a class for this, and so I did.
What I’m sharing is a simple class for internal table manipulations. So far it has 5 methods:
- GET_MAX_LINE – Get the line with the maximum value for a certain column
- GET_MIN_LINE – Get the line with the minimum value for a certain column
- GET_MAX_VAL – Get the maximum value for a certain column
- GET_MIN_VAL – Get the minimum value for a certain column
- GET_AVG_VAL – Get the average value for a certain column
I’m trying to set up github so that we can all work on this together, hopefully we will be able to work on other utility classes like this one.
EDIT: Github successfully set up! You can follow this project here:
EsperancaB/sap_project_object · GitHub
Thanks!
For revision 3 I have changed the exception throwing strategy (much cleaner now) and taken Shai Sinai‘s suggestion of using the component’s name instead of looking for its index. GREAT catch!!
If you have any suggestions or contributions please feel free to join 🙂
Best,
Bruno
Hi Bruno,
It's great that you took the time to do this, but I checked your code and IMHO you are over-engineering it. Even worse, you are not reusing your component as you could use your get_max_val( ) in your get_max_line( ).
Those two things were quite surprising to me based on your very good previous blog posts.
Anyway, my message for you is: DRY and KISS.
Cheers,
Custodio
Hi Custódio,
First of all, as usual, let me start by thanking you for taking the time to look into this 🙂
I'm sorry I disappointed you 😛 This is why I like sharing stuff on the internet. People love to criticize. If something could have been done in a better way someone will let you know, guaranteed! People are also natural born procrastinators, but when people see someone else taking the initiative it usually motivates them to try and do it themselves, especially if they think they can do it better themselves. So I hope you will be one of the contributors when/if I manage to get github set up for this 🙂
Now, to your actual suggestions, I think you meant using get_max_line( ) in get_max_value( )? You are right, I did choose the "lazy" copy/paste path once I got the hardest part of the work done. I will take another look at it as soon as I have the time and follow the DRY principle, thanks.
If you want to share your own approach feel free to do it, I would love to take a look at it!
Abraço,
Bruno
It is good that you have shared a slinkee as well as normal code.
After opening the slinkee first, I did search/replace to make it readable, only to find that readable code is already present in next attachment.
I won't comment on coding style as i am still on team procedural.
Interesting. So why did you suggest a class rather than a FM?
Cheers
Custodio
I had commented about a cool way to do it.
max_value = zcl_itab=>get_max_value( itab, 'FNAME' ).
Above line does not have any keyword.
Subroutine or FM call would involve use of keywords.
Method call is closest thing to something like n = strlen( str ).
Also, Bruno's activity so far have shown that he is on other team.
Hey we can still be friends, even if we play on opposite teams 😀
In this case, I don't see any advantage on writing a method instead of a FM, but I don't see any disadvantage either.
Well... one could say that one advantage is that obsolete stuff is not allowed, like header lines (have I mentioned that I hate header lines?), but I would write an FM without these obsolete statements, so that wouldn't be a problem.
Cheers!
Bruno
We are part of same space, and the chosen side is just another attribute (editable).
I read and observe both sides but i am not ready to be a critic. What Custodio spotted, i didn't. That's the reason for not doing code review.
True, it doesn't have any keyword but as far as I'm concerned, this:
indicates a method call, not a subroutine, not a macro, not a FM.
Anyway, this is not important. The important thing is you made nice suggestion and Bruno accepted the "challenge".
Cheers,
Custodio
Hi Bruno,
Really good blog..... I frequently came across such kind of requirement and I find this
work of yours really helpful.
Keep it up!!!!
Regards,
Ashish
Thanks Ashish,
Appreciate it 🙂
Best regards,
Bruno
Hi Bruno,
Thanks for the post. Just wondering, why not using "SORT itab BY (otab)"?
https://help.sap.com/abapdocu_70/en/ABAPSORT_ITAB.htm#&ABAP_ADDITION_5@5@
Simple code can be achieved through complex SORT. No bubble sort; no RTTS.
" define the sort criteria.
oline-name = column_name.
oline-descending = true/false.
APPEND oline TO otab.
" Sort table and the 1st line is what expected.
SORT itab BY (otab).
READ TABLE itab INDEX 1.
Regards,
It is up to the person writing the implementation. There was a discussion recently wherein a person had internal table of size > 500mb, and he didn't want to loose the order or make a copy just to find the max value. This is one of the possible reasons for not using sort.
If i were writing the same logic, it would be because i miss the school days when similar exercise was given in C programming.
Exactly my thinking Manish! In terms of performance I don't think it could be any better than this... you pass the table by reference, one single pass to get the value, return it.
Regards,
Bruno
A long time ago (from my perspective 😛 ) a teacher I had in university talked in class about a friend of his, from his university time. He wrote very ugly code, with huge table declarations, for the values of cos and sin for example. However, his programs were ultra-fast compared to others, because instead of computing the cos/sin functions like the others, he would fetch the values from the tables he had created.
I think this is something we should keep in mind, especially when writing code in ABAP. Sometimes the solution that is more elegant to read, is not necessarily the best in performance. Actually it is usually the opposite. (This almost deserves a blog post of its own).
Cheers 🙂
Bruno
Rainbow tables are fast based on same technique.
Hi Guoqiang,
Interestingly enough, I thought of this! But I didn't know how to define the sort criteria dynamically, so I gave up. Thanks for that 🙂
One quick question though... in terms of performance what do you think will be better? I'm thinking that a single pass to find out the maximum value will be much quicker than sorting the entire table, no?
When I have some time I'll test it and I'll let you know 🙂
Thanks again!
Best,
Bruno
Hi Bruno,
LOOP works straight and linearly, but it accompanies some overhead cost on RTTS. I guess LOOP is faster but don't know whether SORT is optimized in some cases.
IMHO, since the Utility class is designed for general usage, in most cases, no need to worry about which is better/faster. When it is needed, there're usually some other requirements which help decision making.
P.S.
If the itab is mistakenly defined like "DATA itab TYPE TABLE OF bukrs.", statement "lo_descr ?= cl_abap_typedescr=>describe_by_data( <lf_line> )." would through an exception as the table line is NOT a structure, better catch it.
Waiting for your more blogs. ➕
Hi again Guoqiang,
Keep in mind that the "overhead" from RTTS only happens once, to know in which position the desired column is. After that, it's just a simple loop. A SORT statement might be faster for tables up to... I don't know, 10 entries? Bigger tables and the RTTS overhead will be, I believe, negligeable.
I totally agree when you say this utility class is designed for general usage, this is why I think it should perform well in every situation. So, let's imagine two situations, one with a small table, and one with a huge table. The RTTS approach will perform decently in both situations (my suspicion). The SORT approach is probably slightly better for small tables, but could be a problem for big tables.
Also, keep in mind what Manish said. With tables that are big enough, the SORT approach would not only be a problem, but it would actually be impossible. To use SORT you need to make a copy of the table. If you are low on memory and the table is big, you can't use it.
Regardless, I hope I will find the time today to make a quick comparison between both approaches and I will let you know 🙂
And many thanks for the uncaught exception! I will include it in the next revision 🙂
Best regards,
Bruno
Oops, I'm sorry Guoqiang, I just lied to you.
The RTTS should only be executed once, but unfortunately I'm redetermining the field's position for every line, which is stupid. I'm already correcting this.
Thanks again.
BR,
Bruno
I have news for you 🙂 I have tested both approaches, but now I'm going for lunch and I will tell you later the results 😀
Best,
Bruno
Hi again,
Ok so I ran two tests, one with a "small" table (SBOOK) with ~100k entries, and a second test with an "average" table (SBOOK again) with ~1.4M entries (the monster variant of the flight example), where I searched for the minimum value of the amount (FORCURAM) column.
The results:
VERY interesting! The SORT approach, which sorts the entire table is roughly 4 times faster than simply looping for the value!
Who would have guessed? I guess the only reason not to use the SORT approach would be in a memory problem situation. I would say a memory problem is a very uncommon issue, so I'm going to rewrite the methods using the SORT approach. If anyone can think of a way to have the best of both worlds I am all ears 🙂
Thanks!!
BR,
Bruno
Are you also considering the time taken to create temporary copy of internal table?
I am.
PS: Here's the excerpt of the code.
Ok, new revision online!
MAJOR changes in the logic. And the SORT approach had a couple of interesting challenges to it. It might be interesting to take a look.
I also followed Custodio de Oliveira "DRY" suggestion and the logic is centralized in one protected method. This is much better, obviously, I should have done this in the first place.
Who would say that a task so simple would spark such a discussion?
Thanks!
Best,
Bruno
I'm also seeing possible improvements already... I should have a method to return the position in a table's structure for a certain field. It would make the "core" method easier to read and it could also be a public method for the class...
I'll leave this one for later.
Cheers!
Hi Bruno,
Your test result is so "interesting" that I made one myself.
Data Volume:
2,000,000 BSEG entries with 5 columns (BUKRS+BELNR+GJAHR+BUZEI+DMBTR).
Target:
To find the largest DMBTR.
Comparison Methods:
1) SORT
(just sort the original itab without itab copy)
SORT lt_bseg BY (lt_column).
2) LOOP - static
LOOP AT lt_bseg ASSIGNING <fs_bseg>.
IF <fs_bseg>-dmbtr > lv_dmbtr_max.
......
ENDIF.
ENDLOOP.
3) LOOP - dynamic
LOOP AT lt_bseg ASSIGNING <fs_bseg>.
ASSIGN COMPONENT 5 OF STRUCTURE <fs_bseg> TO <fs_dmbtr>.
IF <fs_dmbtr> > lv_dmbtr_max.
......
ENDIF.
ENDLOOP.
Result:
1) SORT
3.5 seconds
2) LOOP - static
0.8 seconds
3) LOOP - dynamic
3.1 seconds
Will you consider returning to the older version? 😀
Regards,
LOL!
So I ended up agreeing with you, and you ended up agreeing with me 😀
Well, if nothing else, I think at least this shows how open minded people we are 🙂
Would you care to try with SBOOK table and let me know your results? Because honestly, I think a difference from 3.5 to 3.1 is not big enough for me to change the logic. And also, the SORT statement will allow you to find the median value, which is probably the next method I am going to implement 🙂
Thanks for your feedback!!
BR,
Bruno
Unfortunately, I don't have a handy IDES system so couldn't test with SBOOK table.
Hope you've noticed the code difference between "LOOP - static" and "LOOP - dynamic", and noticed the time difference. Seems "ASSIGN" statement is not so lightweight.
Anyway, hope the utility class run faster.
P.S. A new keyboard was bought. 😆
>>A new keyboard was bought
The one that makes sounds clickety-clack?
Hi,
Of course I noticed the difference, but is it possible to implement a static loop in this situation? I guess in these situations, where the data load is huge and performance is paramount, the only option is for the developer to code a specific loop. For a class accepting a dynamic table, I don't see a better option. Do you?
Thanks 🙂
BR,
Bruno
PS: LOL!
Agree.
By the way, if you defined your own internal table as a standard table, would you care to try declaring it as a sorted table with unique key BUKRS BELNR GJAHR BUZEI to see if it makes any difference?
Thanks 🙂
BR,
Bruno
PS: I noticed you joined SCN in June 2013 and only now you participated in something. Was this blog post so interesting that it made you break your code of silence? 😛
Consider a method with importing parameters I_ITAB of type ANY TABLE, I_FIELD type string, exporting parameters E_MAX_VALUE type any.
DATA: ref_to_workarea TYPE REF TO DATA.
CREATE DATA ref_to_workarea LIKE LINE of I_ITAB.
FIELD_SYMBOLS: <workarea> TYPE ANY, <field> TYPE ANY.
ASSIGN ref_to_line->* TO <workarea>.
ASSIGN COMPONENT field OF STRUCTURE <workarea> TO <field>.
LOOP AT i_itab INTO <workarea>.
CHECK <field> GT e_max_value OR e_max_value IS INITIAL..
e_max_value = <field>.
ENDLOOP.
Try timing that.
Hello Bruno,
using a reminiscence of a priority queue, I would propose a class that takes all data in the factory( ) and return an object that only have the min( ) or max( ).
regards,
JNN
Wow, and I thought Bruno was over-engineering! 😀
Hehe 😀 See? My approach was actually quite simple in comparison 😉
Hi Jacques!
Thanks for chiming in 🙂
You know I'm a fan of factories and polymorphism 🙂 But I can't see any added value in this situation. Could you tell me why you would go for such an approach?
Actually I'm not even sure if I understand it. So let's say I have a table and I want the max value of a certain column.
I instantiate class VALUE_QUEUE and I will call method MAX( ) of this object? But... What object is this? A value object? With a max and min? Seems strange to me. I'm probably misinterpreting your figure.
BR,
Bruno
Hello Bruno,
I wanted a simple interface. The client should create the object with a factory and retrieve the value needed. Having 4 method for this seemed much to me, so I started thinking.
Now I tried to implement and came up with this code.
The client now only need a single TOP( ) method with parameters.
Could you get the logic to return a whole table line as an untyped parameter work with some non-trivial structure? I could not yet. I guess something like class CL_NLS_STRUC_CONTAINER method CONT_TO_STRUC( ) is needed in a unicode system.
regards,
JNN
Hey Jacques!
I have mixed feelings about your contribution 😛
In one hand, you obviously have excellent coding skills and I am very interested in looking into your code and understanding it (which I will try to do after dinner).
But on the other hand, I can't help but think that what you're suggesting would be more cumbersome to use than a static class, like what SAP did, for example, with class CL_ABAP_CHAR_UTILITIES. Anyways, I haven't tested your code yet, maybe my opinion will change after seeing it at work.
Oh, and please consider becoming a contributor at github EsperancaB/utility_classes · GitHub.
I think it could be a nice idea to build a common repository for object based classes that everyone could reuse. Actually Matthew Billingham had already pitched that idea here but apparently it never took off. Maybe this is the push it needs 🙂
Thanks!!
Best,
Bruno
Hi Jacques!
At least now I understand your question. Yes, apparently SAP doesn't support untyped parameters in the returning section, only changing. You will not be able to use functional methods for this, or so it seems. I'm still debugging your approach to see if I can make sense of it 🙂
Thanks! Have you checked github yet?
Best,
Bruno
Oops, sorry, I didn't understand your question after all.
It would seem that when you type
ASSIGN lr_line->* TO <ls_line>.
This doesn't mean <ls_line> is a structure... Strange...
I might look some more into this when I have the time! But... I guess my question is, do you really need all those "type ref to data"?
Best,
Bruno
Hi again Jacques!
I think I understand the problem now.
You're trying to "stuff" a deep structure with an integer field into a char20 field (lv_act).
I still haven't been able to fix it... but I think I will. Just a sec!
Best,
Bruno
Hi Jacques,
Case was solved by Marco Jäger.
Apparently if you don't explicitly type a parameter, SAP automatically attributes a char type to it. So you need to check it the parameter is requested, like this:
Cheers!
Excellent one Bruno...... Thanks for Manish and Oliveria for their comments too...
Thanks Abilash,
Appreciate it 🙂
I've just realized I've been an idiot by catching and throwing the same exceptions over and over... (I'm reading on exception classes, and their 3 variants, CX_STATIC_CHECK, CX_DYNAMIC_CHECK and CX_NO_CHECK) here SAP Library - ABAP Programming (BC-ABA). So much to learn, so little time 🙂
Best,
Bruno
You can also make a chain of exceptions by filling the previous parameter of the exception class.
Did you notice the text tab of the exception class. Each exception class has a text having the same name as that of the class. You can add as many exceptions in the form of text.
Yes, I spent the whole afternoon learning all the details about the exception classes, but not for this utility class, for actual work 😛
I will revise this (again) when I have the time.
But hey, if you wanna do that for me go ahead and share the files and I can upload them 😀
Thanks.
BR,
Bruno
SCN document would be a better way to collaborate. It supports versions, and other can edit content if permission is given.
I understand your point... however my understanding of "document" is more appropriate when you are teaching someone to do something... this class is just something I wanted to share...
Anyways, hopefully I will be able to set up github soon, and it will be a better place to collaborate on this (after 4 months of being "internetless", hopefully the technician from Deutsche Telekom will make my life full again next week).
Cheers 🙂
Bruno
Hi,
What document do you want? For exception class - I only go through SAP help & some other documents widely available on net. If I find some good examples , will definitely share it with you people.
Regards,
DPM
Hi Debopriyo,
I think Manish was suggesting that it would be easier to collaborate if I wrote an SCN document instead of a blog post.
And what I suggested was, if you have the time, you could revise the utility class zcl_itab with proper exception class use and share it and I would update the blog post 😉
Cheers 🙂
Bruno
Hi Bruno,
Please keep in mind that I am still a newbie at OO and my knowledge in this topic is quite limited.
I reviewed the code and would suggest the following changes:
i) Change the name of the class from ZCL_ITAB to something meaningul like ZCL_AGGREGATE_FUNCTIONS. As per rule, name of a class should be a noun. Also, change the description of the class.
OO Programming with ABAP Objects: Classes and Objects
ii) Since all the logic is coded in method PROCESS_OPERATION. I think, it is better you handle the exceptions there itself rather than propagate them to the calling program. ( If not propogating exceptions, remove the signature from the interface ). This will eliminate catching exceptions raised by PROCESS_OPERATION in the other method.
iii) Currently, you are only raising the exception. Where are you planning to show the message text.
Sample Code ( PROCESS_OPERATION)
"Let's make sure there something to read
IF LINES( i_table ) = 0.
TRY.
RAISE EXCEPTION TYPE cx_list_error_empty_list.
CATCH cx_list_error_empty_list INTO lx_list_error_empty_list.
lv_text = lx_list_error_empty_list->get_text( ).
MESSAGE lv_text TYPE 'I'.
RETURN.
ENDTRY.
ENDIF.
Regards,
DPM
Hi Debopriyo,
Sorry for taking so long to reply.
Well... I think ZCL_ITAB is a meaningful name... class for internal tables 😛 Maybe it could be changed to ZCL_ITAB_UTIL. Anyways, that's something the developer can easily change.
I've been doing a lot of reading on exception classes. I think it would be a bad idea to catch the exceptions and have "MESSAGE" statements inside the class. How to deal with exceptions should be up to the caller, this class should simply throw them. Think about the CX_SY_ZERO_DIVIDE exception... would it make sense to that instead of having an exception a message would be shown? What if you didn't want a message to be shown?
I'm doing a few changes and will publish them soon. I've also set up a github for this, maybe we will be able to collaborate. It's just an experiment, maybe it will work, maybe it won't 🙂
Cheers!
Bruno
Hi Bruno,
Thanks for sharing this code.
One comment:
I have reviewed the code and it seems that method process_operation can be optimized:
You may ASSIGN COMPONENT directly by component name. Therefore, the retrieval of component position is little redundant.
Regards,
Shai
Hi Shai!
So you mean to say that about 20 lines of the code are useless?? Haha that's great, thanks so much for sharing that!
I'll be uploading a new version with that implemented soon.
Best regards!
Bruno
Hi,
In general finding max/min in unordered table is an O(n) algorithm. You need to check each element at least once.
Best sorting algorithms are O(n*logn) so sorting should always be worse solution for this problem.
Only reason for sort version to be faster in ABAP can be that SORT is done in native code in kernel and LOOP is done on ABAP side.
But still for larger tables I expect simple LOOP always wins. (see solution from Matthew Billingham above.
Hi Tomasz!
Thanks for showing interest in this 🙂
If you have some free time, please go ahead and try timing Matthew's suggestion. You should have all the information you need to perform a meaningful test from the comments above.
At the moment I am quite busy, so it might take a while until I look into it myself. Regardless I did a time comparison between the loop and the sort approach, and for "big" tables, the sort approach was 2x faster... so, until proven otherwise, this approach stands 🙂
Best,
Bruno