Application Development Blog Posts
Learn and share on deeper, cross technology development topics such as integration and connectivity, automation, cloud extensibility, developing at scale, and security.
cancel
Showing results for 
Search instead for 
Did you mean: 
JerryWang
Advisor
Advisor
0 Kudos

The requirement is: there is an internal table with a large number of table row.


If all rows have the identical recipient_id, that id( 30273 ) must be returned.































UUID Phone_number Recipient_id
0412ASFDSFDSFXCVS 138XXXXX1 30273
0412ASFDSFDSFXCVD 138XXXXX2 30273
0412ASFDSFDSFXCVF 138XXXXX3 30273
30273

If not, it must return empty.




























UUID Phone_number Recipient_id
0412ASFDSFDSFXCVS 138XXXXX1 30273
0412ASFDSFDSFXCVD 138XXXXX2 30273
0412ASFDSFDSFXCVF 138XXXXX3 30273
30272

The table line type structure in the project looks like below:


Three different solutions have been made.



Approach1


the idea is a temporary table lt_sms_status is used to hold all the content of the internal table to be checked, and then SORT on the temporary table and delete adjacent entries. If all the table row have the same recipient id, after the operation there must be only one entry left.



DATA: lt_sms_status LIKE it_tab.
lt_sms_status = it_tab.
SORT lt_sms_status BY recipient_id.
DELETE ADJACENT DUPLICATES FROM lt_sms_status COMPARING recipient_id.
IF lines( lt_sms_status ) = 1.
READ TABLE it_tab ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.
ev_rec_id = <line>-recipient_id.
ENDIF.

The drawback of approach1 is it could lead to the unnecessary high memory assumption. when lt_sms_status = it_tab is executed, no new memory allocation will not occur, until the write operation on the copied content. This behavior is documented as "Delayed Copy".


We also have concern regarding the performance of SORT and DELETE keyword when they are executed on a big internal table.




Approach2


Now we fetch the recipient id of the first row, and compare it with the left rows in the table. If most of the table rows have different recipient id, the execution has the chance to quit early. However if unfortunately all the table rows have exactly the same recipient id, this approach has to loop until last table row.



DATA: lv_diff_found TYPE abap_bool VALUE abap_false.
READ TABLE it_tab ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.
DATA(lv_account_id) = <line>-recipient_id.
LOOP AT it_tab ASSIGNING FIELD-SYMBOL(<ls_line>).
IF lv_account_id <> <ls_line>-recipient_id.
lv_diff_found = abap_true.
EXIT.
ENDIF.
ENDLOOP.
IF lv_diff_found = abap_false.
ev_rec_id = lv_account_id.
ENDIF.

Approach3


the idea is similar as approach2, now instead of manual comparison inside each LOOP, we leverage "LOOP AT XXX WHERE condition".



READ TABLE it_tab ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.
LOOP AT it_tab ASSIGNING FIELD-SYMBOL(<ls_line>) WHERE recipient_id <> <line>-recipient_id.
ENDLOOP.
IF sy-subrc <> 0.
ev_rec_id = <line>-recipient_id.
ENDIF.

In order to measure the performance, we construct two kinds of test case. In the first one, we generate the internal table with N rows, each has exactly the same recipient id. And for the second, each one has different. Both are extreme kinds of scenarios. We may consider to measure the case between these two, for example for a N rows table there are 50% table rows have the same id and another 50% have difference one.



Performance test result


The time spent is measured in microsecond.


N = 1000


For the first test case, approach3 is most efficient. For the second test case, approach2 is the winner, as we expected.



N = 10000



N = 100000



N = 1000000



N = 5000000



Based on the performance result, we do not consider approach1 any more. For the choice between approach2 and 3, we need to investigate on the distraction of recipient id in the real world.


Maybe you can also share if you have better solutions?

14 Comments