Approach1

JerryWang · ‎03-22-2014

Approach1

Approach2

Approach3

Performance test result

The requirement is: there is an internal table with a large number of table row.

If all rows have the identical recipient_id, that id( 30273 ) must be returned.

UUID	Phone_number	Recipient_id
0412ASFDSFDSFXCVS	138XXXXX1	30273
0412ASFDSFDSFXCVD	138XXXXX2	30273
0412ASFDSFDSFXCVF	138XXXXX3	30273
…	…	30273

If not, it must return empty.

UUID	Phone_number	Recipient_id
0412ASFDSFDSFXCVS	138XXXXX1	30273
0412ASFDSFDSFXCVD	138XXXXX2	30273
0412ASFDSFDSFXCVF	138XXXXX3	30273
…	…	30272

The table line type structure in the project looks like below:

Three different solutions have been made.

Approach1

the idea is a temporary table lt_sms_status is used to hold all the content of the internal table to be checked, and then SORT on the temporary table and delete adjacent entries. If all the table row have the same recipient id, after the operation there must be only one entry left.

DATA: lt_sms_status LIKE it_tab.

lt_sms_status = it_tab.

SORT lt_sms_status BY recipient_id.

DELETE ADJACENT DUPLICATES FROM lt_sms_status COMPARING recipient_id.

IF lines( lt_sms_status ) = 1.

  READ TABLE it_tab ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.

  ev_rec_id = <line>-recipient_id.

ENDIF.

The drawback of approach1 is it could lead to the unnecessary high memory assumption. when lt_sms_status = it_tab is executed, no new memory allocation will not occur, until the write operation on the copied content. This behavior is documented as "Delayed Copy".

We also have concern regarding the performance of SORT and DELETE keyword when they are executed on a big internal table.

Approach2

Now we fetch the recipient id of the first row, and compare it with the left rows in the table. If most of the table rows have different recipient id, the execution has the chance to quit early. However if unfortunately all the table rows have exactly the same recipient id, this approach has to loop until last table row.

DATA: lv_diff_found TYPE abap_bool VALUE abap_false.

READ TABLE it_tab ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.

DATA(lv_account_id) = <line>-recipient_id.

LOOP AT it_tab ASSIGNING FIELD-SYMBOL(<ls_line>).

  IF lv_account_id <> <ls_line>-recipient_id.

     lv_diff_found = abap_true.

     EXIT.

  ENDIF.

ENDLOOP.

IF lv_diff_found = abap_false.

  ev_rec_id = lv_account_id.

ENDIF.

Approach3

the idea is similar as approach2, now instead of manual comparison inside each LOOP, we leverage "LOOP AT XXX WHERE condition".

READ TABLE it_tab ASSIGNING FIELD-SYMBOL(<line>) INDEX 1.

LOOP AT it_tab ASSIGNING FIELD-SYMBOL(<ls_line>) WHERE recipient_id <> <line>-recipient_id.

ENDLOOP.

IF sy-subrc <> 0.

   ev_rec_id = <line>-recipient_id.

ENDIF.

In order to measure the performance, we construct two kinds of test case. In the first one, we generate the internal table with N rows, each has exactly the same recipient id. And for the second, each one has different. Both are extreme kinds of scenarios. We may consider to measure the case between these two, for example for a N rows table there are 50% table rows have the same id and another 50% have difference one.

Performance test result

The time spent is measured in microsecond.

N = 1000

For the first test case, approach3 is most efficient. For the second test case, approach2 is the winner, as we expected.

N = 10000

N = 100000

N = 1000000

N = 5000000

Based on the performance result, we do not consider approach1 any more. For the choice between approach2 and 3, we need to investigate on the distraction of recipient id in the real world.

Maybe you can also share if you have better solutions?

Seek the most efficient way to detect whether there are table row with duplicate key

Approach1

Approach2

Approach3

Performance test result

Get Started with the ABAP Development Tools for SAP NetWeaver

Become an ABAP in Eclipse Feature Explorer and earn the Explorer Badge

Six kinds of debugging tips to find the source code where the message is raised