First real use of secondary indexes on an internal table
Introduction
Given the reluctance of the general ABAP community to use new-fangled (that is, over fifteen years old) concepts like SORTED and HASHED tables, I was hesitant to write about something a bit newer, but then I thought – what the heck, perhaps some people will find it an encouragement to use new stuff!
And I know this isn’t that new!
So, we have HASHED tables, where the key is unique and the lookup time is constant for each record, and SORTED tables which mean we don’t need BINARY SEARCH any more (except if we need sort descending…). For these tables, there’s an index already defined to speed things up – but it’s like a database table with just a primary index. Secondary keys are like additional indexes on database tables – but for internal tables.
If you don’t know how SORTED and HASHED tables then read the documentation. It’s all quite comprehensive so no excuse!
I’ve heard it said that you should only use these if you’ve got tables with loads of information in. Well, so long as the data isn’t being handled in a loop, I think it doesn’t matter. If the data volume being processed is small, a few extra nano-seconds won’t matter, and data volumes grow – so there’s some future proofing in using the structures which are most efficient with large tables, right from the start.
Secondary keys
Here’s that syntax, to refresh your memory.
TYPES dtype { {TYPE tabkind OF [REF TO] type}
| {LIKE tabkind OF dobj} }
[tabkeys]
[INITIAL SIZE n].
And then tabkeys looks like this:
… [ WITH key ]
[ WITH secondary_key1 ] [ WITH secondary_key2 ] …
[ {WITH|WITHOUT} FURTHER SECONDARY KEYS ] … .
Additions
1. … WITH FURTHER SECONDARY KEYS
2. … WITHOUT FURTHER SECONDARY KEYS
Those additions, we’ll forget about. They’re for use when you’re defining generic table types.
Now, for my purposes, I’ve got a questionnaire, with pages on it, categories of questions and questions. And I need to access it in many ways. So here’s how I defined it:
TYPES:
questionnaire_ty TYPE SORTED TABLE OF q_entry_ty WITH NON-UNIQUE KEY page_number cat_seq
WITH NON-UNIQUE SORTED KEY by_question COMPONENTS q_id
WITH NON-UNIQUE SORTED KEY by_cat_guid COMPONENTS cat_guid q_seq
WITH NON-UNIQUE SORTED KEY by_cat_text COMPONENTS cat_text
WITH NON-UNIQUE SORTED KEY by_cat_seq COMPONENTS cat_seq .
The idea is that I can access an internal table of this type rapidly by page number, question id, category unique id (guid), category text and category sequence. Seems quite a lot, but the alternatives were to have a standard table and sort it and use binary search for each read, or not bother at all, and just put up with sequential reads.
Some problems
I’ve got the categories in my questionnaire in sequence order. So, naturally, I want to renumber them. The obvious way of doing this is
LOOP AT me->questionnaire ASSIGNING <entry> USING KEY by_cat_guid WHERE cat_guid EQ i_guid.
ADD 1 TO index.
<entry>–cat_seq = index.
ENDLOOP.
But there’s a problem there. It dumps. And it dumps because cat_seq is part of the key by_cat_guid!
So, I thought, I’ll delete the records, collect them and then insert them afterwards
LOOP AT me->questionnaire INTO entry USING KEY by_cat_guid WHERE cat_guid EQ i_guid.
DELETE TABLE me->questionnaire FROM entry.
entry–cat_seq = index.
INSERT entry INTO TABLE renumbered.
ENDLOOP.
INSERT LINES OF renumbered INTO TABLE me->questionnaire
But data was still going amiss. The problem was, that the delete command deletes the entry that matches the primary key. So it was reading one entry in the LOOP AT, and deleting an entirely different entry (that matched the primary key) at the DELETE.
I tried the old DELETE… INDEX, but that got me nowhere. But a quick check of the syntax for DELETE gave me the hint.
LOOP AT me->questionnaire INTO entry USING KEY by_cat_guid WHERE cat_guid EQ i_guid.
DELETE TABLE me->questionnaire FROM entry USING KEY by_cat_guid.
entry–cat_seq = index.
INSERT entry INTO TABLE renumbered.
ENDLOOP.
INSERT LINES OF renumbered USING KEY by_cat_guid INTO TABLE me->questionnaire
What to be aware of
With an internal table with additional keys, there are few things you really need to take care about.
1. You can’t change a field of an entry you’ve ASSIGNED to, if that field is part of one of the keys
2. If you access data using one key – you really need to change it using the same key.
3. All of the usual internal table handling statements have the addition USING KEY. Sometimes it’s vital – like with the DELETE example. Other times it’s a matter of performance. For the INSERT LINES I could have omitted the USING KEY, and it would still work – however it is not as efficient, since I know that all my renumbered entries have the same cat_guid.
Final words
When new ABAP commands become available, try to use them. In my application, it probably won’t make any difference. But what you don’t use, you forget. Surely there will come a time when you do need additional accesses to internal tables – if you’ve already practiced, the next time it won’t take as long.
Well the biggest problem i have is that i dont have these new ABAP commands available. As there are no business needs the system isnt updated - not in this area-
The other problem: i'm using hashed tables wherte i can, but there are still people not knowing them sorting tables back and forth all the time... So i'm a little bit exotic here 🙂
i'm always wondering why so many programers arent willing to learn using new commands. Since years we habe abap_bool, abap_true and abap_false, its mandantoy in the dev guide but its hard to catch the violation, still programers do their own stuff, and surprisingly, even their own stuff isnt consitently. Sometime they store 0 and 1 as flags, sometimes Y and N sometine X and space.
I dont think i will see secondary indexes in my environment before i'm retired 🙁
I'm fortunate in that regard - I just saw the other day that one of the systems I work in is at 7.4 already. So I just need to get some admin rights and install abap eclipse!
Most of the systems I work on are 7.3 or higher for this particular client.
So I just need to get some admin rights and install abap eclipse!
For eclipse you don't need admin rights. You just download the files, unzip them, install the ABAP addons and that's it. None of the step needs admin rights on the desktop.
I needed the admin rights to get the right level of Java runtime installed!
You're right, Java VM needs admin rights. Eclipse+plugins don't.
Hello Matt,
Nice to see someone blogged about secondary keys. I have been itching to use the them in productive code for sometime but ABAP Keyword Documentation always stopped me 🙁
I wish i could use secondary keys with the BAPI tables (Standard tables), but they give me a dump as well.
So although i'm aware of their existence i have never used them in productive code.
BR,
Suhas
It's the same advice some people give with e.g. HASHED tables -> unless there's thousands of rows don't bother, as the overhead for initialisation is too high.
But the way I look at it - if there's only 1000 entries, it's going to be quick anyway (notwithstanding processing within loops).
Well i guiess that's an urban legend with hash tables. There is a small overhead when there are only a few entries and theres no much gain with few entries, but who cares? If it is a table only being read from and the key is unique just make it a hashed table. Thinking about or even discussing this all the time wastes more resources as a hash table with only 50 entries.
Since I know of the existence of secondary indices, I often use them in BI. It avoids creating multiple itabs with basically the same data but different keys for quick access in routines.
nice.
M.
Hi Matt,
First of all, thank you for your blog, the only one which exists on that topic.
I was digging in the official Secondary Keys documentation about what are the "lazy and delayed updates" (i.e. I want to understand how it works in the kernel), and went to read your blog, looking for help as the topic is a little complex. I think there are a few typos (maybe I'm wrong), and I have also one question about the DELETE:
In your example, cat_seq is not part of the secondary key by_cat_guid, but part of the primary key. Thus, I think it's why it dumps : if I've understood correctly the documentation, you may update fields which are part of secondary keys.
I think it's a typo, it should be
entry
, instead of<entry>
. These 2 lines appear twice, in the second and third code blocks.According to the documentation, when you read the internal table using a secondary key, the SY-TABIX corresponds to the line in that secondary index, so I think using
DELETE me->questionnaire INDEX sy-tabix USING KEY by_cat_guid
should do the job. Moreover, I wonder whether we may still useDELETE me->questionnaire
, as we are inside LOOP AT? (but maybe you did it on purpose to demonstrate secondary keys).Here, I think it's the same topic as my first remark. I think it should be:
1. You can't change a field in an existing line of the internal table, if that field is part of the primary key
(I think the ASSIGN is simply the way you modify the line in the internal table (you might have used MODIFY which would lead to the same restriction), and according to the ABAP documentation, you may update fields which are part of secondary keys (as far as they don't belong to the primary key).Regards.
Sandra
Thanks to everybody's comment here, thank you Matthew for starting this. Yes, always a few developers really make use of new syntax offers in ABAP.
100 % agree with Rainer about urban legends warning to use anything useful. Sandra was the only one noticing that you must decide never ever again use a LOOP INTO just not to get confused with work areas and field symbols.
What ever the overhead of using hashed tables, secondary indexes or ASSIGNING field-symbols may be:
when tables are small, you won't notice (can't measure) a difference anyway, when tables get larger you are simply better and faster. Remove LOOP .. INTO from your portfolio, use LOOP .. ASSIGNING <field-symbol> and use it always regardless of the situation.
Why not start better in the first way?
BTW: I love the new && operator - no more need for CONCATENATE any more.
And: Not everything new is always better: REGEX is extremely slow though powerful compared to old-fashioned word with offsets and compare operators (CS, CA, ..). Here you may have good reasons to use old and outdated techniques.
Just my 2 cents.
Regards
Clemens
If ABAP in ECLIPSE is as slow and non-intuitive as the new debugger, then I'm happy not to have a system to try 🙂
You got some good arguments and I agree with your loop-statement 🙂
but what is that:
If ABAP in ECLIPSE might be as slow as your "new" debugger?
Are you on a run? 😕
~Florian
It's not. And the new debugger may be "strange" but it's much more powerful (and I come from a time when the new debugger didn't exist).
Nowadays I mainly use AiE, just use SE80 for some specific things that are missing.
Hi Clemens,
i don't think that too many abaper would use REGEX in a case, where CS or CA does the trick. It's not, that REGEX is the most convenient construct to use, so i hardly see the relevance of that comparison.
On the other Hand, you gain a lot for those cases, where CS or CA doesn't suffice.
And all of a sudden, you can do things that where plainly unthinkable before REGEX.
Besides REGEX isn't that new either.
About the same holds true for the new debugger. It widens your range of possibilities.
Even if it were slower (which i have failed to notice as of yet), the limitations and bulkiness of the old one are reason enough for the change.
Old or new is hardly ever the question.I'll use whatever works best.
But using new techniques in order to hone your skills, might pay off on the Long run.
To Keep in mind:
Those who are always trailing behind the herd, will always feed on droppings, not on verdancy.
Jörg
Hi all,
IMHO a big hurdle in adopting the new ABAP enhancements is availability.
As far as I know in order to use those enhancements you need to upgrade the whole system.
Is the business wiling to do that ? the cost of upgrade is not something to sneer at......
I wish we could just upgrade the ABAP stuff.... 🙁
As for myself I am on 740 sp08 🙂
Regards.
Correct and right.
And the answer is obviously "No, the business is not willing at all in throwing money on IT projects" 🙂
Usually business has to be forced by some major force to accept the upgrade.
All the times I worked on upgrade, the business was the force to fight off (not the upgrade issues themselves) and the projects were mandatory due global HQ decision.
just by the way: LOOP ASSIGNING <field-symbol> is available since release 4.5 (or earlier), more than 15 years. Still we are at 95 % LOOP INTO / MODIFY.
Regards Clemens
There's nothing wrong with a healthy LOOP AT INTO, as Long as you're not processing really biiig internal tables.
Especially when you need to manipulate all or some of the data for further processing, without riscing to compromise it.
We're always keen on protecting data from unintended modification, but willingly use field-symbols and just trust in the benevolence of other code-blocks?
Even when you're dealing with methods only, there's no garantee, that nobody messes with the data you provide. And with field-symbols it is "what's done is done and what's lost is lost".
I'd recommend the use of field-symbols wherever they can make your life easyer.
A true gain of performance you'll have, where you need to update rows and rows of data.
But looping at an internal table of a fews thousand entries, without the need or Intention to change anything within the table - the risc isn't worth the gain!
The problem i see with field-symbols, ist that about half of those using them, don't know them half as good as they should and the other half can't be bothered.
I'm as fierce a promoter of well used field-symbols as anyone - but i'm fighting a lost battle against all atttempts to install field-symbols for every purpose and in every context.
First understand - then use - and use wisely.
BR - Jörg
I must say I have never understood the "protecting data from unintended modification" argument.
Why is it different from any unintended modification of standard variable?
Regarding methods,
Import parameters cannot be changed (In contrary to IMPORTING parameters in old subroutines).
QED.
The difference is, when you change a field-symbol, assigned in a LOOP AT TABLE, you change the contents of the table, not just a work area.
BR - Jörg
And if you don't understand that go to the back of the class!
Rich
This is correct, from technical point-of-view.
However, you can also update "just a work area", which one row afterwards updates a DB table.
You should always understand the context of your code.
You should always understand the context of your code.
Well said.
You should always make your code so that subsequent developers can understand the context.