Scanning the ABAP code in an SAP system is useful for gathering different kinds of information.
For example:
- To detect security vulnerabilities at the ABAP level
- To detect hard-coded values in ABAP code
- To get a list of the external RFC calls used in custom (Z) developments
- To get a list of the database tables and fields used (before an S/4HANA transformation, for example)
At my company, Novaline, we’ve built a tool named “ABAP Optimizer”,
which scans ABAP code for performance problems and then modifies the code automatically to optimize performance. (See my last blog about it here.)
In this blog I’m going to share some coding details about scanning ABAP code.
Basic: Reading ABAP source code
Basically, we can read the source code of an ABAP include with the ABAP command READ REPORT.
Below is a simple report:
To get the source code of this report, we can use the “READ REPORT” command as below:
The READ REPORT command fills the string internal table “gt_source” with the source code of the report “ZTEST”.
By doing this, we only get the source code as a plain string table. The code is not interpreted in any way.
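A minimal sketch of such a read could look like the following. It assumes a report named ZTEST exists in the system; the report name zread_source_demo and the variable gv_line are made up for this illustration.

```abap
REPORT zread_source_demo.

DATA: gt_source TYPE TABLE OF string,
      gv_line   TYPE string.

" Read the source of report ZTEST into a table of strings
READ REPORT 'ZTEST' INTO gt_source.

IF sy-subrc = 0.
  " Each table line is one raw source line, nothing is interpreted yet
  LOOP AT gt_source INTO gv_line.
    WRITE / gv_line.
  ENDLOOP.
ELSE.
  WRITE / 'Source of ZTEST could not be read.'.
ENDIF.
```

Checking sy-subrc after READ REPORT matters because the program may not exist or its source may be protected.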
Still, to build a simple ABAP scanner that only searches for specific texts in ABAP code, we can read the list of custom reports from the SAP view TRDIR, read their source one by one with the “READ REPORT” command, and finally run simple text searches on the code.
The SAP program “RS_ABAP_SOURCE_SCAN” already does this kind of search; you can refer to it as an example. Check out this page to see some benefits of it.
So what about interpreting the code? Let’s go into more detail.
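The text-search idea can be sketched roughly as below. This is only an illustration under a few assumptions: custom programs are taken to be those whose names start with Z, and the report name zsimple_text_scan, the parameter p_text and the variable names are hypothetical.

```abap
REPORT zsimple_text_scan.

" Text to search for (hypothetical parameter for this sketch)
PARAMETERS p_text TYPE c LENGTH 60 LOWER CASE OBLIGATORY.

DATA: gt_progs  TYPE TABLE OF trdir-name,
      gv_prog   TYPE trdir-name,
      gt_source TYPE TABLE OF string,
      gv_line   TYPE string.

START-OF-SELECTION.
  " Assumption: custom programs are those starting with Z
  SELECT name FROM trdir INTO TABLE gt_progs WHERE name LIKE 'Z%'.

  LOOP AT gt_progs INTO gv_prog.
    READ REPORT gv_prog INTO gt_source.
    IF sy-subrc <> 0.
      CONTINUE. " source not readable, skip this program
    ENDIF.

    " Plain text search, line by line; sy-tabix is the source line number
    LOOP AT gt_source INTO gv_line.
      FIND p_text IN gv_line IGNORING CASE.
      IF sy-subrc = 0.
        WRITE: / gv_prog, sy-tabix, gv_line.
      ENDIF.
    ENDLOOP.
  ENDLOOP.
```

A search like this also matches comments and string literals, because it knows nothing about the structure of the code.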
Interpreting ABAP Code
Many years ago, when I was first trying to build an SAP security scanner tool, I spent time on the “SAP Code Inspector” tool and tried to understand how it analyzes ABAP code and makes its detections.
Below are some important concepts to know before we go further:
Tokenization: parsing ABAP code from a plain string into meaningful structures. If you like reading theory as much as I do, check out this page.
- Statement: every ABAP command that ends with a period
- Token: every word in an ABAP statement is a token, whether it’s an ABAP keyword, a literal, or something else such as a variable name
- Structure: some statements are bound to each other. For example, an ABAP LOOP statement ends with an ENDLOOP statement somewhere in the code, so together they form a structure
So imagine a piece of ABAP code as below:
And let’s apply the concepts to it:
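As a small illustration of the three concepts, here is a hypothetical snippet with comments marking statements, tokens and a structure. The names zconcepts_demo, gt_numbers and gv_number are made up for this example.

```abap
REPORT zconcepts_demo.

" One statement (ends with a period) made of six tokens:
" DATA, GT_NUMBERS, TYPE, TABLE, OF, I
DATA gt_numbers TYPE TABLE OF i.
DATA gv_number  TYPE i.

" One statement with four tokens, one of them a literal: APPEND, 1, TO, GT_NUMBERS
APPEND 1 TO gt_numbers.

" LOOP ... ENDLOOP are two statements bound to each other: one structure
LOOP AT gt_numbers INTO gv_number.
  " A statement nested inside the structure: WRITE, /, GV_NUMBER
  WRITE / gv_number.
ENDLOOP.
```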
You can parse the string yourself to get these structures after reading the code with the READ REPORT command, or you can use the existing classes of the SAP Code Inspector to make it simpler. Check out the standard class “CL_CI_SCAN” for this. (It basically uses the SCAN ABAP-SOURCE command under the hood.)
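For reference, a direct call of SCAN ABAP-SOURCE might look like the sketch below. SAP documents this statement as intended for internal use, so treat it as an illustration only. It assumes the dictionary structures STOKES and SSTMNT are available as line types and that a report named ZTEST exists; the report name zscan_source_demo and the variable names are hypothetical.

```abap
REPORT zscan_source_demo.

DATA: gt_source     TYPE TABLE OF string,
      gt_tokens     TYPE TABLE OF stokes,
      gt_statements TYPE TABLE OF sstmnt,
      gs_statement  TYPE sstmnt,
      gs_token      TYPE stokes.

READ REPORT 'ZTEST' INTO gt_source.

" Split the raw source into tokens and statements
SCAN ABAP-SOURCE gt_source
     TOKENS     INTO gt_tokens
     STATEMENTS INTO gt_statements.

IF sy-subrc = 0.
  " Print the tokens of each statement on a line of their own
  LOOP AT gt_statements INTO gs_statement.
    WRITE / 'Statement:'.
    LOOP AT gt_tokens INTO gs_token
         FROM gs_statement-from TO gs_statement-to.
      WRITE gs_token-str.
    ENDLOOP.
  ENDLOOP.
ENDIF.
```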
So, using this logic and information, how do we code an ABAP interpreter?
After tokenizing the code as above, the second step is to analyze the commands or statements you are interested in. The interpretation depends on what you are trying to detect.
Let’s continue with an example scenario as below.
Basic Example:
Let’s code an ABAP scanner that detects SELECT commands used with “*” to read all fields of a database table.
The steps are as follows:
- Tokenize the code
- Loop over all the statements and detect SELECT commands
- Parse SELECT commands and find the ones used with star “*”
Imagine a small ABAP program as below:
The second SELECT statement, which reads table VBAP, is used with a star “*”; these are the SELECT statements we are trying to detect.
And let’s code the scanner in ABAP:
(I’m sharing the code as images to make it more understandable, but if you request it I can also share the code as text.)
Tokenize the ABAP code:
In the report code below, the parameter “p_prog” is the name of an existing ABAP program in the system.
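A minimal sketch of this tokenization step with the Code Inspector classes could look like this. It assumes that cl_ci_source_include=>create accepts the program name as p_name and that the cl_ci_scan constructor takes the source object as p_include and scans it; please verify both in your release. The report name zscan_select_star and the variable gv_count are hypothetical, and error handling is left out.

```abap
REPORT zscan_select_star.

" Name of an existing ABAP program in the system
PARAMETERS p_prog TYPE trdir-name OBLIGATORY.

DATA: gr_source TYPE REF TO cl_ci_source_include,
      gr_scan   TYPE REF TO cl_ci_scan,
      gv_count  TYPE i.

START-OF-SELECTION.
  " Wrap the program source in a Code Inspector source object
  gr_source = cl_ci_source_include=>create( p_name = p_prog ).

  " Assumption: the constructor tokenizes the source
  CREATE OBJECT gr_scan
    EXPORTING
      p_include = gr_source.

  " The results are available as tables on the scan object
  gv_count = lines( gr_scan->statements ).
  WRITE: / 'Statements:', gv_count.
  gv_count = lines( gr_scan->tokens ).
  WRITE: / 'Tokens:', gv_count.
```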
Let’s run it for the program “ZTEST” above and display the object “gr_scan” in the debugger; the “statements” and “tokens” tables are visible on this object:
Let’s display the “tokens” table:
And display the “statements” table, noting the “from” and “to” fields:
For every statement, the “from” and “to” fields in the “statements” table give the index range of its tokens in the “tokens” table.
Find SELECT statements and check whether they use “*”:
Basically, the “statements” table holds every command in the selected ABAP program,
and we can access every token of a statement by using its from / to fields as indexes into the “tokens” table.
To decide whether a statement is a SELECT or not, we can just check the first token.
Also, by checking the second token we can see whether the SELECT is used with a star ( * ).
The remaining part of the scanner code is below:
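The detection logic described above can be sketched as a subroutine that receives the scan object. The subroutine name find_select_star and its parameter ir_scan are hypothetical, and the inline declarations assume ABAP 7.40 or later. As a small extension beyond the plain second-token check, the sketch also skips an optional SINGLE before looking for the star.

```abap
FORM find_select_star USING ir_scan TYPE REF TO cl_ci_scan.
  DATA lv_index TYPE i.

  LOOP AT ir_scan->statements INTO DATA(ls_statement).
    " The first token identifies the command (keywords are in upper case)
    READ TABLE ir_scan->tokens INTO DATA(ls_token) INDEX ls_statement-from.
    IF sy-subrc <> 0 OR ls_token-str <> 'SELECT'.
      CONTINUE.
    ENDIF.

    " The token right after SELECT, or after SELECT SINGLE, is checked for '*'
    lv_index = ls_statement-from + 1.
    READ TABLE ir_scan->tokens INTO ls_token INDEX lv_index.
    IF sy-subrc = 0 AND ls_token-str = 'SINGLE'.
      lv_index = lv_index + 1.
      READ TABLE ir_scan->tokens INTO ls_token INDEX lv_index.
    ENDIF.

    " Only report tokens that still belong to the current statement
    IF sy-subrc = 0 AND lv_index <= ls_statement-to AND ls_token-str = '*'.
      WRITE: / 'SELECT * found in source row', ls_token-row.
    ENDIF.
  ENDLOOP.
ENDFORM.
```

It would be called with the scan object from the tokenization step, for example PERFORM find_select_star USING gr_scan. Newer Open SQL forms that place the field list after FROM are not covered by this sketch.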
Basically, it’s that simple to scan ABAP code, but the really hard part starts when you begin coding an analyzer to detect different phenomena. The example above only scans one type of command (SELECT), and it isn’t related to any other command in the code.
Let’s imagine a scanner that detects SELECT commands inside LOOPs,
and don’t forget that SELECTs can be hidden behind subroutine calls such as a PERFORM, a method call, or a macro call.
Detecting such complicated cases requires more detailed classes and logic and, of course, test cases.
I’ll try to write about modeling a complex class like that in a future blog.
Thanks for reading!
Novaline Information Tech.
(This is a cross-posted blog. For the original source, please click here.)