Many Lines Spoil The Function
Current topic in my working life is refactoring. I try to find methods and patterns on how to design programs better just from the start. And I try to find ways to solve general refactoring issues.
I was searching for sample code that was not mine. But where to start?
The idea was that bad code will be long. Some people say that good functions only have some few lines of code but we all know that there are exceptions from this. If you have to fill a BAPI structure you will necessarily have a lot of code. However if there are hundreds of lines in a method or a function or a sub routine you can be sure that this code badly needs some refactoring…
To find these classes and reports I wrote a litte report.
Identify forms and methods with a large number of commands
And even within this little tool I faced a lot of situations where I thought: “do it the easy way or do it right…?”
I optimized the report in a way that it should be understandable. Although I see a lot of improvements.
- I use two variants for getting the components. One for the class and one for reports. This could be optimized by using the strategy pattern
- the output is quick and dirty. I just display the include name, the name of the function, the function type (METHOD, FORM, MODULE) and the numbers of statements.It would be nice to have the report/ class name and then the FORM or METHOD
- The program could be adapted to clearly describe
- classes
- reports
- function groups
- The analysis could be optimized
- by distinguishing between class methods and test methods
- by showing local classes in a report or class
- it is not possible to jump directly to the mehtod/ form/ module by double clicking a line
- …
What does it do?
the report selects PROG and CLAS from TADIR. Object name and package must be given to prevent long run.
The report then analyzes the selected includes (+ embedded includes) and displays them in CL_SALV_TABLE.
I added a sort on the column “number of commands” to have the ones with largest number of lines on top. I also added a filter to only show the functions above selected threshold.
It’s late.
Here is the report:
https://github.com/tricktresor/blog
REPORT.
CLASS lcl_rinfo DEFINITION.
PUBLIC SECTION.
METHODS analyze_prog IMPORTING object TYPE clike.
METHODS analyze_clas IMPORTING object TYPE clike.
METHODS display IMPORTING max type i..
METHODS constructor.
PROTECTED SECTION.
TYPES: BEGIN OF ts_data,
repid TYPE syrepid,
type TYPE c LENGTH 10,
name TYPE c LENGTH 80,
len TYPE i,
END OF ts_data,
tt_data TYPE STANDARD TABLE OF ts_data WITH EMPTY KEY.
METHODS get_prog_includes IMPORTING reportname TYPE clike RETURNING VALUE(includes) TYPE programt.
METHODS get_clas_includes IMPORTING classname TYPE clike RETURNING VALUE(includes) TYPE seoincl_t.
METHODS analyze IMPORTING object TYPE clike.
DATA keywords TYPE STANDARD TABLE OF char30.
DATA mt_data TYPE tt_data.
ENDCLASS.
DATA h_name TYPE trdir-name.
DATA h_devc TYPE devclass.
SELECT-OPTIONS s_name FOR h_name OBLIGATORY.
SELECT-OPTIONS s_devc FOR h_devc OBLIGATORY.
parameters p_max type i DEFAULT 50.
START-OF-SELECTION.
DATA(rinfo) = NEW lcl_rinfo( ).
SELECT object AS obj_type, obj_name
FROM tadir
INTO TABLE @data(objects)
WHERE obj_name IN @s_name
AND devclass IN @s_devc
AND pgmid = 'R3TR'
AND object IN ( 'PROG','CLAS' ).
LOOP AT objects INTO DATA(object).
CASE object-obj_type.
WHEN 'PROG'.
rinfo->analyze_prog( object-obj_name ).
WHEN 'CLAS'.
rinfo->analyze_clas( object-obj_name ).
ENDCASE.
ENDLOOP.
rinfo->display( p_max ).
CLASS lcl_rinfo IMPLEMENTATION.
METHOD constructor.
keywords = VALUE #(
( 'FORM' ) ( 'ENDFORM' )
( 'METHOD' ) ( 'ENDMETHOD' )
( 'MODULE' ) ( 'ENDMODULE' )
).
ENDMETHOD.
METHOD get_prog_includes.
DATA include_names TYPE programt.
DATA include_name TYPE program.
CALL FUNCTION 'RS_GET_ALL_INCLUDES'
EXPORTING
program = CONV syrepid( reportname )
TABLES
includetab = include_names
EXCEPTIONS
OTHERS = 3.
IF sy-subrc = 0.
LOOP AT include_names INTO include_name.
analyze( include_name ).
ENDLOOP.
ENDIF.
ENDMETHOD.
METHOD get_clas_includes.
includes = cl_oo_classname_service=>get_all_class_includes( CONV #( classname ) ).
ENDMETHOD.
METHOD analyze_prog.
rinfo->analyze( object ).
LOOP AT get_prog_includes( object ) INTO DATA(prog_include).
analyze( prog_include ).
ENDLOOP.
ENDMETHOD.
METHOD analyze_clas.
rinfo->analyze( object ).
LOOP AT get_clas_includes( object ) INTO DATA(clas_include).
analyze( clas_include ).
ENDLOOP.
ENDMETHOD.
METHOD analyze.
DATA source TYPE STANDARD TABLE OF string.
DATA tokens TYPE STANDARD TABLE OF stokes.
DATA statements TYPE STANDARD TABLE OF sstmnt.
DATA message TYPE string.
READ REPORT object INTO source.
DATA ls_data TYPE ts_data.
SCAN ABAP-SOURCE source
TOKENS INTO tokens
MESSAGE INTO message
KEYWORDS FROM keywords
STATEMENTS INTO statements.
*break-point.
LOOP AT statements INTO DATA(statement).
DATA(command) = tokens[ statement-from ]-str.
CASE command.
WHEN 'FORM'
OR 'METHOD'
OR 'MODULE'.
CLEAR ls_data.
ls_data-repid = object.
ls_data-type = command.
ls_data-name = tokens[ statement-from + 1 ]-str.
ls_data-len = statement-number.
WHEN OTHERS.
ls_data-len = statement-number - ls_data-len.
APPEND ls_data TO mt_data.
ENDCASE.
ENDLOOP.
ENDMETHOD.
METHOD display.
TRY.
cl_salv_table=>factory(
IMPORTING
r_salv_table = DATA(o_table)
CHANGING
t_table = mt_data
).
o_table->get_functions( )->set_all( ).
o_table->get_sorts( )->add_sort(
columnname = 'LEN'
position = 1
sequence = if_salv_c_sort=>sort_down ).
o_table->get_filters( )->add_filter( columnname = 'LEN'
sign = 'I'
option = 'GE'
low = conv #( max ) ).
o_table->display( ).
CATCH cx_salv_error.
ENDTRY.
ENDMETHOD.
ENDCLASS.
Hi Enno, nice Blog and thanks for the tool. I think it is quite useful.
In general I agree that long function or methods are an indicator of bad code. But I think Abap is some kind of special compared to other languages like C', Java, Scala, erc..., where Objects are inlcuded there more "naturally" into the language itself.
Within the official programming guidelines there is a very good rule of thumb
Modularize rather than atomize
Best regards, Tapio
Thanks for your comment, Tapio!
I agree that ABAP is different here. Thanks for the link! I just read a blog where the author recommended that a unit should have at least FOUR lines of code. That's definitively not constructive in SAP environment.
When testing the report I found units with 2000 lines of code and more.
That could be a slight indicator for having not even modularized code... 😉
SAP recommends to not have more than 150 commands per unit:
https://help.sap.com/doc/abapdocu_752_index_htm/7.52/en-US/abenproc_volume_guidl.htm
Hi Enno,
you may also use the ABAP Test Cockpit (ATC) / Code Inspector to check the lines of code within a module:
Thanks Uwe, Didn't know that...
Hello Enno,
nice idea!
I would propose/try to create a histogram from the data (x -> number of command y-> number of routines ).
Note: get_proc_includes( ) does not return anything, the loop is processed in the method.
best regards,
JNN
Thanks for your suggestion! I already said, there is a lot of potential... 😉
The diagram is only useful if the program/ Class will be clearly extracted so you can see the units of an object. It's not that easy because a report can have several local classes and classes can also have local classes plus test classes.
You're right: In case that there are includes in an include, the method get_prog_includes will not get the complete report!
I should use analyze_prog again:
Hm, I think the observable behavior of the program is already correct, as you are doing the LOOP inside the analyze_prog( ) routine already. I think it should just be
I would use the histogram not so much to detect deviation as to recognize the frequency distribution.
JNN
Hi Enno,
You can also get all these stats from the custom code analyser (tx /SDF/CD_CCA), in particular the Code Metrics tool. I also like to use the coverage analyser, because it works at class/report level and gives you the ability to drill down into different components and provides an execution profile.
Personally I find the branch count more useful than lines of code, as it is an indicator of the overall complexity of a component. As you already point out, pure size is not a reliable indicator. An If / else block counts as two branches regardless of the amount of statements each contain, to me this is much more relevant.
Thanks for your input Mike!
I will check that out.
Branch count is an interesting indicator! I think you're right that this is more important than length.
I like to do this code analyzing by myself from time to time. That's one reason why i wrote this program.