How to avoid wrong splitting of .CSV file into internal table using MATCH OFFSET, FIND
While working on the .CSV (comma separated file) on application server.
As the file is a comma separated file, we split the file at every comma in to our internal table.
The file can contain numbers as well as some text/description field e.g.
But sometimes text or description contain ‘,’ in it as part of description and it could lead wrong splitting of file E.g.
In such cases we can use MATCH OFFSET, FIND to replace the comma from description field with some other character so that it will not lead to wrong splitting of file.
Consider our .CSV file on application server contains below data and we have to split the file into internal table at every comma.
TYPES : BEGIN OF ly_file,
line TYPE string,
END OF ly_file.
DATA : lv_file TYPE text200, “Will contain file name
lw_file TYPE ly_file, “ work area to get records from file
lv_line TYPE string, “string which will use to get the record
lt_result_tab TYPE match_result_tab, ”contain occurrences of “
lv_off1 TYPE i, “ will store offset
lv_index TYPE sy-index,
lv_len TYPE i.
FIELD-SYMBOLS <match> LIKE LINE OF lt_result_tab.
FIELD-SYMBOLS <match1> LIKE LINE OF lt_result_tab.
*Get the file location and name in lv_file.
Lv_file = location of file to read
*Read the file from Application serve
OPEN DATASET lv_file FOR INPUT IN TEXT MODE ENCODING DEFAULT.
IF sy-subrc = 0. DO.
READ DATASET lv_file INTO lw_file-line.
IF sy-subrc = 0.
* Take the file record in a string lv_line.
lv_line = lw_file-line.
* Find the text/description fields i.e search for double quotes which contains text/description field- ”abc,defg”.
FIND ALL OCCURRENCES OF ‘“’ IN lv_line RESULTS lt_result_tab.
IF lt_result_tab IS NOT INITIAL.
* Result tab gives offset or location of every “ available in the
*string .In our case 000000,00000,00000,”abc,defg”,”hij,klm”,”kljudyhs”,….
*The first “ is at index 17 and the second is at *index 26, which forms *the first text field. The third “ starts at *index 27 and fourth at *index 35, which forms the second text/description field.
*Now we have to find the first and second offset of “ to get the first *text field. Then third and fourth for second text filed and so on
LOOP AT lt_result_tab ASSIGNING <match>.
CLEAR: lv_off1, lv_len.
* This check will avoid reading the same offset again.
IF lv_index NE sy-tabix.
* Get the first offset of “
lv_off1 = <match>-offset.
* Increase index to read the next offset
lv_index = sy-tabix + 1. * Read next offset using lv_index
READ TABLE lt_result_tab ASSIGNING <match1> INDEX lv_index. IF sy-subrc = 0.* Get the second offset and then calculate the length of the text using * both the offsets. The second offset will be always greater value so *by subtracting the first offset from second offset, we will get the *length of text.
lv_len = <match1>-offset – lv_off1.
ENDIF.* Replace occurrences comma ‘,’ with dot ‘.’ in the string from the *first offset to the length of string
REPLACE ALL OCCURRENCES OF ‘,’
IN SECTION OFFSET lv_off1 LENGTH lv_len OF lv_line WITH ‘.’.
* Now split the lv_line into your internal table at every comma
SPLIT lv_line AT ‘,’
INTO your work area