Understanding CSV files and their handling in ABAP
In many ABAP developments, we use CSV files and sometimes there is confusion regarding CSV itself. Is it just a text file with values separated by commas ?
Let’s look at semantics of the various components involved so we have a vocabulary to work with.
Separator: Demarcates between two fields – so it will be a comma ‘,’ for CSVs.
Delimiter: It signifies limits of things, where it begins and ends. e.g. “Test String” has two delimiters, both double quote characters. Many CSVs can have double quotes as delimiters when comma values are to be placed as text.
Terminator : Indicates end of sequence. For CSV, we can think of newline as the terminator.
So if we have confusion about whether commas or double quotes are allowed inside data, looking at the CSV specification:
The de facto standard for CSV is here in case you want to read the full standard.
Definition of the CSV Format ( points 1 – 7 are from CSV standard )
1. Each record is located on a separate line, delimited by a line break.
2. The last record in the file may or may not have an ending line break.
3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines.
4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the
record must not be followed by a comma.
5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with
another double quote.
In my experience, point 7 is where we get tripped the most. CSV stands as comma separated values leading to the impression that commas are the separator and given that excel doesn’t put commas, it can start to get confusing.
So looking at some examples
10, Vikas , Sydney
Data with separator / delimiter inside them.
“11”, “Vikas”, “Sydney, AU” <– Data containing comma
“12”, “Vikas”, “Sydney, “NSW” AU” <– Data containing comma and quotes in data
Handling in ABAP:
I’m focusing on reading the files as that’s where we face issues. The file can be uploaded from user’s desktop or read from the application server.
1) Write your own code:
This can be easiest to start with but can start to get complicated with time.
Get data as a string, split at comma.
split lv_data at ‘,’ into lw_struct-test1 lw_struct-test2 lw_struct-test3.
a) This won’t work if we have data with separator, terminator or delimiter ( so no commas, double-quotes or newline within data ).
b) The code will need to be updated if the file format changes – say we need to add another field test4. The code then changes to :
split lv_data at ‘,’ into lw_struct-test1 lw_struct-test2 lw_struct-test3 lw_struct-test4.
2) Read the file using KCD_CSV_FILE_TO_INTERN_CONVERT
CALL FUNCTION ‘KCD_CSV_FILE_TO_INTERN_CONVERT’
i_filename = ‘C:\Temp\Upload.csv’
i_separator = ‘,’
e_intern = gt_intern
upload_csv = 1
upload_filetype = 2.
a) The file can be read only from presentation server/ desktop.
b) If a CSV file exists with double quotes, the last field is left with double quotes.
c) In case the file is to be read from application server, we need to read the code inside this FM and write some custom logic.
3) Use RTTI and dynamic programming along with FM RSDS_CONVERT_CSV .
It works but has lots of code . You can have a look at the code in this GIST.
In summary the steps are :
– Get structre of destination table using RTTI
– Create field catalog
– Create a dynamic table for field catalog values
– Create dynamic table lines
– Process Raw CSV data
– Store CSV files into dynamic table
a) Relatively long code leading to results especially if you have to program it from scratch.
a) Code is free from the target table format. If a new field is to be added, just update the structure for table type z_data_tty
4) Use class CL_RSDA_CSV_CONVERTER .
So the call becomes very straight forward – Instantiate the class with the separator and delimiter values. For a normal CSV, leave them as default .
* Instantiate the CSV object
call method cl_rsda_csv_converter=>create
* i_delimiter = C_DEFAULT_DELIMITER
* i_separator = C_DEFAULT_SEPARATOR
r_r_conv = lo_csv
* Process records
loop at lt_upload_data into lv_data.
CALL METHOD lo_csv->csv_to_structure
i_data = lv_data
e_s_data = lw_struct.
That’s It !
a) The code is very small – less chances of us making an error compared to the one in 3) above.
b) The code is decoupled with file structure – we get to keep the benefit from the above point .
c) It can be used for both application server / presentation server files – of course file reading will need to be done before the call.
d) The developer has documented the examples exhaustively in method CSV_TO_STRUCTURE . Big thank to him/her !
e) It’s part of package RSDA which is present in ABAP trial environments as well such as NSP .
If you feel lazy to type the full program, here is the whole source code:
Thanks for sharing.
For converting CSV to internal table I often use TEXT_CONVERT_CSV_TO_SAP which is also pretty straight forward (raw data in, structured data out).
Jan - I tried using TEXT_CONVERT_CSV_TO_SAP but it doesn't seem to work .
Look at this GIST:
For me it works. Maybe Your CSV file has a header line? In that case you need to set i_line_header = 'X'.
Nice blog detailing the comparison between different approaches to reading CSV files in ABAP.
I just completed a requirement using option 3, but will definitely try out option 4 next time I have a similar requirement.
However, my approach to option 3 was not as lengthy. There was no need to create field catalogs and dynamic table. I just created a data reference LIKE LINE of the output table. Below are snippets of my code.
What happens if your output table has a date field or an amount field? Are the values converted automatically?
If I remember correctly, it did handle date and amount fields correctly. Unfortunately, I've moved to a different project and no longer have access to that system to verify.
I do think it is handled automatically by the ABAP engine in line 54 of my screenshot above.
ABAP automatically performs type conversion when the assigning values from one field to another. So, line_content (which is CLIKE) will go through conversion to whatever the current type for <ddic_column> is, i.e. date type, amount type.
Do give it a try and let me know how it goes.
1) Parts of this blog as copied from the linked standard for CSV. You must make it clear which parts are being quoted. (Or I'll have to reject the blog, and I actually don't want to do that).
2) While CSV has a standard definition, this is often not followed. For example, from Excel, as CSV file may well be semi-colon (;) separated. You should edit your blog to reflect this. Also, whether it affects any of the solutions you've given.
3) End of line markers may be Line Feed, but may also be Carriage Return, Line Feed. It depends on the source system and the program used to generate the file.
Disadvange (error) of method 4 - if needs structure contain field 'string' typed and CSV containts double quoted text - exception ocurs. 🙁
Parsing CSV - this is not a problem just for ABAP language. Hence, rather than just thinking of an ABAP solution, think out of the box. Have a look at this post: http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html
This is the easiest method I have found till now. Just use SPLIT command, with the REGEX from this blog. I have used it several successfully.
it really help with a complete information
Excellent briefing, Vikas, and thank you for sharing!