
Dear Colleagues,


Background:

This is probably not worth a blog post, but I thought I should share it anyway with the community that has given me so much. On my project there was a recent requirement to send out an email report with a CSV attachment. The data in the CSV file contained special characters such as the Spanish Ñ and Ó. When you opened the CSV in MS Excel, these characters looked weird, but when you opened it in a text editor everything was fine. It is not a huge issue, but for the sake of customer satisfaction I decided to give this challenge a go.

Solution:

The solution turned out to be easier than I expected. Since I was using the mail package in my scenario, I started by changing the content type of the email and of the attachments. Because I was using a Java mapping, I also tried declaring every String in the program as ISO 8859-1 encoded 😆. Each combination produced a funnier email and a weirder file.

Finally, a bit of research showed that nothing had been wrong with the file or the email in the first place. This is simply how MS Excel behaves: a CSV file has to start with a Byte Order Mark (BOM) before Excel treats it as UTF-8 encoded.
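To show why the characters looked weird, here is a minimal sketch that encodes "Ñ" as UTF-8 and then reads the bytes back with a single-byte charset. The class name MojibakeDemo is made up for illustration. ISO-8859-1 only stands in for whatever legacy code page Excel falls back to when there is no BOM. On a given Windows machine that may be a different code page, so the exact garbage shown can differ.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    /**
     * Encodes text as UTF-8 (what the mapping actually produced) and then decodes
     * those bytes as ISO-8859-1, standing in for a reader that does not know the
     * bytes are UTF-8.
     */
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00D1"; // "Ñ"
        String shown = misread(original);
        // "Ñ" takes two bytes in UTF-8 (0xC3 0x91), so the misread text has two characters
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length); // 2
        System.out.println(shown.length()); // 2
        System.out.println((int) shown.charAt(0)); // 195, i.e. U+00C3 'Ã'
    }
}
```

Plain ASCII survives this round trip unchanged, which explains why only the accented characters looked broken while the rest of the file was fine.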

So, here comes the solution. Just add a BOM at the beginning of the file 🙂

The code snippet below adds the BOM. It would help equally in any other case where you need to insert a BOM explicitly.


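// UTF-8 BOM bytes; decoded, they become the single character U+FEFF
byte[] bom = new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
// outputCSV must later be written out as UTF-8 for the BOM to come out as EF BB BF
String outputCSV = new String(bom, java.nio.charset.StandardCharsets.UTF_8) + yourActualCSV;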
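If the CSV is produced as bytes rather than as a String, the BOM can also be written directly. This applies, for example, inside a Java mapping that writes to an output stream. The sketch below assumes a plain java.io.OutputStream. The names CsvWithBom, writeCsv and hasUtf8Bom are hypothetical helpers, not part of any SAP API. The sketch also checks that the String route from the snippet above produces the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CsvWithBom {
    // UTF-8 encoding of U+FEFF, the byte order mark Excel looks for
    static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    /** Writes the BOM followed by the CSV text encoded as UTF-8. */
    static void writeCsv(OutputStream out, String csv) throws IOException {
        out.write(UTF8_BOM);
        out.write(csv.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    /** Returns true if the data starts with the three UTF-8 BOM bytes. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    public static void main(String[] args) throws IOException {
        String csv = "Name,City\nMu\u00F1oz,C\u00F3rdoba\n";

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeCsv(buffer, csv);
        byte[] written = buffer.toByteArray();
        System.out.println(hasUtf8Bom(written)); // true

        // String route: prepend U+FEFF, then encode the whole String as UTF-8
        byte[] viaString = ("\uFEFF" + csv).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(written, viaString)); // true
    }
}
```

Both routes give identical bytes. Writing the three bytes directly just avoids the round trip through U+FEFF and the risk of the final String being encoded with something other than UTF-8.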

Hope this helps someone in the community.


Comments

  1. Antonio Sanz

    Thanks for your blog.

    I just want to add that the mail adapter uses SOAP. According to the W3C specification, SOAP XML encoding only allows UTF-8 or UTF-16.

    There is no way to change this, so the character encoding inside it must be UTF-8 or UTF-16. Otherwise it will not work correctly.

    Regards.

