Loading special characters from a flat file to target table
Scenario:
To load special characters, such as non-English characters, from a flat file (.txt) to a target table whose code page is set to UTF-8.
In order to load special characters from a flat file to the target table, we need to set the appropriate code page (encoding) in three places:
a) In the source file
We need to save the source file with UTF-8 encoding. The exact steps differ based on the text editor we use; in Notepad:
File –> Save As, then change the Encoding drop-down to UTF-8.
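Outside of Notepad, the same re-encoding can be scripted. Below is a minimal sketch in Python; the file names and the assumed source encoding (cp1252) are examples, not part of the original post — substitute your own.

```python
# Sketch: re-encode a flat file as UTF-8 before loading it.
# SOURCE_ENCODING and the file names are assumptions for this example.
SOURCE_ENCODING = "cp1252"

def convert_to_utf8(src_path, dest_path, src_encoding=SOURCE_ENCODING):
    """Read a text file in its original encoding and rewrite it as UTF-8."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dest_path, "w", encoding="utf-8") as dest:
        dest.write(text)

# Demo: create a cp1252-encoded file, then convert it.
with open("source.txt", "w", encoding="cp1252") as f:
    f.write("Müller;José;Âgé\n")

convert_to_utf8("source.txt", "source_utf8.txt")

with open("source_utf8.txt", encoding="utf-8") as f:
    print(f.read().strip())  # Müller;José;Âgé
```

After conversion, the file can be pointed to by the flat file format with its code page set to utf-8.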
b) Flat file properties
In BODS, the code page in the flat file format properties is normally set to <default>.
Change it to utf-8.
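To see why this setting matters, here is a small sketch (using cp1252 as a stand-in for a single-byte default code page): if the engine decodes UTF-8 bytes with the wrong code page, multi-byte characters arrive garbled.

```python
# Sketch: why the flat-file format's code page must match the file.
# cp1252 stands in for a single-byte <default> code page.
data = "José\n".encode("utf-8")   # the bytes actually sitting in the file

correct = data.decode("utf-8")    # code page set to utf-8
wrong = data.decode("cp1252")     # code page left at a single-byte default

print(correct.strip())  # José
print(wrong.strip())    # JosÃ©  <- classic mojibake
```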
c) Datastore properties
Go to the target datastore properties in BODS. The code page there is also normally set to <default>.
Change both Code page and Server code page to utf-8.
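Once every layer agrees on UTF-8, the special characters survive end to end. The following sketch mimics the flow with Python's built-in sqlite3 (a UTF-8-native database standing in for the real target); the table and column names are invented for the example.

```python
# Sketch of the end-to-end flow once source file, format, and target
# all agree on UTF-8. sqlite3 stands in for the real target database.
import sqlite3

rows = [("Müller",), ("José",), ("日本語",)]  # values from the flat file

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", rows)

loaded = [r[0] for r in conn.execute("SELECT name FROM customers")]
print(loaded)  # ['Müller', 'José', '日本語'] -- nothing lost
```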
Thank you so much for posting this! You saved my day! 🙂
What if the underlying target (and/or source) code page for the datastore is not utf-8?
It could be set to, for instance, iso-8859-1. The Data Services documentation specifies that the datastore code page should be set to match the underlying database / adapter / web service code page. If they're different, there is the potential to lose characters that exist in the superset (utf-8) but are not encoded in the subset (iso-8859-1), even with transcoding. Not good!
In an ideal world every data source and target would be Unicode, but in reality that is just not always the case!
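The subset-versus-superset loss described above is easy to reproduce. A quick sketch: the character Š exists in UTF-8 but has no code point in iso-8859-1, so any transcoding into that code page has to drop or substitute it.

```python
# Sketch: a character that UTF-8 can represent but iso-8859-1 cannot.
ch = "Š"  # U+0160, outside the iso-8859-1 range

utf8_bytes = ch.encode("utf-8")                     # b'\xc5\xa0' -- fine
lossy = ch.encode("iso-8859-1", errors="replace")   # b'?' -- character lost

print(utf8_bytes)  # b'\xc5\xa0'
print(lossy)       # b'?'
```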
Simon,
It's not a problem. If the target database supports UTF-8, this will work fine, and most databases do. The ODBC settings also need to be changed to the desired code page; otherwise there could be data loss while inserting into the target.
Useful Tips
Just to add that if the target table is on SQL Server, there is one more setting to be considered:
change the varchar columns to the nvarchar data type.
This will ensure that all other special / multi-byte / non-English characters are handled as well.
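A short sketch of why nvarchar matters: a classic varchar column stores bytes in the collation's single-byte code page (cp1252 is used here as a stand-in), which simply has no representation for, say, Japanese characters, while nvarchar stores UTF-16 and can hold any of them.

```python
# Sketch: varchar (single-byte code page) vs nvarchar (UTF-16) on SQL Server.
# cp1252 stands in for a typical Western varchar collation.
name = "渡辺"  # a Japanese surname

try:
    name.encode("cp1252")  # what a varchar column would have to store
except UnicodeEncodeError:
    print("varchar (cp1252) cannot store this value")

utf16 = name.encode("utf-16-le")  # what nvarchar stores
print(len(utf16))  # 4 -- two bytes per character, no loss
```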
Comprehensive guide / example for handling multiple code pages:
http://wiki.sdn.sap.com/wiki/display/EIM/Data+Integrator+example
Useful Info
Nice work
That was a really useful hint:)...Thanks!!
You are right that the flat file should be saved as utf-8 and the format settings changed to utf-8, but as far as databases are concerned, you don't need to change anything, as they are mostly Unicode.
Thanks for the post.
Useful info. Thank you very much...
regards
Mohan
It is very useful thanks.
Useful document. Thanks for sharing...
Good information, thank you. But we ran into an issue while handling Japanese characters. The source is Oracle (UTF8), and we set all the code page settings as you described above. Our code had been processing Japanese characters fine for two years, but then we got a record containing Japanese characters with a length of 15000. The job hangs, or sometimes gives an 'End of communication channel' error. If we trim the data to 14000, it works fine. We tried different NLS_LANG settings and code pages, but it still isn't working. Any suggestions, please?
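One possible contributing factor worth checking (an assumption, not a confirmed diagnosis for the case above): length limits are often expressed in bytes rather than characters, and Japanese characters take three bytes each in UTF-8, so 15000 characters is 45000 bytes. A quick sketch of the arithmetic:

```python
# Sketch: character count vs UTF-8 byte count for Japanese text.
# A limit defined in bytes can be exceeded long before the character
# count looks suspicious.
text = "日" * 15000

chars = len(text)                    # 15000 characters
octets = len(text.encode("utf-8"))   # 45000 bytes

print(chars, octets)  # 15000 45000
```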
Useful info!
Great post!!!! Saved my day 🙂