
Loading special characters from a flat file to a target table

Scenario:

Load special characters, such as non-English characters, from a flat file (.txt) into a target table whose code page is set to UTF-8.

To load special characters from a flat file into the target table, we need to set the appropriate code page or encoding in three places:

a) The source file

We need to set the encoding of the source file to UTF-8. The exact steps differ depending on the text editor; below are the steps for Notepad.

File > Save As, then change the Encoding drop-down to UTF-8.

(Screenshot: Notepad Save As dialog with Encoding set to UTF-8)
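If there are many source files, the same conversion can be scripted instead of done by hand in Notepad. A minimal Python sketch, assuming the files are currently in Windows code page 1252 (the file names and source encoding here are placeholders; adjust them to your case):

```python
# Re-save a flat file as UTF-8.
# Assumption: the input is cp1252; change source_encoding to match your files.
source_encoding = "cp1252"

with open("source.txt", "r", encoding=source_encoding) as src:
    text = src.read()

with open("source_utf8.txt", "w", encoding="utf-8") as dst:
    dst.write(text)  # now UTF-8, ready for the BODS flat file reader
```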

b) Flat file properties

In BODS, the code page in the flat file properties is normally set to <default>, as shown below:

(Screenshot: flat file properties with Code page set to <default>)

     Change it to utf-8.

(Screenshot: flat file properties with Code page changed to utf-8)
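The code page declared here must match what the file actually contains, otherwise every non-ASCII character is silently misread. A small Python illustration of the failure mode (the sample string is made up):

```python
# What happens when the declared code page does not match the file.
data = "café".encode("utf-8")   # the file really contains UTF-8 bytes

print(data.decode("utf-8"))     # café  -> correct: declared code page matches
print(data.decode("cp1252"))    # cafÃ© -> mojibake: wrong code page declared
```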

c) Datastore properties

Go to the target datastore properties in BODS. Normally the code page is set to <default>, as shown below:

(Screenshot: datastore properties with Code page set to <default>)

    

      Change Code page and Server code page to utf-8.

(Screenshot: datastore properties with Code page and Server code page set to utf-8)
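Once all three settings agree, it is worth verifying that a non-English value survives the whole round trip. A minimal sketch using Python's built-in sqlite3 as a UTF-8-native stand-in for the real target database (the table name and test value are made up):

```python
import sqlite3

# Round-trip check: insert a non-English string and read it back unchanged.
conn = sqlite3.connect(":memory:")  # stand-in for the real target database
conn.execute("CREATE TABLE target (name TEXT)")
conn.execute("INSERT INTO target VALUES (?)", ("Grüße, 東京",))

(value,) = conn.execute("SELECT name FROM target").fetchone()
assert value == "Grüße, 東京"  # the characters survived the load intact
print(value)
```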

16 Comments
Former Member

Thank you so much for posting this! You saved my day! 🙂

Former Member

What if the underlying target (and/or source) code page for the DataStore is not utf-8?

It could be set to, for instance, iso-8859-1. The Data Services documentation specifies that the DataStore code page should be set to match the underlying database / adapter / web service code page. If they're different, there is the potential to lose characters that exist in the superset (utf-8) but are not coded in the subset (iso-8859-1), even with transcoding. Not good!

      In an ideal world every data source and target would be unicode but, in reality, that is just not always the case!
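Simon's point is easy to demonstrate: a character outside the target code page is either rejected or silently replaced during transcoding. A quick Python sketch (€ exists in utf-8 but not in iso-8859-1):

```python
text = "café €5"  # € (U+20AC) has no code point in iso-8859-1

# Strict transcoding fails outright on the unmappable character...
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("lost:", exc)

# ...and lossy transcoding silently degrades it to a placeholder.
print(text.encode("iso-8859-1", errors="replace"))  # b'caf\xe9 ?5'
```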

Former Member

      Simon,

It's not a problem. If the target database supports UTF-8, this will work fine; most modern databases do. The ODBC settings also need to be changed to the desired code page, otherwise there could be data loss while inserting into the target.

Former Member

      Useful Tips

Christian A Gonzaga

Just to add: if the target table is on SQL Server, there is one more setting to consider:

change varchar columns to the nvarchar datatype.

This ensures that multi-byte / non-English special characters are handled as well.
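For illustration, a hedged sketch of that change using Python and pyodbc (the connection string, table, and column names are all hypothetical):

```python
import pyodbc

# Hypothetical connection details; replace with your own.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=mydb;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Widen the column so it can hold multi-byte characters.
cur.execute("ALTER TABLE dbo.customer ALTER COLUMN name NVARCHAR(100)")

# Parameterized inserts send Python str values as Unicode.
cur.execute("INSERT INTO dbo.customer (name) VALUES (?)", "東京")
conn.commit()
```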

Christian A Gonzaga

      Comprehensive guide / example for handling multiple code pages:

      http://wiki.sdn.sap.com/wiki/display/EIM/Data+Integrator+example

Former Member

      Useful Info

Former Member

      Nice work

Former Member

That was a really useful hint 🙂... Thanks!!

Former Member

You are right that the flat file should be saved as utf-8 and that the format settings should be changed to utf-8, but as far as databases are concerned, you usually don't need to change anything, as they are mostly Unicode already.

      Thanks for the post.

mohan salla

      Useful info. Thank you very much...

      regards

      Mohan

Faiz Ahmed

It is very useful, thanks.

TANKA RAVICHANDRA

Useful document. Thanks for sharing...

Former Member

Good information, thank you. But we ran into an issue while handling Japanese characters. The source is Oracle (UTF8), and we set all the code page settings similar to the ones you described above. Our code had been working fine processing Japanese characters for two years, but then we got a record with Japanese characters of length 15000. The job hangs, or sometimes gives 'End of Communication channel'. If we trim the data to 14000, it works fine. We tried different NLS_LANG settings and code pages; still not working. Any suggestions, please?
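One thing worth checking in cases like this (an assumption, not a confirmed diagnosis): common Japanese characters take three bytes each in UTF-8, so any buffer or column limit defined in bytes is reached at roughly a third of the apparent character capacity. A quick check of the arithmetic:

```python
# UTF-8 byte expansion: common Japanese characters are 3 bytes each.
ch = "日"
print(len(ch), len(ch.encode("utf-8")))      # 1 character, 3 bytes

text = ch * 15000
print(len(text), len(text.encode("utf-8")))  # 15000 characters, 45000 bytes
# A limit defined in bytes is therefore hit at about 1/3 of the
# character count one might expect.
```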

Joice Samuel

      Useful info!

Cristian Guerrero

Great post!!!! Saved my day 🙂