How to Identify Non-Unicode Characters in a Text File
We often encounter a scenario where a program dumps with a conversion error while using OPEN DATASET / READ DATASET to read a .txt file on the application server. For example, below is a screenshot of such a dump. If the text file is very large, it is hard to find the rows and columns that contain non-Unicode characters, or even to tell whether the file contains any at all.
Below are the steps to identify non-Unicode characters in a .txt file:
- Open a blank Notepad file.
- Type the text given below into Notepad.
<?xml version="1.0"?> <test> </test>
- Copy the content of the .txt file from the application server, paste it between <test> and </test> in the Notepad file created above, and save the file with a .xml extension. Note that any literal < or & in the pasted text will also be reported by the browser as an XML error, so do not mistake those for non-Unicode characters.
- To identify the non-Unicode characters, drag and drop the file into either Google Chrome or Mozilla Firefox.
- Chrome shows only the row and column number where the non-Unicode character occurs; it does not show the content of that row.
- Mozilla Firefox shows the row and column number along with the content of that row. The content is underlined up to the column where the non-Unicode character occurs. If there are multiple non-Unicode characters in the .txt file, remove the first one that was identified and repeat all the steps explained here to find the next one. This is tedious, but at least it lets us confirm the presence of non-Unicode characters in the text file.
- The Notepad screenshot shows the position given by the row and column number that we got from Mozilla Firefox. Enabling the Status Bar option in Notepad displays the row and column number of the cursor position.
- When we try to open the file with non-Unicode characters in Internet Explorer, it just shows a blank page. So we need either Chrome or Mozilla Firefox to identify the row and column with non-Unicode characters.
- The attached text file and XML file can be used to test this by dragging and dropping them into Chrome or Mozilla Firefox.
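
Because the browser approach reports only one offending character at a time, a small script can list every position in a single pass. The following is a minimal Python sketch and not part of the original procedure. It assumes that "non-Unicode" here means bytes that are not valid UTF-8, and that the file has been downloaded from the application server to the local machine. The function name `find_invalid_utf8` and the command-line usage are hypothetical, chosen only for illustration.

```python
import sys


def find_invalid_utf8(data: bytes):
    """Return (row, column, offending_bytes) for each byte sequence
    in data that cannot be decoded as UTF-8.

    Rows and columns are 1-based. The column counts characters, with
    each earlier invalid sequence on the same row counted as one.
    """
    problems = []
    for row, line in enumerate(data.split(b"\n"), start=1):
        offset = 0
        while offset < len(line):
            try:
                # The rest of the row decodes cleanly: nothing more to report.
                line[offset:].decode("utf-8")
                break
            except UnicodeDecodeError as err:
                # err.start and err.end are relative to the slice we decoded.
                start = offset + err.start
                end = offset + err.end
                prefix = line[:start].decode("utf-8", errors="replace")
                problems.append((row, len(prefix) + 1, line[start:end]))
                offset = end  # continue scanning after the bad bytes
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python find_invalid_utf8.py <path to the .txt file>
    with open(sys.argv[1], "rb") as handle:
        for row, column, raw in find_invalid_utf8(handle.read()):
            print(f"row {row}, column {column}: {raw!r}")
```

The file is opened in binary mode so that Python does not attempt any conversion of its own before the check. If the file is expected to be in a different code page, the `"utf-8"` argument would need to change accordingly, and the reported columns may differ slightly from what Notepad shows, depending on how Notepad interprets the file.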