Let’s build an OCR (optical character recognition) app for Android with Cordova and Tesseract. With this approach, we can add OCR functionality to any SAPUI5 app.



What is Tesseract?

According to its site, Tesseract is probably the most accurate open source OCR engine available. It can read a wide variety of image formats and convert them to text in 60 languages. For a complete explanation, please visit https://code.google.com/p/tesseract-ocr/

What you need:

Build Tess-two

  • Create a folder “programs” under c:\
  • Download Tess-two to c:\programs\tess-two
  • Type the following commands:

    cd c:\programs\tess-two

    ndk-build

    android update project --path c:\programs\tess-two --target android-22

  • Once it is done, check that liblept.so and libtess.so exist under c:\programs\tess-two\libs.
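
The check in the last step can also be scripted. Below is a minimal Java sketch that searches a folder tree for the two libraries. NativeLibCheck and missingLibs are hypothetical names, not part of tess-two. The sketch walks the whole tree on the assumption that ndk-build places the .so files in one subfolder per ABI under libs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper (not part of tess-two): reports which of the expected
// native libraries cannot be found anywhere below a given folder.
final class NativeLibCheck {
    static final List<String> EXPECTED = Arrays.asList("liblept.so", "libtess.so");

    // Returns the expected file names that were NOT found under root.
    static Set<String> missingLibs(Path root) throws IOException {
        Set<String> missing = new TreeSet<>(EXPECTED);
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .forEach(p -> missing.remove(p.getFileName().toString()));
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // The default path follows the folder layout used in this post.
        Path libs = Paths.get(args.length > 0 ? args[0] : "c:\\programs\\tess-two\\libs");
        Set<String> missing = missingLibs(libs);
        System.out.println(missing.isEmpty() ? "All libraries found" : "Missing: " + missing);
    }
}
```

If the output lists a missing library, the ndk-build step most likely did not finish successfully.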


Create Cordova Project

  • From the C:\ prompt, create a Cordova project:
    cordova create c:\programs\OCR com.enterprisemobility.OCR OCR


  • Execute the following commands:

    cd c:\programs\OCR

    cordova platform add android




  • Download the language file (<lang>.traineddata) and put it in the assets folder under c:\programs\OCR\platforms\android\assets. MainActivity.java opens it as tessdata/<lang>.traineddata, so place it in a tessdata subfolder there.
  • Copy all files from c:\programs\tess-two\libs to C:\programs\OCR\platforms\android\libs

Modify Java Source Files


  • MainActivity.java (file location: c:\programs\OCR\platforms\android\src\com\enterprisemobility\OCR\MainActivity.java)
    Add code that copies the language file from the assets folder to the OCR folder on the storage device if it doesn’t already exist there.
    
    // Needs android.content.res.AssetManager, android.util.Log and java.io.*
    if (!(new File(DATA_PATH + "tessdata/" + lang + ".traineddata")).exists()) {
        try {
            AssetManager assetManager = getAssets();
            InputStream in = assetManager.open("tessdata/" + lang + ".traineddata");
            OutputStream out = new FileOutputStream(DATA_PATH
                    + "tessdata/" + lang + ".traineddata");
            // Copy the file in 1 KB chunks until the input is exhausted
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
            in.close();
            out.close();
            Log.v(TAG, "Copied " + lang + " traineddata");
        } catch (IOException e) {
            Log.e(TAG, "Unable to copy " + lang + " traineddata " + e.toString());
        }
    }
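
The snippet above assumes that the tessdata folder under DATA_PATH already exists, and it leaves the streams open if the copy fails halfway. As a minimal plain-Java sketch of the same copy step, the helper below creates the target folder first and uses try-with-resources. AssetCopy and copyToFile are hypothetical names; in the app, the input stream would come from assetManager.open(...).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: writes the content of a stream to a target file,
// creating the parent folder (for example tessdata) if it is missing.
final class AssetCopy {
    static void copyToFile(InputStream in, File target) throws IOException {
        File dir = target.getParentFile();
        if (dir != null && !dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create folder " + dir);
        }
        // try-with-resources closes the output stream even if the copy fails;
        // the caller remains responsible for closing the input stream.
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
    }
}
```

The existence check from the original snippet would stay in the caller, so the asset is only opened when the file is actually missing.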
    
    
    
    
    
    
    
    
    
    
    
    


  • TesseractPlugin.java (file location: c:\programs\OCR\platforms\android\src\com\tesseract\phonegap\TesseractPlugin.java)
    Add onPhotoTaken(), which converts the captured image to a bitmap and performs the OCR:
    
    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inSampleSize = 4;
    Bitmap bitmap = BitmapFactory.decodeFile(_path, options);
    ExifInterface exif = new ExifInterface(_path);
    int exifOrientation = exif.getAttributeInt(
            ExifInterface.TAG_ORIENTATION,
            ExifInterface.ORIENTATION_NORMAL);
    Log.v(TAG, "Orient: " + exifOrientation);
    int rotate = 0;
    switch (exifOrientation) {
        case ExifInterface.ORIENTATION_ROTATE_90:
            rotate = 90;
            break;
        case ExifInterface.ORIENTATION_ROTATE_180:
            rotate = 180;
            break;
        case ExifInterface.ORIENTATION_ROTATE_270:
            rotate = 270;
            break;
    }
    Log.v(TAG, "Rotation: " + rotate);
    if (rotate != 0) {
        // Getting width & height of the given image.
        int w = bitmap.getWidth();
        int h = bitmap.getHeight();
        // Setting pre rotate
        Matrix mtx = new Matrix();
        mtx.preRotate(rotate);
        // Rotating Bitmap
        bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
    }
    // Convert to ARGB_8888, required by tess
    bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
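
The orientation handling is easier to test when it is pulled out into a small pure function. The sketch below is plain Java and runs without the Android SDK. It writes out the numeric EXIF orientation values as local constants, on the assumption that they match the ExifInterface.ORIENTATION_ROTATE_* constants used above. ExifRotation and rotationDegrees are hypothetical names.

```java
// Hypothetical helper: maps an EXIF orientation value to the clockwise
// rotation (in degrees) that the snippet above applies to the bitmap.
final class ExifRotation {
    // Numeric EXIF orientation values; assumed to match the Android
    // ExifInterface.ORIENTATION_ROTATE_* constants of the same names.
    static final int ORIENTATION_ROTATE_180 = 3;
    static final int ORIENTATION_ROTATE_90 = 6;
    static final int ORIENTATION_ROTATE_270 = 8;

    static int rotationDegrees(int exifOrientation) {
        switch (exifOrientation) {
            case ORIENTATION_ROTATE_90:
                return 90;
            case ORIENTATION_ROTATE_180:
                return 180;
            case ORIENTATION_ROTATE_270:
                return 270;
            default:
                // Normal, undefined or mirrored orientations: no rotation.
                return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(rotationDegrees(6)); // prints 90
        System.out.println(rotationDegrees(1)); // prints 0
    }
}
```

In onPhotoTaken(), the switch statement could then be replaced by a single call that passes exifOrientation to such a function.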
    
    
    
    
    
    
    
    
    
    
    
    
    


    Once we have the image as a bitmap, we can run the OCR and read the result from recognizedText:

    
    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.setDebug(true);
    // DATA_PATH must contain the tessdata folder with <lang>.traineddata
    baseApi.init(DATA_PATH, lang);
    baseApi.setImage(bitmap);
    String recognizedText = baseApi.getUTF8Text();
    // Release the native resources held by the engine
    baseApi.end();
    
    
    
    
    
    
    
    
    
    
    
    
    

Calling Tesseract from index.html

Call the Tesseract plugin through callNativePlugin, passing the image URI as the parameter.


function callNativePlugin(imageURI) {
  var tesseractPlugin = cordova
    .require('com.tesseract.phonegap.tesseractPlugin.TesseractPlugin');
  tesseractPlugin.createEvent(imageURI, nativePluginResultHandler);
}










On success, the result is passed to nativePluginResultHandler as its callback argument, and we print it in the HTML page.


function nativePluginResultHandler(callback) {
  alert("Result: " + callback);
  var result = document.getElementById("result");
  // textContent keeps recognized characters such as < from being parsed as HTML
  result.textContent = callback;
}










I have attached the modified Java files and index.html.

Build Cordova Project

Run cordova build under C:\programs\OCR\platforms\android. If there are no errors, you will get the debug APK for testing.

Reference:

http://gaut.am/making-an-ocr-android-app-using-tesseract/

           


7 Comments


  1. kiseok yeom

    Hi! Ferry Gunawan

    This post was very helpful.

    But when I build and run it, pressing the ‘Capture Photo’ button does nothing.

    Why is this?

  2. Vijay Vignesh

    Hi,

    I am getting the error below. I have installed all the required plugins, but I still get this.

    07-01 17:22:07.591: E/Web Console(10219): Uncaught Error: Module com.tesseract.phonegap.tesseractPlugin.TesseractPlugin does not exist.:1406

  3. Seby K P

    Hi,

    I have been trying to compile the project in Eclipse, but I always get the error: Uncaught Error: Module com.tesseract.phonegap.tesseractPlugin.TesseractPlugin’.

    On troubleshooting, I can see an invalid character ] error in the tesseractPlugin.js file under www in the plugins folder.

    Can you please help to sort out this issue?

    We are using the latest Cordova.

    Thanks

  4. Errol Green

    Hello Ferry,

    Thank you for making this guide, I only have one question for you and it seems like you ran into the same issue while trying to run your project.

    How did you end up resolving this error:

    JNI GetMethodID called with pending exception 'java.lang.NoSuchFieldError' thrown in void com.googlecode.tesseract.android.TessBaseAPI.nativeClassInit():-2'

    I compiled using android-ndk-r10e.

    Thanks!

