Jose Muñoz Herrera

Extract Document details using Document Information Extraction service with ABAP

Dear community,

In this blog I want to show how to extract information from a document with AI and OCR. The extraction is done by the BTP service Document Information Extraction, and its API is called from an ABAP program.

The objective of this blog is not to show how the API works, because there are good blogs covering that (Getting Started with Document Information Extraction Trial Service, or the Developer Mission). Instead, it shows how you can automate the API calls with nothing but an ABAP program. I say "only ABAP" because other integration scenarios already exist, in CPI (Document Information Extraction Integration with Email Server) or with iRPA. Here we will look at a simpler solution.

Here is the architecture:

The API calls you need to perform to send a file and receive the results are:

  1. Authenticate
  2. Send the file
  3. Get the job status; if the job is still processing the document, wait until it’s done
  4. Get the JSON with the extracted fields
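Step 3 is a polling loop. In the program below it repeats every 3 seconds with no upper limit. The following sketch shows a bounded variant that stops on DONE or FAILED, or after a maximum number of attempts. It relies on a few assumptions:

- The class lcl_fake_status is hypothetical. It stands in for the real status call so the snippet runs on its own: it answers PENDING twice, then DONE.
- The limit of 20 attempts is an arbitrary choice.

```abap
REPORT zdie_polling_sketch.

" Assumption: give up after 20 attempts (about one minute at 3 seconds each)
CONSTANTS lc_max_attempts TYPE i VALUE 20.

DATA: lv_status  TYPE string,
      lv_attempt TYPE i.

" Hypothetical stand-in for ZCL_DIE->GET_STATUS_JOB:
" answers PENDING on the first two calls and DONE afterwards.
CLASS lcl_fake_status DEFINITION FINAL.
  PUBLIC SECTION.
    METHODS get_status RETURNING VALUE(rv_status) TYPE string.
  PRIVATE SECTION.
    DATA mv_calls TYPE i.
ENDCLASS.

CLASS lcl_fake_status IMPLEMENTATION.
  METHOD get_status.
    mv_calls = mv_calls + 1.
    rv_status = COND #( WHEN mv_calls < 3 THEN `PENDING` ELSE `DONE` ).
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA(lo_source) = NEW lcl_fake_status( ).

  DO lc_max_attempts TIMES.
    lv_attempt = sy-index.
    lv_status  = lo_source->get_status( ).
    " Stop as soon as the job has finished, successfully or not
    IF lv_status = `DONE` OR lv_status = `FAILED`.
      EXIT.
    ENDIF.
    WAIT UP TO 3 SECONDS.
  ENDDO.

  IF lv_status <> `DONE` AND lv_status <> `FAILED`.
    WRITE / 'Job still running after the maximum number of attempts'.
  ENDIF.
  WRITE: / 'Final status:', lv_status, 'after', lv_attempt, 'attempt(s)'.
```

In the real program, the status source would be the HTTP call to the jobs endpoint. The loop structure stays the same.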

 

To test this solution, you first have to create a Document Information Extraction service instance; please follow this blog from Joni Liu.

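You need to create an HTTP destination (type G, HTTP connection to external server) in SM59 for the authentication. The program below expects it to be named ZBTP_DOC_INF_EXT_OAUTH2:

Host: <host of the uaa url from the instance service key> (in my trial account: d7d51f5atrial.authentication.us10.hana.ondemand.com)

Port: 443

SSL: Active (install the certificate in transaction STRUST)

User: <clientid from the uaa section of the instance service key>

Pass: <clientsecret from the uaa section of the instance service key>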
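The program takes access_token out of the token response with SPLIT. A structured alternative is to let /ui2/cl_json (already used by the program) deserialize the response into a structure. This sketch assumes:

- The token endpoint returns the usual access_token, token_type and expires_in fields.
- The JSON string is a hypothetical, shortened sample, not a real response.
- The structure ty_token is my own illustration.

```abap
REPORT zdie_token_sketch.

" Hypothetical subset of the token response; other fields are ignored.
TYPES: BEGIN OF ty_token,
         access_token TYPE string,
         token_type   TYPE string,
         expires_in   TYPE i,
       END OF ty_token.

DATA ls_token TYPE ty_token.

START-OF-SELECTION.
  " Hypothetical, shortened token response used for illustration only
  DATA(lv_json) = `{"access_token":"dummy-token-value","token_type":"bearer","expires_in":3600}`.

  " Default settings: JSON names such as access_token are matched to ACCESS_TOKEN
  /ui2/cl_json=>deserialize( EXPORTING json = lv_json
                             CHANGING  data = ls_token ).

  WRITE: / 'Token type:', ls_token-token_type,
         / 'Expires in (seconds):', ls_token-expires_in.
```

In the authenticate method, m_oauth would then simply be ls_token-access_token. expires_in could be kept as well to decide when a new token is needed.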

 

 

And here is the program. It asks for a PDF file and sends it, requesting the header fields documentNumber, purchaseOrderNumber and grossAmount (plus the line-item field netAmount). It then waits for the response, and after getting the JSON it writes the values read by the service.
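In the program, the "options" part of the request is a hand-written JSON string. As a minimal sketch, the same extraction settings can be built from ABAP structures with /ui2/cl_json=>serialize, using camel-case names so that HEADER_FIELDS becomes headerFields. A few caveats apply:

- The type names are my own.
- The enrichment block of the original options is left out for brevity.

```abap
REPORT zdie_options_sketch.

" Hypothetical types mirroring the options JSON used in POST_DOCUMENT
" (the enrichment block is omitted for brevity).
TYPES: BEGIN OF ty_extraction,
         header_fields    TYPE string_table,
         line_item_fields TYPE string_table,
       END OF ty_extraction,
       BEGIN OF ty_options,
         extraction    TYPE ty_extraction,
         client_id     TYPE string,
         document_type TYPE string,
         received_date TYPE string,
       END OF ty_options.

DATA lv_options TYPE string.

START-OF-SELECTION.
  DATA(ls_options) = VALUE ty_options(
    extraction    = VALUE #( header_fields    = VALUE #( ( `documentNumber` )
                                                         ( `purchaseOrderNumber` )
                                                         ( `grossAmount` ) )
                             line_item_fields = VALUE #( ( `netAmount` ) ) )
    client_id     = `default`
    document_type = `invoice`
    received_date = `2020-02-17` ).

  " camel_case turns HEADER_FIELDS into headerFields, CLIENT_ID into clientId, etc.
  lv_options = /ui2/cl_json=>serialize( data        = ls_options
                                        pretty_name = /ui2/cl_json=>pretty_mode-camel_case ).

  WRITE / lv_options.
```

With typed structures, requesting different fields means changing table rows instead of editing a long string literal.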

 

*&---------------------------------------------------------------------*
*& Report ZTEST_DOCUMENT_INFORMATION_EXT
*&---------------------------------------------------------------------*
*& PoC - Sends a file to Document Information Extraction BTP Service
*& Reads the file from the desktop and sends it through the API
*&---------------------------------------------------------------------*
REPORT ztest_document_information_ext.



CLASS zcl_die DEFINITION DEFERRED.

TYPES: BEGIN OF ty_filetab,
         value TYPE x LENGTH 255, " without LENGTH, TYPE x is a single byte per row
       END OF ty_filetab.

DATA lr_die          TYPE REF TO zcl_die.
DATA: lv_file_name    TYPE string,
      lv_rc           TYPE i,
      lt_file         TYPE STANDARD TABLE OF ty_filetab,
      lv_file_content TYPE xstring,
      lt_filetable    TYPE filetable.



PARAMETERS: p_fname TYPE rlgrap-filename.



AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_fname.

  CALL METHOD cl_gui_frontend_services=>file_open_dialog
    EXPORTING
      window_title = 'Choose a file'
      file_filter = 'PDF files (*.pdf)|*.pdf|'
    CHANGING
      file_table   = lt_filetable
      rc           = lv_rc.

  " lt_filetable stays empty if the user cancels the dialog
  IF lt_filetable IS NOT INITIAL.
    p_fname = lt_filetable[ 1 ]-filename.
  ENDIF.


**********************************************************************
* Document Information Extraction class definition
CLASS  zcl_die DEFINITION FINAL.

  PUBLIC SECTION.
    CONSTANTS: c_api_url  TYPE string VALUE 'https://aiservices-trial-dox.cfapps.us10.hana.ondemand.com',
               c_api_path TYPE string VALUE '/document-information-extraction/v1'.

    DATA:
      m_oauth           TYPE string,
      m_content_clients TYPE string.

    METHODS authenticate RETURNING VALUE(rv_authenticated) TYPE abap_bool.
    METHODS post_document IMPORTING iv_file_content           TYPE xstring
                          RETURNING VALUE(rv_job) TYPE string.
    METHODS send_file IMPORTING iv_file_content           TYPE xstring.
    METHODS get_status_job IMPORTING iv_job               TYPE string
                           RETURNING VALUE(rv_status_job) TYPE string.

ENDCLASS.


**********************************************************************
* Document Information Extraction class implementation
CLASS  zcl_die IMPLEMENTATION.

  METHOD authenticate.

    DATA lr_client         TYPE REF TO if_http_client.

    CALL METHOD cl_http_client=>create_by_destination
      EXPORTING
        destination              = 'ZBTP_DOC_INF_EXT_OAUTH2'
      IMPORTING
        client                   = lr_client
      EXCEPTIONS
        argument_not_found       = 1
        destination_not_found    = 2
        destination_no_authority = 3
        plugin_not_active        = 4
        internal_error           = 5
        OTHERS                   = 6.
    IF sy-subrc = 0.

*     If class cl_oauth2_client is available in your system, check note 3041322; alternatively, use the following approach
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_method  value =  'POST' ).
      lr_client->request->set_header_field( name  = 'grant_type'  value =  'client_credentials' ).
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_uri  value =  '/oauth/token?grant_type=client_credentials' ).
      lr_client->send( ).
      lr_client->receive( ).

      lr_client->response->get_status(
        IMPORTING
          code   = DATA(lv_code) ).

      IF lv_code = '200'.

        DATA: rest  TYPE string.

        DATA(l_content) = lr_client->response->get_cdata( ).
        SPLIT l_content AT '"access_token":"' INTO rest l_content.
        SPLIT l_content AT '"' INTO m_oauth rest.

        rv_authenticated = abap_true.

      ELSE.
        rv_authenticated = abap_false.
      ENDIF.

      lr_client->close(  ).

    ENDIF.

  ENDMETHOD.


  METHOD post_document.


    DATA lr_client         TYPE REF TO if_http_client.
    DATA lo_request_part         TYPE REF TO if_http_entity.
    DATA lo_request_part2         TYPE REF TO if_http_entity.
    DATA lv_content_disposition TYPE string.
    DATA len           TYPE i.
    DATA lv_options TYPE string.

    DATA: BEGIN OF ls_create_job_response,
            id            TYPE string,
            status        TYPE string,
            processedtime TYPE string,
          END OF ls_create_job_response.

    CLEAR rv_job.

    CALL METHOD cl_http_client=>create_by_url
      EXPORTING
        url                = c_api_url
      IMPORTING
        client             = lr_client
      EXCEPTIONS
        argument_not_found = 1
        plugin_not_active  = 2
        internal_error     = 3
        OTHERS             = 4.

    IF sy-subrc = 0.


      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_method  value =  if_http_request=>co_request_method_post ).
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_uri  value =  |{ c_api_path }/document/jobs| ).
      lr_client->request->set_header_field( name  =  'Authorization'  value =  |Bearer { m_oauth }| ).
      lr_client->request->set_content_type( if_rest_media_type=>gc_multipart_form_data ).
      lr_client->request->if_http_entity~set_formfield_encoding( formfield_encoding = cl_http_request=>if_http_entity~co_encoding_raw ).

      lr_client->request->set_header_field( name = 'Accept' value = if_rest_media_type=>gc_appl_json ).


      lo_request_part2 = lr_client->request->add_multipart( ).


      lv_options = '{ "extraction": { "headerFields": [ "documentNumber", "purchaseOrderNumber", "grossAmount" ], "lineItemFields": [ "netAmount" ] },' &&
                   '"clientId": "default", "documentType": "invoice", "receivedDate": "2020-02-17", "enrichment": { "sender": { "top": 5, "type": ' &&
                   '"businessEntity", "subtype": "supplier" }, "employee": { "type": "employee" } }}'.
      lo_request_part2->set_header_field( name = `Content-Disposition` "#EC NOTEXT
                                         value = |form-data; name="options"; type=application/json| ).
      lo_request_part2->set_cdata(
        EXPORTING
          data   =  lv_options  ).




      lo_request_part = lr_client->request->add_multipart( ).
      lv_content_disposition = |form-data; name="file"; filename="sample-invoice.pdf"|.
      lo_request_part->set_header_field( name = `Content-Disposition` "#EC NOTEXT
                                         value = lv_content_disposition ).
      lo_request_part->set_content_type( if_rest_media_type=>gc_appl_pdf ).

      len = xstrlen( iv_file_content ).

      lo_request_part->set_data( data = iv_file_content offset = 0 length = len ).

      lr_client->send( ).
      lr_client->receive( ).

      DATA(l_content_clients) = lr_client->response->get_cdata( ).
      /ui2/cl_json=>deserialize( EXPORTING json = l_content_clients pretty_name = /ui2/cl_json=>pretty_mode-camel_case CHANGING data = ls_create_job_response ).


      lr_client->response->get_status(
        IMPORTING
          code   = DATA(lv_code) ).

      IF lv_code = '201'.
        rv_job = ls_create_job_response-id.
      ENDIF.

      lr_client->close(  ).

    ENDIF.

  ENDMETHOD.



  METHOD get_status_job.


    DATA lr_client         TYPE REF TO if_http_client.
    DATA lv_status_job TYPE string.
    DATA l_json_response TYPE string.
    DATA: lr_data          TYPE REF TO data.

    CLEAR rv_status_job.

    CALL METHOD cl_http_client=>create_by_url
      EXPORTING
        url                = c_api_url
      IMPORTING
        client             = lr_client
      EXCEPTIONS
        argument_not_found = 1
        plugin_not_active  = 2
        internal_error     = 3
        OTHERS             = 4.

    IF sy-subrc = 0.


      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_method  value =  if_http_request=>co_request_method_get ).
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_uri  value =  |{ c_api_path }/document/jobs/{ iv_job }| ).
      lr_client->request->set_header_field( name  =  'Authorization'  value =  |Bearer { m_oauth }| ).

      lr_client->send( ).
      lr_client->receive( ).

      l_json_response = lr_client->response->get_cdata( ).
      /ui2/cl_json=>deserialize( EXPORTING json = l_json_response pretty_name = /ui2/cl_json=>pretty_mode-camel_case CHANGING data = lr_data ).

      lr_client->response->get_status(
        IMPORTING
          code   = DATA(lv_code) ).

      IF lv_code = '200'.

        /ui2/cl_data_access=>create( ir_data = lr_data iv_component = `STATUS`)->value( IMPORTING ev_data = lv_status_job ).

        IF lv_status_job = 'DONE'.

          DATA: l_field_name      TYPE string,
                l_value           TYPE string,
                i TYPE i.
          i = 1.
          " Three header fields were requested in the options, so read entries 1 to 3
          WHILE i < 4.

            /ui2/cl_data_access=>create( ir_data = lr_data iv_component = |EXTRACTION-HEADER_FIELDS[{ i }]-NAME| )->value( IMPORTING ev_data = l_field_name ).

            /ui2/cl_data_access=>create( ir_data = lr_data iv_component = |EXTRACTION-HEADER_FIELDS[{ i }]-VALUE| )->value( IMPORTING ev_data = l_value ).

            WRITE:/ l_field_name, l_value.

            i = i + 1.

          ENDWHILE.

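        ENDIF.

        " Return every status reported by the service (e.g. PENDING, DONE or FAILED)
        " so that the polling loop in SEND_FILE also stops when the job fails.
        rv_status_job = lv_status_job.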
      ELSE.
        rv_status_job = 'FAILED'.
      ENDIF.

      lr_client->close(  ).

    ELSE.
      rv_status_job = 'FAILED'.
    ENDIF.

  ENDMETHOD.

  METHOD send_file.

    DATA: l_job        TYPE string,
          l_status_job TYPE string.

    l_job = me->post_document( iv_file_content ).
    IF l_job IS NOT INITIAL.

      l_status_job = me->get_status_job( iv_job = l_job ).

      " Poll every 3 seconds until the job has finished
      WHILE l_status_job <> 'DONE' AND l_status_job <> 'FAILED'.
        WAIT UP TO 3 SECONDS.
        l_status_job = me->get_status_job( iv_job = l_job ).
      ENDWHILE.

    ENDIF.

  ENDMETHOD.

ENDCLASS.


START-OF-SELECTION.


  IF p_fname IS NOT INITIAL.

*   Upload the file in binary format
    CALL METHOD cl_gui_frontend_services=>gui_upload
      EXPORTING
        filename   = CONV #( p_fname )
        filetype   = 'BIN'
      IMPORTING
        filelength = DATA(lv_input_len)
      CHANGING
        data_tab   = lt_file.


*   convert file to XSTRING
    CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
      EXPORTING
        input_length = lv_input_len
      IMPORTING
        buffer       = lv_file_content
      TABLES
        binary_tab   = lt_file.



    lr_die = NEW zcl_die( ).

    IF lr_die->authenticate( ) = abap_true.

      lr_die->send_file( lv_file_content ).

    ENDIF.


  ENDIF.
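get_status_job reads exactly three header fields by index through /ui2/cl_data_access. An alternative is to deserialize the job response into typed structures, loop over every field the service returned, and look up a single field by name, for example the purchase order number. This sketch rests on several assumptions:

- The response has the shape status / extraction / headerFields[name, value].
- The JSON is a hypothetical, heavily shortened sample.
- Real responses may return amounts such as grossAmount as JSON numbers, so check how they map to the string field in your release.

```abap
REPORT zdie_result_sketch.

" Hypothetical subset of the job response: only the components read here.
TYPES: BEGIN OF ty_field,
         name  TYPE string,
         value TYPE string,
       END OF ty_field,
       ty_fields TYPE STANDARD TABLE OF ty_field WITH EMPTY KEY,
       BEGIN OF ty_extraction,
         header_fields TYPE ty_fields,
       END OF ty_extraction,
       BEGIN OF ty_job,
         status     TYPE string,
         extraction TYPE ty_extraction,
       END OF ty_job.

DATA: ls_job TYPE ty_job,
      lv_po  TYPE string.

START-OF-SELECTION.
  " Hypothetical, heavily shortened response used for illustration only
  DATA(lv_json) = `{"status":"DONE","extraction":{"headerFields":[` &&
                  `{"name":"documentNumber","value":"INV-001"},` &&
                  `{"name":"grossAmount","value":"100.00"}]}}`.

  " camel_case maps headerFields to HEADER_FIELDS, as in the program's data access paths
  /ui2/cl_json=>deserialize( EXPORTING json        = lv_json
                                       pretty_name = /ui2/cl_json=>pretty_mode-camel_case
                             CHANGING  data        = ls_job ).

  " Print every extracted header field, however many the service returned
  LOOP AT ls_job-extraction-header_fields INTO DATA(ls_field).
    WRITE: / ls_field-name, ls_field-value.
  ENDLOOP.

  " Look up one field by name; OPTIONAL avoids an exception if it is missing
  lv_po = VALUE #( ls_job-extraction-header_fields[ name = `purchaseOrderNumber` ]-value OPTIONAL ).
  IF lv_po IS INITIAL.
    WRITE / 'No purchase order number extracted'.
  ENDIF.
```

Reading by name rather than position keeps the output correct when the service returns fewer fields than requested, or returns them in a different order.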

 

For testing, we can use the sample invoice from the missions. If we run the program with that PDF, after a few seconds we get the following output:

We can verify in the Document Information Extraction UI that the extracted data is correct.

With this, you can automate the processing of scanned documents such as invoices. For example, you can check whether an invoice contains a purchase order number so that it can be matched with that purchase order. Many other options are possible, all from an ABAP program.

Best Regards

Jose Muñoz

      9 Comments
      Paul PINARD

      Thanks for sharing Jose Muñoz Herrera! 👍

      Dominik Lange

      Hi Jose Muñoz Herrera

       

      I am getting a logon screen when the line "lr_client->receive( )." in method "post_document" is executed. Do you maybe have an idea which logon credentials are needed here?

      The connection test from e.g. SM59 works fine. I am working in an S/4HANA on-premise system.

      Thanks

      Best regards

      DL

      Jose Muñoz Herrera
      Blog Post Author

      Hello,

      If you set the clientid (uaa section of the service key) as the user and the clientsecret (uaa section of the service key) as the password of the RFC destination, it should work:

       

      Credentials

       

      Regards


      Dominik Lange

      Hello,

       

      Regarding basic authentication, I did it as you described in your blog post and in your reply to my question. Do I also need to switch on SSL? In SM59, with SSL active, I get HTTP code 200 and a logon screen back from the connection test.

      Thanks

      Best regards

      DL

      Jose Muñoz Herrera
      Blog Post Author

      Yes, go with SSL. Remember to install the certificate in transaction STRUST.

      regards

      Dominik Lange

      Hello,

       

      By "certificate", do you mean the SAP client certificate signed by an official CA? Or do we need to import something from BTP / Document Service into the SAP backend?

       

      Thanks

      Best regards

      DL

      Dominik Lange

      Hi Jose Muñoz Herrera,

       

      The issue is now resolved. After setting up a new trial account in the US region, the business service can be consumed from the backend.

      Thanks

      Best regards

      DL

      Jose Muñoz Herrera
      Blog Post Author

      Great Dominik Lange !!!

      dileep pala

      Thanks Jose Muñoz Herrera

      I have a similar requirement to upload documents to DOX. I followed the blog to build a Document Information Extraction API in an ABAP class. I am trying to upload the document from an ECC system, i.e. version 7.5.

      1st service - Token generation works fine and generates the token
      2nd service - Uploading the PDF to the BTP Document Information Extraction service: I am getting a 308 response code in the POST_DOCUMENT method.

      I have created the RFC destination to call the token generation API as detailed in the blog.

      I have also created an RFC destination for the upload API endpoint, so I am using cl_http_client=>create_by_destination.

      Could you please share your valuable insight on the error?