Amaury VAN ESPEN

Document Information Extraction – Learning Journey & Practical example

Have you ever thought that a computer could work for you?

Despite the rise of technology, many of us still rely on the Ctrl + C then Ctrl + V shortcuts, scrolling through the text of PDF files to pull the precious information out of our day-to-day documents.

While taking part in SAP's Devtoberfest, I discovered lots of missions, groups and tutorials thanks to the Scavenger Hunt Mission.

I decided to set up an open source project related to openSAP certifications and Records of Achievement. I had an interesting business case: extracting the main information from certificates in order to feed the skills and certificates records of a Human Capital Management (HCM) system.

From openSAP we can download a Record of Achievement and publish badges to a LinkedIn profile. However, I was not able to find an accelerator to feed the HCM software.

I started by prototyping a simple list page with the SAP Cloud Application Programming Model (Capire).

Once I had the data model and a first MVP mockup, I thought about feeding the database automatically from the related documents.

Then I discovered the “DOX” (Document Information Extraction) tutorial series thanks to Juliana Morais.

Set your environment up

First of all, you will have to fulfil the prerequisites from the following tutorial: create your Business Technology Platform personal account.

To use the SAP AI Business Service for Document Information Extraction, you will need an access_token. The first step consists of running the BTP booster to set up a service instance. Then there is a manual step: accessing the Swagger UI page with the access token.
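For readers who prefer a script to the manual step, here is a minimal Python sketch of how such a token request could be prepared. It assumes the service key of your instance exposes a UAA URL, a client ID and a client secret, and that the token endpoint follows the usual OAuth 2.0 client-credentials flow at `/oauth/token`. The function names and example values are mine, not part of the tutorial, so check them against your own service key.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(uaa_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request.

    Assumption: the token endpoint lives at <uaa_url>/oauth/token.
    """
    token_url = uaa_url.rstrip("/") + "/oauth/token"
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    # Client ID and secret travel as HTTP Basic credentials.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def bearer_header(access_token: str) -> dict:
    """Header to attach to later API calls once the token is known."""
    return {"Authorization": f"Bearer {access_token}"}
```

Sending the request with `urllib.request.urlopen(request)` and reading the `access_token` field of the JSON response should give you the value to paste into Swagger UI or to pass to `bearer_header`.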

Discover and enrich data with API

Since I wanted to extract data from documents and store it in my own data set, I was looking for an Employee data model to enrich with certificate details.

In the next steps I found the tutorial Use Machine Learning to Enrich Employee Data with Swagger UI.

Create custom schema configuration and document template

Continuing my learning journey, I submitted a pull request on GitHub with my own draft tutorial for creating a custom schema configuration and template, before discovering the missions Shape Machine Learning to Process Your Own Standard Business Documents and Use Machine Learning to Process Business Documents.
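To give an idea of what happens after a custom schema has extracted the fields, here is a small Python sketch that turns an extraction result into a flat certificate record. It assumes the result carries a list of header fields, each with a `name`, a `value` and a `confidence`. The field names in the usage example (`learnerName`, `courseTitle`) are hypothetical and would come from your own schema.

```python
from typing import Any, Dict


def header_fields_to_record(result: Dict[str, Any], min_confidence: float = 0.5) -> Dict[str, Any]:
    """Flatten extracted header fields into a {name: value} record.

    Assumption: `result` looks like
    {"extraction": {"headerFields": [{"name": ..., "value": ..., "confidence": ...}]}}.
    Fields below `min_confidence` are skipped so they can be reviewed by hand.
    """
    fields = result.get("extraction", {}).get("headerFields", [])
    record: Dict[str, Any] = {}
    for field in fields:
        if field.get("confidence", 0.0) >= min_confidence:
            record[field["name"]] = field.get("value")
    return record


# Usage with a made-up result for a certificate document.
sample = {
    "extraction": {
        "headerFields": [
            {"name": "learnerName", "value": "Jane Doe", "confidence": 0.93},
            {"name": "courseTitle", "value": "Some Course", "confidence": 0.31},
        ]
    }
}
record = header_fields_to_record(sample)
```

With the default threshold, only `learnerName` ends up in `record`. The low-confidence `courseTitle` is left out on purpose, which seems a safer default than writing uncertain values to the database.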

This will lead me to enhance my open source project with an SAP AI Business Services flavor.

Stay tuned to this blog post for step-by-step tutorials on setting up your own openSAP certificates list report and API.

Amaury


      1 Comment
      Witalij Rudnicki

      Hi Amaury Van Espen. Thank you for joining Devtoberfest, and great to see you sharing your story! Best regards. -Witalij