Document Information Extraction – Learning Journey & Practical example
Have you ever thought that a computer could do the work for you?
Despite the rise of technology, many of us still scroll through PDF documents and rely on the Ctrl + C and Ctrl + V shortcuts to copy the precious information out of our day-to-day documents.
I decided to set up an open source project around openSAP certificates and Records of Achievement. I had an interesting business case: extracting the main information from certificates in order to feed the skills and certificates records of a Human Capital Management (HCM) system.
From openSAP we can download a Record of Achievement and publish badges to a LinkedIn profile; however, I was not able to find an accelerator to feed the HCM software.
I started by prototyping a simple list page with the SAP Cloud Application Programming Model (capire).
Once I had the data model and the first MVP mockup, I thought about feeding the database automatically from the related documents.
Then I discovered the “DOX” (Document Information Extraction) tutorial series, thanks to Juliana Morais.
Set your environment up
First of all, you will have to fulfil the prerequisites from the following tutorial: create your Business Technology Platform personal account.
To use the SAP AI Business Service for Document Information Extraction, you will need an access_token. The first step consists of activating the BTP Booster and setting up a service instance. Then there is a manual step: accessing the Swagger UI page with the access token.
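To make the access_token step a bit more concrete, here is a minimal Python sketch of how such a token could be requested from code instead of by hand. It assumes the service key of your instance exposes an authentication URL, a client ID and a client secret, and that the token endpoint follows the usual OAuth 2.0 client-credentials flow under /oauth/token; the function names and the example URL are mine, not taken from the tutorial.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(uaa_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request.

    Assumption: the token endpoint lives at <uaa_url>/oauth/token.
    """
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        url=uaa_url.rstrip("/") + "/oauth/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"]
```

With the values from your own service key, fetch_access_token(build_token_request(url, client_id, client_secret)) should give you the string to paste into the Swagger UI authorization dialog, prefixed with "Bearer ".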
Discover and enrich data with API
Since I wanted to extract data from documents and store it in my own data set, I was looking for an employee data model to enrich with certificate details.
In the next steps of the series I found the tutorial Use Machine Learning to Enrich Employee Data with Swagger UI.
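To illustrate what enriching an employee record with certificate details could look like on my side of the API, here is a small sketch. It assumes the extraction result contains a headerFields list whose entries carry a name, a value and a confidence; the field names courseTitle and issueDate, the employee structure and both helper functions are hypothetical and only serve the example.

```python
from typing import Any


def flatten_header_fields(extraction: dict[str, Any], min_confidence: float = 0.5) -> dict[str, Any]:
    """Keep, per field name, the highest-confidence value at or above the threshold."""
    best: dict[str, tuple[float, Any]] = {}
    for field in extraction.get("headerFields", []):
        name = field.get("name")
        confidence = float(field.get("confidence", 0.0))
        if name is None or confidence < min_confidence:
            continue  # skip unnamed or low-confidence candidates
        if name not in best or confidence > best[name][0]:
            best[name] = (confidence, field.get("value"))
    return {name: value for name, (_, value) in best.items()}


def attach_certificate(employee: dict[str, Any], certificate: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the employee record with one more certificate attached."""
    enriched = dict(employee)
    enriched["certificates"] = [*employee.get("certificates", []), certificate]
    return enriched
```

The confidence threshold is a design choice: I would rather leave a field empty and fill it in by hand than store a badly recognized course title in the HCM data.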
Create custom schema configuration and document template
Continuing my learning journey, I submitted a pull request on GitHub with my own draft tutorial for creating a custom schema configuration and template, before discovering the mission Shape Machine Learning to Process Your Own Standard Business Documents, as well as Use Machine Learning to Process Business Documents.
This will lead me to enhance my open source project with an SAP AI Business Services flavor.
Stay tuned to this blog post for step-by-step tutorials on setting up your own openSAP certificates list report & API.
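Before creating a custom schema in the service, I find it useful to write down the fields I expect on a certificate and check the list for obvious mistakes. The sketch below is only a local draft of such a schema: the SchemaField class, the allowed types and the four field names are my own assumptions and not the payload format of the Document Information Extraction API.

```python
from dataclasses import dataclass

# Assumption for this sketch: the only data types the draft may use.
ALLOWED_TYPES = {"string", "number", "date"}


@dataclass(frozen=True)
class SchemaField:
    name: str
    description: str
    data_type: str = "string"


def validate_schema(fields: list[SchemaField]) -> list[str]:
    """Return human-readable problems; an empty list means the draft is consistent."""
    problems: list[str] = []
    seen: set[str] = set()
    for field in fields:
        if not field.name or not field.name.isidentifier():
            problems.append(f"invalid field name: {field.name!r}")
        if field.name in seen:
            problems.append(f"duplicate field name: {field.name!r}")
        seen.add(field.name)
        if field.data_type not in ALLOWED_TYPES:
            problems.append(f"unsupported type {field.data_type!r} for {field.name!r}")
    return problems


# Hypothetical header fields for an openSAP Record of Achievement.
CERTIFICATE_FIELDS = [
    SchemaField("participantName", "Name of the learner on the certificate"),
    SchemaField("courseTitle", "Title of the openSAP course"),
    SchemaField("issueDate", "Date the record was issued", "date"),
    SchemaField("score", "Score achieved, if printed", "number"),
]
```

Calling validate_schema(CERTIFICATE_FIELDS) before entering the fields in the schema configuration catches duplicates and typos early; how each field maps onto the service's own schema settings is covered by the tutorials mentioned above.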