Streamlining Data Extraction from PDFs: The Power of Document Information Extraction (DOX) and SAP Build Process Automation
Extracting and organizing data from PDF files by hand is laborious, particularly when dealing with a large volume of documents. Manual entry is also prone to errors, and a mistake made at this stage carries over into every system that later consumes the data.
In this blog post, we will explore how to integrate SAP Build Process Automation (SPA) with Document Information Extraction to automate the handling of PDF documents.
For those unfamiliar with Document Information Extraction, here is a brief overview. Document Information Extraction (referred to as DOX) is a service designed to process documents with structured content, such as headers and tables. This makes it particularly useful for extracting data from documents like invoices or payment records. You upload a PDF document and receive the extracted data as a JSON object.
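To make the "JSON object" concrete, here is a minimal Python sketch of turning such a response into plain dictionaries. The sample payload is hypothetical: the keys `extraction`, `headerFields`, `lineItems`, `name`, `value`, and `confidence` are assumptions modeled on the general shape of a DOX result, and the field names and values are invented for illustration. Check the response you actually receive before relying on any of them.

```python
import json

# Hypothetical response, shaped like a DOX extraction result.
# All field names and values here are illustrative assumptions.
SAMPLE_RESPONSE = """
{
  "extraction": {
    "headerFields": [
      {"name": "documentNumber", "value": "INV-001", "confidence": 0.97},
      {"name": "grossAmount", "value": 118.0, "confidence": 0.91}
    ],
    "lineItems": [
      [
        {"name": "description", "value": "Widget", "confidence": 0.88},
        {"name": "quantity", "value": 2, "confidence": 0.93}
      ]
    ]
  }
}
"""


def flatten_extraction(response: dict) -> tuple[dict, list[dict]]:
    """Return (header fields, line items) as simple name -> value mappings."""
    extraction = response.get("extraction", {})
    # Header fields arrive as a list of objects; index them by field name.
    header = {f["name"]: f["value"] for f in extraction.get("headerFields", [])}
    # Each line item is itself a list of field objects (one list per table row).
    items = [
        {f["name"]: f["value"] for f in row}
        for row in extraction.get("lineItems", [])
    ]
    return header, items


header, items = flatten_extraction(json.loads(SAMPLE_RESPONSE))
print(header)
print(items)
```

Flattening the response this way drops the per-field confidence values, which is convenient for display but not for quality checks. Keep the original objects if you need them.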
Let’s Get Started
- Navigate to SAP Build Process Automation Lobby and create a new automation.
- Click on the Create dropdown and select Document Template.
- Click on Create a New Template.
- Name the template and select the relevant sample invoice from your local system.
- Choose an existing schema or create a new one. I’ll proceed with creating a new custom schema.
- Give the schema a relevant name and add the required Header and Line Item fields.
- Click on Next and proceed with annotating the uploaded document.
- Complete the relevant field mapping and click on Save to store the template.
- Now let’s create an automation that picks up documents from our local system and returns the extracted information as its output.
- Drag and drop Extract Data (Template) from the Automation toolbar, then double-click it to open its settings and click on Add Document Template.
- Click on Choose a template from the current project and select the template we created earlier.
- Now scroll down to the input parameters of Extract Data (Template) and add the document path.
- Save the automation and do a test run.
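After a test run, it helps to confirm that every field defined in the schema actually came back with a value. The sketch below is one way to do that outside the automation, in Python. The required field names are hypothetical stand-ins for whatever Header and Line Item fields you added to your own schema, and the inputs are assumed to be plain name-to-value dictionaries.

```python
# Hypothetical schema: replace with the Header and Line Item fields you defined.
REQUIRED_HEADER_FIELDS = {"documentNumber", "grossAmount"}
REQUIRED_LINE_ITEM_FIELDS = {"description", "quantity"}


def missing_fields(header: dict, line_items: list[dict]) -> list[str]:
    """List every required field that is absent or empty in the extracted data."""
    problems = [
        f"header.{name}"
        for name in sorted(REQUIRED_HEADER_FIELDS)
        if header.get(name) in (None, "")
    ]
    for index, row in enumerate(line_items):
        problems += [
            f"lineItems[{index}].{name}"
            for name in sorted(REQUIRED_LINE_ITEM_FIELDS)
            if row.get(name) in (None, "")
        ]
    return problems


# Example: the header lacks grossAmount and the second row is incomplete.
report = missing_fields(
    {"documentNumber": "INV-001"},
    [{"description": "Widget", "quantity": 2}, {"description": ""}],
)
print(report)
```

An empty list means the template captured everything the schema asks for. A non-empty list usually points back to an annotation or field mapping that needs adjusting.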
This configuration can be extended in several ways. You can connect your automation to source files from your Outlook inbox or local file system and process them in a loop. You can then use the extracted JSON responses to automate follow-on tasks, such as entering data into Excel or creating invoices in SAP S/4HANA.
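The loop-and-export idea can be sketched in Python as follows. This is an illustration of the pattern only, not how SAP Build Process Automation implements it: `export_folder_to_csv` and its `extract` parameter are hypothetical names, and `extract` stands in for whatever step returns the extracted header fields for one PDF. The output is a CSV file, which Excel can open directly.

```python
import csv
from pathlib import Path
from typing import Callable


def export_folder_to_csv(
    folder: Path,
    out_csv: Path,
    extract: Callable[[Path], dict],
    columns: list[str],
) -> int:
    """Run `extract` on every PDF in `folder` and write one CSV row per file.

    `extract` is a placeholder for the real extraction step; it must return
    a name -> value mapping for a single document.
    """
    pdfs = sorted(folder.glob("*.pdf"))
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        # Fields missing from a result are written as empty cells;
        # fields not listed in `columns` are ignored.
        writer = csv.DictWriter(
            handle, fieldnames=["file", *columns], extrasaction="ignore"
        )
        writer.writeheader()
        for pdf in pdfs:
            writer.writerow({"file": pdf.name, **extract(pdf)})
    return len(pdfs)
```

Passing the extraction step in as a function keeps the loop independent of where the data comes from, so the same code works with a stub during testing and with a real service call later.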
In conclusion, integrating SAP Build Process Automation (SPA) with Document Information Extraction (DOX) automates the extraction and organization of data from PDF documents, which is especially valuable when dealing with a large volume of files. It replaces a laborious, error-prone manual process with a reusable template and automation.