Document Information Extraction: New AI model for the extraction of purchase order documents
We are happy to share that Document Information Extraction now offers a new pre-trained model for purchase order extraction. Read this blog post to learn what the new model is about and what value it brings to you!
Document Information Extraction
With the intelligent capabilities of Document Information Extraction (which is part of the SAP AI Business Services Portfolio), information can be easily extracted from business documents using pre-trained AI models. The pre-trained models can extract all the relevant information from standard documents (such as invoices, purchase orders, payment advice, and business cards).
The screenshot below shows what an extraction looks like in Document Information Extraction:
Better Model for Processing Purchase Order Documents
Document Information Extraction now offers a new pre-trained model for purchase order extraction. The new model gives users significantly better extraction results for complex line items, which are typical of purchase orders (for example, multi-line material descriptions, complex table structures, a high number of line items, and nested table information).
While the previous extraction model, chargrid, used a vision- and pixel-based approach, the new charmer extraction model is based on a transformer architecture. It operates directly on the result of the OCR extraction, exploiting both the recognized text and the location of that text on the document. This approach enables a more precise classification of texts and amounts on business documents.
In initial tests, the charmer model demonstrated higher extraction accuracy for almost all fields, most notably for date fields, amounts, and auxiliary fields that are important for matching business partners in the enrichment step, such as the sender's bank account details.
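How the charmer model consumes the OCR output internally is not described in this post. The sketch below only illustrates the general idea of pairing recognized text with its position on the page, as layout-aware transformer models commonly do. The names `OcrWord`, `normalize_box`, and `build_model_input`, the 0 to 1000 coordinate grid, and the simple reading-order sort are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One word recognized by OCR, with its bounding box in pixels (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_box(word, page_width, page_height, scale=1000):
    """Map pixel coordinates onto an integer grid so boxes are comparable across page sizes."""
    return (
        round(word.x0 / page_width * scale),
        round(word.y0 / page_height * scale),
        round(word.x1 / page_width * scale),
        round(word.y1 / page_height * scale),
    )


def build_model_input(words, page_width, page_height):
    """Return text/position pairs in a naive reading order (top to bottom, then left to right)."""
    ordered = sorted(words, key=lambda w: (w.y0, w.x0))
    return [
        {"text": w.text, "box": normalize_box(w, page_width, page_height)}
        for w in ordered
    ]


if __name__ == "__main__":
    page = [
        OcrWord("4500012345", 500, 100, 700, 120),
        OcrWord("PO", 100, 100, 150, 120),
    ]
    for item in build_model_input(page, page_width=1000, page_height=2000):
        print(item)
```

Because each element carries both the text and its location, a model working on such input can, for instance, tell a number in a "Total" row apart from the same number inside a line-item table, which plain text alone would not reveal.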
How does it help?
Real-world documents exhibit a variety of features that complicate information extraction: inconsistent formatting of numbers and dates, ambiguous custom labels or abbreviations, and complex table layouts with nested information. The new charmer model can handle many of these cases. It treats ambiguous dates and inconsistently formatted amounts more holistically, and it copes much better with non-trivial table layouts that feature descriptions of varying length with line breaks, stacked cells within line items, and other nested information. The screenshot illustrates some of these challenges as they occur in real-world documents:
With this new version of the extraction model, you can expect better extraction results for several document fields, as well as correct handling of many edge cases that previously could not be processed. The model produces much more consistent results for tables with many rows and returns higher and more reliable confidence scores, creating more value for users of Document Information Extraction.
The new charmer model not only provides high extraction accuracy but also has a lower resource footprint. Thanks to its high reuse potential and extensibility, it will be the basis for future innovations in document processing.
The new version is already available to all customers who consume the Document Information Extraction service directly, and to all customers who use SAP solutions built on top of the service, such as the “Create Sales Order – Automatic Extraction” app in SAP S/4HANA order management.
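More reliable confidence scores matter most when they drive automation. As a minimal sketch, assuming each extracted field comes with a name, a value, and a confidence between 0 and 1, fields could be split into those accepted automatically and those sent for manual review. The dictionary layout, the function name `split_by_confidence`, and the threshold of 0.8 are illustrative assumptions and are not taken from the service's API.

```python
def split_by_confidence(fields, threshold=0.8):
    """Separate extracted fields into auto-accepted ones and ones needing manual review."""
    accepted, needs_review = [], []
    for field in fields:
        # Fields at or above the threshold are trusted; the rest go to a reviewer.
        target = accepted if field["confidence"] >= threshold else needs_review
        target.append(field)
    return accepted, needs_review


if __name__ == "__main__":
    # Hypothetical extraction result for a purchase order
    extracted = [
        {"name": "documentNumber", "value": "4500012345", "confidence": 0.97},
        {"name": "documentDate", "value": "2022-03-01", "confidence": 0.62},
        {"name": "netAmount", "value": "1250.00", "confidence": 0.80},
    ]
    accepted, needs_review = split_by_confidence(extracted)
    print("accepted:", [f["name"] for f in accepted])
    print("needs review:", [f["name"] for f in needs_review])
```

A suitable threshold depends on the document type and on how costly a wrong value is compared with a manual check, so it is best tuned on a sample of your own documents.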
Stay tuned and follow the tag #DocumentInformationExtraction for further updates!
Read more about what's new in Document Information Extraction on the help portal!
What is Document Information Extraction?
Document Information Extraction is one of the SAP AI Business Services on the SAP Business Technology Platform (SAP BTP). This ML-enabled service is available through the Cloud Platform Enterprise Agreement (CPEA) and also in the Pay-As-You-Go (PAYGO) model.
Tutorials & Learnings
- Simplify Business Document Processing with SAP AI Business Services
- Free tier option for Document Information Extraction