Maria Goepfert

Document Information Extraction: New extraction model for invoice documents

We are happy to announce a new Document Information Extraction model for invoice extraction!

If you missed the recent blog post on the new model for purchase orders, read it here!

Document Information Extraction

SAP AI Business Services offers Document Information Extraction as part of its portfolio. The service is available through the Cloud Platform Enterprise Agreement (CPEA) and also in the Pay-As-You-Go (PAYGO) model. Using this service, users can extract information from various types of business documents, for example invoices, purchase orders, and payment advices, using pre-trained AI models.

After introducing the new model for purchase orders, Document Information Extraction now offers a new model for invoices as well. Read more about the new model for purchase orders in our recent blog here and find out about the background of Charmer models!

Better Model for Extraction of Invoice Documents

This blog post presents the new, improved pre-trained model for invoice extraction. The new model is more robust and achieves much better extraction accuracy for almost all header fields (for example, PO numbers, tax IDs, and the separation of address types). Line-item extraction is also more consistent, yielding better results on long tables.

What is New?

The new model brings significant improvements for the header fields. For example, totalAmount and invoiceNo, the central entities on an invoice, are extracted markedly more accurately.

On the test data, the new model makes 25% fewer errors on the total amount than the old model, and it eliminates almost every second error that the previous model made on the invoice number.

The extraction of vendor tax IDs and bank account numbers has also improved significantly, and a new cleaning step helps the enrichment step better identify the sender of the invoice (the vendor).

The new post-processing for amounts improves results especially for customers in markets such as Germany, Spain, or Korea, because it identifies decimal and thousands separators more reliably. When facing an amount like “1.000” (1 or 1000?), the model now analyzes the other amounts on the document holistically and takes additional information such as currency or country into account, while remaining robust to the minor inconsistencies that appear on real-world documents.

In one example invoice, the post-processing correctly identified the dot as the decimal separator for all numbers and still handled the inconsistent decimal separator in the total amount correctly.
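To make the idea of document-level separator disambiguation more tangible, here is a minimal Python sketch. It is not the service's actual post-processing, which is not described in detail here. The function names, the voting rule, and the small `DECIMAL_COMMA_COUNTRIES` hint table are all assumptions made for illustration, and the amounts are assumed to be already cleaned to digits and separators only.

```python
import re
from decimal import Decimal

# Hypothetical hint table (an assumption for this sketch, not taken from the service).
DECIMAL_COMMA_COUNTRIES = {"DE", "ES"}


def local_decimal_separator(text):
    """Return the decimal separator if this amount alone reveals it, else None."""
    seps = re.findall(r"[.,]", text)
    if not seps:
        return None
    last = seps[-1]
    digits_after = len(text) - text.rfind(last) - 1
    if len(set(seps)) == 2:
        return last  # "1.234,56": the last separator is the decimal one
    if len(seps) > 1:
        return "," if last == "." else "."  # "1.234.567": repeated separator groups thousands
    if digits_after != 3:
        return last  # "12,5" or "0.75": cannot be a thousands group
    return None  # "1.000": ambiguous on its own


def vote_decimal_separator(amounts, country=None):
    """Guess the document-wide decimal separator from the unambiguous amounts."""
    dot = comma = 0
    for text in amounts:
        sep = local_decimal_separator(text)
        if sep == ".":
            dot += 1
        elif sep == ",":
            comma += 1
    if dot != comma:
        return "." if dot > comma else ","
    # No majority: fall back to the country hint, then to a dot.
    return "," if country in DECIMAL_COMMA_COUNTRIES else "."


def parse_amount(text, document_sep):
    """Parse one amount; its own unambiguous shape wins over the document vote."""
    sep = local_decimal_separator(text) or document_sep
    thousands = "," if sep == "." else "."
    return Decimal(text.replace(thousands, "").replace(sep, "."))


amounts = ["12.50", "3.75", "1.000", "16,25"]
document_sep = vote_decimal_separator(amounts)
print([parse_amount(a, document_sep) for a in amounts])
```

In this sketch the two unambiguous dot amounts outvote the single comma amount, so “1.000” is read as 1, while “16,25” is still parsed as 16.25 because its own shape is unambiguous. That mirrors the tolerance for minor inconsistencies described above.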

Last but not least, the new post-processing logic is now also active for dates. It improves detection of the day/month order in ambiguous dates (e.g., 01/04/2023 vs. 04/01/2023), which will help our customers with non-US documents. As with amounts, the model now analyzes all other dates on the document and evaluates side information such as currency or country to better resolve ambiguous cases.
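The same kind of reasoning can be sketched for dates. The Python snippet below is only an illustration under stated assumptions and not the service's implementation. It assumes numeric dates with a four-digit year in last position, and the `MDY_COUNTRIES` hint table and the day-first default are hypothetical choices.

```python
import re
from datetime import date

# Hypothetical hint table (an assumption for this sketch, not taken from the service).
MDY_COUNTRIES = {"US"}


def split_date(text):
    """Split a numeric date like '01/04/2023' into three integers."""
    first, second, year = (int(part) for part in re.split(r"[/.\-]", text))
    return first, second, year


def local_order(text):
    """Return 'DMY' or 'MDY' if this date alone reveals the order, else None."""
    first, second, _ = split_date(text)
    if first > 12 and second <= 12:
        return "DMY"  # "25/04/2023": 25 can only be a day
    if second > 12 and first <= 12:
        return "MDY"  # "04/25/2023": 25 can only be a day
    return None  # "01/04/2023": ambiguous on its own


def vote_date_order(dates, country=None):
    """Guess the document-wide day/month order from the unambiguous dates."""
    orders = [local_order(d) for d in dates]
    dmy, mdy = orders.count("DMY"), orders.count("MDY")
    if dmy != mdy:
        return "DMY" if dmy > mdy else "MDY"
    # No majority: fall back to the country hint, then to day-first.
    return "MDY" if country in MDY_COUNTRIES else "DMY"


def parse_date(text, document_order):
    """Parse one date; its own unambiguous shape wins over the document vote."""
    first, second, year = split_date(text)
    order = local_order(text) or document_order
    day, month = (first, second) if order == "DMY" else (second, first)
    return date(year, month, day)


dates = ["01/04/2023", "25/04/2023"]
order = vote_date_order(dates)
print([parse_date(d, order).isoformat() for d in dates])
```

Here the unambiguous 25/04/2023 settles the order for the whole document, so 01/04/2023 is read as 1 April 2023. With no unambiguous date on the document, the sketch falls back to the country hint.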

How does it help?

Previously, the Chargrid model, which takes a vision-based approach operating on pixel information, was the main workhorse for processing business documents.

It is now replaced with our new, transformer-based Charmer model to unlock a new level of extraction accuracy. In addition, the new model gives more credible confidence scores for its predictions and even has a reduced resource footprint.

As usual, our customers using the Document Information Extraction service embedded in SAP Central Invoice Management, SAP Concur Invoice, or SAP Business One will automatically benefit from the new model’s higher accuracy in the form of higher automation rates and fewer manual corrections.

Do you have any questions left on this subject? Post them in the comments!

Follow the Document Information Extraction tag to never miss the newest updates!

Learn more

Read more about the news of Document Information Extraction on the help portal!

What is Document Information Extraction?

Document Information Extraction is one of the SAP AI Business Services on the SAP Business Technology Platform (SAP BTP). This ML-enabled service is available through the Cloud Platform Enterprise Agreement (CPEA) and also in the Pay-As-You-Go (PAYGO) model.


      1 Comment
      Peter Munt

      Hi

      We've just installed SAP Central Invoice Management (SAP CIM) in DEV. By default, it comes with the OCR service Document Information Extraction by SAP AI Business Services, plus an optional custom OCR and information extraction service.

      Are we meant to incorporate what you have mentioned here into SAP CIM, and if so, how can SAP CIM benefit from it? Are there any specific guides on this for SAP CIM?