Latest Advancements in Document Classification by SAP AI Business Services
Document Classification is part of the SAP AI Business Services portfolio, which encompasses a range of AI-enabled microservices on the SAP Business Technology Platform. Document Classification delivers one of the main capabilities for Business Document Processing, which I previously described in the blog “Simplify Business Document Processing with SAP AI Business Services”. The service enables companies that manage large numbers of business documents to classify them automatically based on custom classification categories.
This blog post gives an update on the latest advancements in Document Classification. After reading it, you will have a better understanding of the purpose of this AI-enabled service, its capabilities, and its pricing. In addition, links to further resources will help you get started with Document Classification on the SAP Business Technology Platform trial landscape.
What is Document Classification?
Document Classification classifies business documents using customized machine learning models. These models are trained on a dataset of pre-classified (labeled) documents. As is usual in custom ML projects, a labeled dataset is a key prerequisite for training a custom model. Document Classification combines two techniques: (1) Optical Character Recognition (OCR) to extract the text from documents and (2) selected text classification algorithms for training and classification. The service provides an automatic hyperparameter search that selects the best-performing model. Once a model is available, it can be used for serving (inference). For inference, the service takes a document image as input. The model then assigns a class to the document and returns the associated probability. The figure below depicts a simplified process flow for training and using your own classification model with Document Classification.
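To make the train-then-classify flow described above more tangible, here is a minimal sketch in Python. It is not the algorithm used by the service: it assumes the OCR step has already produced plain text, and it swaps in a toy multinomial Naive Bayes classifier built only on the standard library. The function names, the sample documents, and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Very simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(labeled_docs):
    """Train a toy Naive Bayes model from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}

def classify(model, text):
    """Return (label, probability) for already-extracted document text."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for token in tokenize(text):
            if token in model["vocab"]:  # tokens unseen in training are skipped
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

# Hypothetical labeled dataset (in practice: OCR text of real documents).
TRAINING = [
    ("invoice number total amount due payment terms", "invoice"),
    ("invoice date net amount vat total", "invoice"),
    ("reminder overdue payment outstanding balance dunning", "dunning_letter"),
    ("second reminder overdue invoice please pay outstanding", "dunning_letter"),
    ("sales order quantity delivery date item", "sales_order"),
    ("order confirmation item quantity ship to", "sales_order"),
]

model = train(TRAINING)
label, prob = classify(model, "overdue balance reminder")
print(label, round(prob, 2))
```

The point of the sketch is the shape of the result: a single class plus a probability, which a downstream process can use to decide whether to trust the prediction.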
Typical Use Case
A typical use case for the Document Classification service is called Enterprise Mail-Inbox. On a daily basis companies have to process business documents attached to emails from their business partners (e.g., suppliers or customers). Usually, the documents arrive in a central enterprise email inbox, from where every document needs to be manually opened, classified, and dispatched. Such a manual approach is inefficient, error-prone, and can cause damage to business-critical processes, such as order processing.
In this scenario Document Classification can add value by automatically classifying large volumes of documents into customer-specific document types (e.g., invoices, dunning letters, and sales orders). The automation of this small step increases productivity of organizations by minimizing repetitive tasks and manual labor. The outcomes of this step can be used in subsequent processes as exemplified in the next paragraph.
Customer Reference: Villeroy & Boch Group
Document Classification in conjunction with SAP Intelligent RPA can provide added value by automatically classifying and dispatching incoming documents. In this way, the entire process flow can be automated as follows (see also next figure):
First, an intelligent bot screens incoming emails for attachments. The bot then sends all attachments to the Document Classification service for automatic classification. After this pre-processing, the bot can finally dispatch the documents and initiate the subsequent business process steps.
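The three steps above (screen emails for attachments, classify, dispatch) can be sketched in a few lines of Python. This is only an illustration and not how SAP Intelligent RPA is implemented: `classify_document` is a hypothetical stand-in for the call to the Document Classification service, and the routing table, the confidence threshold, and the manual-review fallback are assumptions added for the example.

```python
from email.message import EmailMessage

def classify_document(filename, content):
    """Hypothetical stand-in for the classification service call.

    Returns (label, probability). A real bot would send `content`
    to the service instead of looking at the file name.
    """
    name = filename.lower()
    if "invoice" in name:
        return "invoice", 0.97
    if "order" in name:
        return "sales_order", 0.91
    return "unknown", 0.40

# Assumed mapping from document type to the follow-up process.
ROUTES = {"invoice": "accounts_payable", "sales_order": "order_processing"}

def dispatch_attachments(message, threshold=0.8):
    """Classify every attachment of an email and pick a target queue."""
    routed = []
    for part in message.iter_attachments():
        filename = part.get_filename() or "unnamed"
        content = part.get_content()  # bytes for binary attachments
        label, prob = classify_document(filename, content)
        if prob >= threshold and label in ROUTES:
            target = ROUTES[label]
        else:
            target = "manual_review"  # low confidence or unknown type
        routed.append((filename, label, target))
    return routed

# Build a sample email with two attachments.
msg = EmailMessage()
msg["Subject"] = "Documents"
msg.set_content("Please find the documents attached.")
msg.add_attachment(b"%PDF-1.4 sample", maintype="application",
                   subtype="pdf", filename="invoice_4711.pdf")
msg.add_attachment(b"%PDF-1.4 sample", maintype="application",
                   subtype="pdf", filename="notes.pdf")

routed = dispatch_attachments(msg)
for entry in routed:
    print(entry)
```

Routing low-confidence results to a manual queue is one common design choice for keeping a human in the loop. The right threshold depends on the scenario.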
This exact use case is in productive use at our reference customer, Villeroy & Boch Group. The customer reports an average automation rate of 92% for this particular business scenario. You can find more detailed information on the customer and their scenario in the customer reference deck and the SAP News Center article: When Bots Decide: Process Automation at Villeroy & Boch.
Document Classification: what is new?
Support for new file formats
Our Document Classification service now supports additional file formats, including single-page PNG and JPEG files. This enables classification of more documents without the need to convert them to PDF prior to processing. Consequently, our customers can save valuable resources and work more efficiently.
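If you want to check on the client side which of these formats a file has before sending it, one option is to inspect the first bytes of the file. The sketch below relies only on the well-known file signatures of PDF, PNG, and JPEG. The helper name `detect_format` is hypothetical, the check does not verify the single-page restriction, and the service may apply its own validation.

```python
# Leading bytes ("magic numbers") of the three formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}

def detect_format(data):
    """Return 'pdf', 'png', 'jpeg', or None based on the leading bytes."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None

print(detect_format(b"%PDF-1.7\n"))
print(detect_format(b"plain text"))
```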
New pre-trained model
We have developed a new pre-trained classification model for invoices, payment advice, and purchase orders, which is now available (see figure below). The model currently supports German and English.
Check out this demo to get an understanding of how the pre-trained model can be used to classify business documents:
New service plan
So far, Document Classification and all other commercialized SAP AI Business Services have been charged in blocks of 1000 documents per month. Our new service plan has a reduced block size of 100 documents. The minimum purchase quantity is two blocks. You can get a detailed overview of the pricing in the SAP Discovery Center. Document Classification is also available as a subscription model via SAP Store. For more information on the new service plans of SAP AI Business Services, please refer to this blog.
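As a quick worked example of the block arithmetic, the following sketch computes how many blocks a given monthly volume would require under the new plan. It assumes that partially used blocks are rounded up to a whole block, which this post does not state explicitly. The function name is hypothetical, and the SAP Discovery Center remains the authoritative source for pricing.

```python
import math

BLOCK_SIZE = 100  # documents per block in the new service plan
MIN_BLOCKS = 2    # minimum purchase quantity

def blocks_needed(documents_per_month):
    """Number of blocks to purchase, assuming partial blocks round up."""
    if documents_per_month < 0:
        raise ValueError("documents_per_month must not be negative")
    return max(MIN_BLOCKS, math.ceil(documents_per_month / BLOCK_SIZE))

for volume in (50, 250, 1000):
    print(volume, "documents ->", blocks_needed(volume), "blocks")
```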
To find out more about our new pre-trained Document Classification model:
- Check out SAP Help: Pre-trained Classification Model
- Test the trial version available in SAP Discovery Center.
- Run through our tutorial on SAP Developer Community.
For more information on SAP AI Business Services:
- Explore: SAP Community Page
- Dive deeper: Open SAP Course & openSAP Podcast
- Get an overview: Blogpost part I | Blogpost part II
- Exchange Knowledge:
Document Classification Questions | Document Information Extraction Questions
Business Entity Recognition Questions | Service Ticket Intelligence Questions
Data Attribute Recommendation Questions | Invoice Object Recommendation Questions