Automate Invoice scanning – Serverless Computing
Continuing my previous blog series, Serverless Extensions, this blog shows how you can accelerate innovation by extending your SAP solutions in a serverless way, using the broad set of cloud services provided by the hyperscaler Amazon Web Services (AWS). It is simple to set up, easy to develop, and you pay only for what you use. The scenario covered in this blog, automated invoice scanning, is common in any organisation. Many software products on the market already extract data from scanned invoices. I want to show you how easy it is to develop a similar solution based on cloud services. It took me under 3 hours to develop this prototype.
AWS does not need an introduction. It is one of the most comprehensive and broadly adopted cloud platforms with more than 175 services. The scenario uses three services from AWS along with IAM (Identity and Access Management) to manage access to AWS services and resources.
- Amazon S3 – Highly-scalable cloud object storage solution
- AWS Lambda – Lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
- Amazon Textract – ‘Hero of the solution’ – Automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.
Our hypothetical customer already has an SAP S/4HANA solution implemented on-premise. The solution has been extended using the SAP Cloud Application Programming Model on SAP Cloud Platform. Purchase orders are raised in the S/4HANA system. Approved purchase orders are then sent to the suppliers. Suppliers send their invoices by email to the accounts payable (AP) users, who then enter them into the system manually. The customer is looking for a solution to scan these invoices automatically into the system.
A simple serverless solution (using AWS) for this scenario could look like this:
- The AP user uploads the invoice to the system using a simple UI5 application deployed in SAP Cloud Platform (SCP). The application uploads the invoice to Amazon S3 using SAP Cloud Platform Open Connectors. [You can go a step further and automate this step by scanning the email attachments and sending the invoices to Amazon S3.]
- The uploaded invoice triggers an AWS Lambda function, which handles the extraction of data from the invoice.
- The AWS Lambda function calls the Amazon Textract APIs to scan the invoice and extract the relevant data. The extracted invoice data is sent to the business application via the service exposed using the SAP Cloud Application Programming Model.
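As a rough illustration of the last step in the list above, here is a minimal sketch of how the Lambda function might hand the extracted data to the service on SAP Cloud Platform. The service URL, the `ScannedInvoices` entity name, the payload fields and the bearer-token authentication are all assumptions made for this example; they are not taken from the actual prototype.

```python
import json
from urllib.request import Request, urlopen


def build_invoice_request(service_url, invoice, token):
    """Build a JSON POST request for a (hypothetical) invoice entity endpoint."""
    body = json.dumps(invoice).encode("utf-8")
    return Request(
        service_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the service accepts an OAuth bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


def send_invoice(service_url, invoice, token):
    """Send the invoice and return the HTTP status code of the response."""
    request = build_invoice_request(service_url, invoice, token)
    with urlopen(request, timeout=10) as response:
        return response.status
```

A call could look like `send_invoice("https://<your-cap-service>/invoice/ScannedInvoices", {"invoiceNumber": "4711"}, token)`, where the path and the field name are placeholders for whatever your own service defines.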
Note that this is just a skeletal solution that shows how you can accelerate innovation using cloud services without reinventing the wheel. You need to add more security features to the architecture to make it a Minimum Viable Product.
If you want to take a peek at how I developed this, so that you can estimate the effort required, continue reading.
Simple UI5 application to upload the invoice received from the supplier
Set up the SCP Open Connector for Amazon S3 to connect to AWS. To do this, you will need an AWS IAM user with the proper authorisations and an access key.
Set up AWS S3 buckets to receive the invoices. Processed invoices are moved to another bucket.
Create an AWS Lambda function and set up the S3 trigger. This connects the PUT operation of the ‘freshinvoiceskt’ S3 bucket to the Lambda function. Make sure the function's resource-based policy allows Amazon S3 to invoke it, and that its execution role has access to the S3 buckets.
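To give an idea of what the trigger delivers, here is a minimal sketch of a handler that reads the bucket name and object key from the S3 event. The function name `uploaded_objects` is my own choice for illustration; the event layout follows the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for every object-created record in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        # Skip anything that is not an S3 "object created" notification.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    """Lambda entry point; in this sketch it only lists the uploaded invoices."""
    objects = uploaded_objects(event)
    for bucket, key in objects:
        print(f"New invoice: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Decoding the key matters as soon as file names contain spaces or special characters, which is common for invoices received by email.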
This Lambda function receives the details of the uploaded invoice from the trigger event and passes the invoice on to the Amazon Textract API to extract its data. It then sends the necessary details back to the service in SAP Cloud Platform to be added to the HANA database. The invoice data is displayed to the user in the scanned invoices application.
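The Textract part can be sketched as follows. This is not the code from the prototype, just a minimal example of the same idea: it assumes the invoice is an image or single-page document in S3, that `textract` is a client created with `boto3.client("textract")`, and that only form fields (key/value pairs) are of interest. The helper names `analyze_invoice` and `form_fields` are made up for this example.

```python
def analyze_invoice(textract, bucket, key):
    """Ask Textract to analyse an S3 object; `textract` is a boto3 Textract client."""
    return textract.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS"],
    )


def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def form_fields(response):
    """Map each form key that Textract found to the text of its value."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for relationship in block.get("Relationships", []):
            # A KEY block points to its VALUE block(s) via a VALUE relationship.
            if relationship["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[value_id], blocks_by_id)
                    for value_id in relationship["Ids"]
                )
        fields[_text_of(block, blocks_by_id)] = value_text
    return fields
```

Inside the Lambda function this would be used as `form_fields(analyze_invoice(textract, bucket, key))`, giving a plain dictionary that is easy to map to the fields of the business application.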
From here, the invoices can be processed manually or automatically depending on the business requirement.
The function also moves the invoice from the ‘freshinvoiceskt’ bucket to the ‘processedinvoiceskt’ bucket.
I have recorded a video to show how fast this solution can be. Python code used to extract the invoice data is available here
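Amazon S3 has no single "move" operation, so moving an invoice means copying it and then deleting the original. A minimal sketch, assuming `s3` is a client created with `boto3.client("s3")` and using a helper name (`move_object`) chosen for this example:

```python
def move_object(s3, source_bucket, dest_bucket, key):
    """Copy an object to dest_bucket, then delete it from source_bucket."""
    s3.copy_object(
        Bucket=dest_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    # The delete runs only if the copy call returned without raising.
    s3.delete_object(Bucket=source_bucket, Key=key)
```

For the buckets used here, the call would be `move_object(s3, "freshinvoiceskt", "processedinvoiceskt", key)`.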
Thanks for sharing. We could have also done all of this in SAP Cloud Platform using Document Information Extraction (DoX) service.