Document Information Extraction through Integration of SAP Intelligent RPA and Doc Parser

Hi all. In this blog post, I would like to share how we can process documents and extract information from them by integrating SAP Intelligent Robotic Process Automation (SAP Intelligent RPA) with Doc Parser.

Doc Parser is a tool that processes documents and extracts information from them. It is user friendly and offers several ways to obtain the extracted information, one of which is its API. You can learn more on the Doc Parser website.

You can sign up for a free user account. Read the API documentation to find the APIs for uploading documents and downloading the extracted data.

Steps involved in the process:

  1. Open a free account and create a new parser.
  2. Obtain the API Key and Parser ID.
  3. Create a new project in the SAP Intelligent RPA Desktop Studio.
  4. Create a workflow and call the APIs.

STEP 1: Open a free account and create a new parser

You can open a free user account on the Doc Parser website. Once you have opened the account, Doc Parser guides you through creating a new document parser. Create a new document parser using sample documents.

STEP 2: Obtain the API Key and Parser ID

Once you have finished the step above, obtain your API Key and Parser ID for authentication. Go to your parser and click on the Integrations tab (1) in the menu.

The first time you open the Integrations tab, you have to confirm your email address to get the API Key. Once you have confirmed your account, click on HTTP REST API (2).

Here you can find your API Key and Parser ID. Copy them.

STEP 3: Create a new project in the SAP Intelligent RPA Desktop Studio

First, create two folders on your system.

Folder 1 – Unprocessed Doc folder, which contains the documents to be processed.

Folder 2 – Extracted Data folder, in which we will store the data retrieved from Doc Parser as JSON files.

Now go to your Desktop Studio and create a new project.

Create the following context to process and store the various data.

Context variables:

InvFile – Array containing the collection of files present in the Unprocessed Doc folder.

InvFileNames – Holds the name of the file currently being processed while looping through InvFile.

InvArrayLen – Length of the InvFile array.

Doc_Id – Stores the Doc Id of the uploaded document, which is required to retrieve its data.

DocIdArray – Since I am uploading multiple documents, this array stores all the Doc Ids.

DocIdArrLen – Length of the DocIdArray array.

OutputData – Stores the extracted data, which is later written to a JSON file.

Api_Key – Contains the API Key of Doc Parser.

Parser_Id – Contains the Parser ID.

Note: I created this context for my invoice processing project. You can name the context items as you see fit.
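
The scripts later in this post read these variables as rootData.Doc_Parser.* and rootData.InvFileCollection.*. As a rough picture only, the context can be imagined as the plain JavaScript object below. The real context is created in Desktop Studio, not in code; the grouping is inferred from the scripts in this post, and the initial values are assumptions.

```javascript
// Plain-object stand-in for the Desktop Studio context (illustration only).
// Names follow the scripts in this post; the initial values are assumed.
var rootData = {
	Doc_Parser: {
		Api_Key: "",        // Doc Parser API Key
		Parser_Id: ""       // Doc Parser Parser ID
	},
	InvFileCollection: {
		InvFile: [],        // files found in the Unprocessed Doc folder
		InvFileNames: "",   // name of the file currently being uploaded
		InvArrayLen: 0,     // length of InvFile
		Doc_Id: "",         // Doc Id returned by the latest upload
		DocIdArray: [],     // all Doc Ids collected so far
		DocIdArrLen: 0,     // length of DocIdArray
		OutputData: null    // extracted data for the document being retrieved
	}
};
```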

 

STEP 4: Create a workflow and call the APIs

Now go to your workflow. Create a workflow with two custom activities, Upload File to Doc Parser and Retrieve Extracted Data, with a Delay activity between them.

Build the workflow. Go to the Script tab and open your generated script.

Initialize your data in the global part: assign the API Key and Parser ID to the respective context variables.

These are the API Key and Parser ID that you obtained earlier from Doc Parser.
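
As a minimal sketch, the initialization amounts to two assignments. The property paths match the scripts further down in this post; the placeholder strings are assumptions to be replaced with your own values, and the first line is only a stand-in so the snippet runs outside Desktop Studio, where rootData already exists.

```javascript
// Stand-in for the context object; Desktop Studio provides the real rootData.
var rootData = { Doc_Parser: {} };

// Replace the placeholders with the values copied from the Integrations tab.
rootData.Doc_Parser.Api_Key = "<YOUR_API_KEY>";
rootData.Doc_Parser.Parser_Id = "<YOUR_PARSER_ID>";
```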

Custom Activity 1: Upload File to Doc Parser

Go to the code of the first activity, Upload File to Doc Parser.

Now we are going to make an AJAX call to upload multiple documents to Doc Parser.

First, we get all the files present in the folder using the Get File Collection activity and store them in the context.

Then we find the array length and store it.

Finally, we use a for loop to upload the files from the folder to Doc Parser one by one.

API for uploading documents: https://api.docparser.com/v1/document/upload/PARSER ID

Header: api_key : your API Key

Form data: file : file path + file name

Form data: name : file
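
The request parts listed above can be assembled as in the sketch below. buildUploadRequest is a hypothetical helper, not part of the SAP Intelligent RPA SDK, and the sample parser ID, key, folder, and file name are made up for illustration.

```javascript
// Hypothetical helper: collect the URL, header, and form data of one upload request.
function buildUploadRequest(parserId, apiKey, folderPath, fileName) {
	return {
		url: "https://api.docparser.com/v1/document/upload/" + encodeURIComponent(parserId),
		header: { api_key: apiKey },
		formData: {
			file: folderPath + "/" + fileName, // file path + file name
			name: "file"
		}
	};
}

// Made-up sample values, for illustration only.
var req = buildUploadRequest("myparser", "secret", "C:/docs", "invoice1.pdf");
// req.url is "https://api.docparser.com/v1/document/upload/myparser"
// req.formData.file is "C:/docs/invoice1.pdf"
```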

Once you upload a file to Doc Parser, it returns a Document Id, which is needed to retrieve the data from Doc Parser. Store this in the context. Since we are uploading multiple files, I am using an array to store the Doc Ids.

You can find the code for this section below.

// Folder that holds the documents to upload
var InvPath = "C://..../UnProcessed_Invoice";
rootData.InvFileCollection.InvFile = ctx.fso.folder.getFileCollection(InvPath, true);
rootData.InvFileCollection.InvArrayLen = rootData.InvFileCollection.InvFile.length;

// Upload the files one by one; each iteration makes one synchronous AJAX call
for (var i = 0; i < rootData.InvFileCollection.InvArrayLen; i++) {
	rootData.InvFileCollection.InvFileNames = rootData.InvFileCollection.InvFile[i].Name; // name of the current file
	ctx.log("The file now being uploaded is " + rootData.InvFileCollection.InvFileNames);
	// Clear the previous Doc Id so that a failed upload does not store it again
	rootData.InvFileCollection.Doc_Id = "";
	// AJAX call for uploading the file to Doc Parser
	ctx.ajax.call({
		url: "https://api.docparser.com/v1/document/upload/" + rootData.Doc_Parser.Parser_Id,
		formData: {
			file: InvPath + "/" + rootData.InvFileCollection.InvFileNames,
			name: 'file'
		},
		header: {
			api_key: rootData.Doc_Parser.Api_Key
		},
		async: false,
		method: e.ajax.method.post,
		success: function(res, status, xhr) {
			rootData.InvFileCollection.Doc_Id = res['id'];
			ctx.log("The Doc Id of the uploaded file is " + rootData.InvFileCollection.Doc_Id);
		},
		error: function(xhr, error, statusText) {
			ctx.log('ctx.ajax.call error: ' + statusText);
		}
	});
	// Store the Doc Id in the array (empty string if the upload failed)
	rootData.InvFileCollection.DocIdArray[i] = rootData.InvFileCollection.Doc_Id;
}
sc.endStep();
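
The bookkeeping in this loop can be tried outside Desktop Studio. The sketch below keeps a Doc Id only when the upload succeeded and records the files that failed. uploadAll, uploadOne, and fakeUpload are hypothetical names: uploadOne stands in for the ctx.ajax.call above and is assumed to return the Doc Id on success and null on failure.

```javascript
// Upload every file through the supplied callback and sort the outcomes.
// uploadOne(fileName) is assumed to return a Doc Id string, or null on failure.
function uploadAll(fileNames, uploadOne) {
	var docIds = [];
	var failed = [];
	for (var i = 0; i < fileNames.length; i++) {
		var id = uploadOne(fileNames[i]);
		if (id) {
			docIds.push(id);           // keep only Doc Ids of successful uploads
		} else {
			failed.push(fileNames[i]); // remember which files need a retry
		}
	}
	return { docIds: docIds, failed: failed };
}

// Fake uploader for demonstration: "bad.pdf" fails, every other file gets an id.
function fakeUpload(name) {
	return name === "bad.pdf" ? null : "id-" + name;
}

var result = uploadAll(["a.pdf", "bad.pdf", "b.pdf"], fakeUpload);
// result.docIds is ["id-a.pdf", "id-b.pdf"], result.failed is ["bad.pdf"]
```

Separating the two lists means the retrieval step never asks Doc Parser for a document that was not uploaded.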

Delay activity:

I have set a delay of 20 seconds so that Doc Parser has time to process the documents.
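
A fixed delay is the approach used in this post. As an alternative, not the method used here, the result could be requested repeatedly until it is available. The sketch below shows only the retry logic. pollForResult, fetchOnce, and wait are hypothetical names: fetchOnce is assumed to return the data or null, and wait is assumed to pause between attempts. How to pause inside a Desktop Studio script is not covered here.

```javascript
// Retry fetchOnce() until it returns data or maxAttempts is reached.
// wait() is called between attempts, never after the last one.
function pollForResult(fetchOnce, wait, maxAttempts) {
	for (var attempt = 1; attempt <= maxAttempts; attempt++) {
		var data = fetchOnce();
		if (data) {
			return { data: data, attempts: attempt };
		}
		if (attempt < maxAttempts) {
			wait();
		}
	}
	return { data: null, attempts: maxAttempts };
}

// Demonstration with a fake fetch that only succeeds on the third call.
var calls = 0;
function fakeFetch() {
	calls++;
	return calls >= 3 ? { ready: true } : null;
}
var waits = 0;
var outcome = pollForResult(fakeFetch, function () { waits++; }, 5);
// outcome.attempts is 3 and waits is 2
```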

Custom Activity 2: Retrieve Extracted Data

Go to the "Retrieve Extracted Data" code section.

Here we make another AJAX call to retrieve the data from Doc Parser, and we store the data as a JSON file in the Extracted Data folder using the Write File activity.

You can find the code for data retrieval below.

// Retrieving the extracted data
rootData.InvFileCollection.DocIdArrLen = rootData.InvFileCollection.DocIdArray.length;
for (var j = 0; j < rootData.InvFileCollection.DocIdArrLen; j++) {
	ctx.log("The Doc Id being retrieved is " + rootData.InvFileCollection.DocIdArray[j]);
	// Tracks whether this call returned data, so a failed call does not write stale data
	var dataReceived = false;
	// AJAX call to retrieve the data
	ctx.ajax.call({
		url: "https://api.docparser.com/v1/results/" + rootData.Doc_Parser.Parser_Id + "/" + rootData.InvFileCollection.DocIdArray[j],
		async: false,
		method: e.ajax.method.get,
		header: {
			api_key: rootData.Doc_Parser.Api_Key
		},
		contentType: e.ajax.content.json,
		success: function(res, status, xhr) {
			rootData.InvFileCollection.OutputData = res[0]; // first element of the returned array
			dataReceived = true;
			ctx.log(rootData.InvFileCollection.OutputData);
		},
		error: function(xhr, error, statusText) {
			ctx.log('ctx.ajax.call error: ' + statusText);
		}
	});
	// Write the result to <Doc Id>.json only if the call succeeded
	if (dataReceived) {
		var txt = ctx.serialize(rootData.InvFileCollection.OutputData, false, false, "\t");
		ctx.fso.file.write("C:/.../Extracted Data/" + rootData.InvFileCollection.DocIdArray[j] + ".json", txt, e.file.encoding.UTF8);
	}
}
sc.endStep(); // End_scenario_1
return;
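
The string handling in this step can be checked in isolation. The sketch below builds the results URL and the output file name for one Doc Id and turns a result object into tab-indented JSON text. The three helper names and the sample values are hypothetical. JSON.stringify is standard JavaScript and is used here only as a stand-in for ctx.serialize, which the script above relies on.

```javascript
// Hypothetical helper: URL of the results endpoint for one document.
function buildResultUrl(parserId, docId) {
	return "https://api.docparser.com/v1/results/" +
		encodeURIComponent(parserId) + "/" + encodeURIComponent(docId);
}

// Hypothetical helper: one JSON file per Doc Id inside the output folder.
function buildOutputPath(folderPath, docId) {
	return folderPath + "/" + docId + ".json";
}

// Stand-in for ctx.serialize: tab-indented JSON text.
function toJsonText(data) {
	return JSON.stringify(data, null, "\t");
}

// Made-up sample values, for illustration only.
var url = buildResultUrl("myparser", "abc123");
var path = buildOutputPath("C:/out", "abc123");
var text = toJsonText({ a: 1 });
// url is "https://api.docparser.com/v1/results/myparser/abc123"
// path is "C:/out/abc123.json"
```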

Once you have made the changes, you can build and run the code.

Note:

  1. Make sure you have set up the Doc Parser account properly and copied the API Key and Parser ID correctly.
  2. Make sure the folder paths are correct and the documents are present in the folder.
  3. Check that you are calling the correct API for uploading and for downloading.

Once you run the project, the Extracted Data folder will contain JSON files with the data of the uploaded documents.

Conclusion:

The data is stored in JSON files, which can be used later in other scenarios. This approach suits scenarios such as invoice processing or PO to Sales Order, where we need to process different types of documents and extract data from them. I hope this gives you some more ideas for SAP Intelligent RPA.

Thanks for going through the blog post. Please feel free to drop comments.
