Using the data quality microservice to validate address data in Cloud Platform Integration
In this blog, I am going to share how I leverage Data Quality Management, Microservices for Location Data (DQM Microservices) on SAP Cloud Platform to validate address data in my Cloud Platform Integration (CPI) flow.
Scenario
A requirement came up recently: my integration flow has to check whether the address data in the incoming transaction data is a valid combination before writing to the target system. More specifically, there are three address fields in the inbound data: state, suburb, and postcode (the country code is hardcoded to Australia in my scenario). So I need a way to check whether a given postcode is the correct one for the given suburb in the given state, which means I need a service on the Internet that provides address lookup or query. There are many options, such as the Google Maps RESTful API, GeoNames.org (http://www.geonames.org/), or even a service from the local postal provider (Australia Post). In the end, I chose DQM Microservices for my integration flow because it matches my needs quite well.
Data Quality Management, Microservices for Location Data
Before I show how I invoke the DQM microservice in my integration flow, here is some explanation of this microservice and how I prepare my query address format.
The DQM microservices application offers cloud-based microservices for address cleansing, geocoding, and reverse geocoding. You can embed address cleansing and enrichment services within any business process or application so that you can quickly reap the value of complete and accurate address data.
This address cleansing solution uses reference data from global postal authorities for 240+ countries and territories to validate or correct addresses. With address cleansing, you can quickly and easily correct, parse, standardize, validate, and enhance address data.
Enable DQM Microservices
To be able to use this service, go to Services in the SAP Cloud Platform Cockpit and enable it.
Open the service and click the “Configure Service” link to open the configuration settings on another page.
Creating custom configuration
Inside this page, the Configurations tile contains nine configuration templates that SAP has built for reference and testing. The Usage Information tile provides summary API usage information per country.
In the trial environment, the number of service calls is limited to 1,000 transactions per country or 30 days of usage. Tracking starts with the first transaction that applies to your account.
When you open the Configurations tile, you will see the nine query address formats provided by SAP. You can also create your own with the mapping design UI and get a query format very quickly. Overall, it shouldn’t take long to get into the design UI and build a simple one like the one below.
Preparing and testing custom query address format
The “Test Configuration” dialog provides an example JSON request that you can test from your REST API client tool: simply copy the structure from the dialog, then paste and run it in your test tool. Another option SAP provides is testing the API from the SAP API Hub, in which case you don’t even need a REST client program.
The URL that receives the POST call can be found via the “Application URL” link on the landing page of the service.
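If you prefer a script over a GUI client, a small Python sketch can send the same request. This is only an outside-CPI illustration: the service URL below is a placeholder (use the one from your “Application URL” link), and the credentials are made-up examples.

```python
import base64
import json
import urllib.request

# Placeholders -- substitute your own "Application URL" and trial credentials
SERVICE_URL = "https://<your-application-url>/dq/addressCleanse"
USER, PASSWORD = "S0001234567", "secret"

def build_payload(region, locality, postcode, country="AU"):
    """Build the query address format described in this blog."""
    return {
        "addressInput": {
            "country": country,
            "region": region,
            "locality": locality,
            "postcode": postcode,
        },
        "outputFields": [
            "std_addr_locality_full",
            "std_addr_region_full",
            "std_addr_postcode_full",
            "std_addr_country_name",
            "addr_info_code",
            "addr_info_code_msg",
        ],
    }

def post_address(payload):
    """POST the payload with basic authentication and return the parsed JSON."""
    data = json.dumps(payload).encode("utf-8")
    auth = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req = urllib.request.Request(
        SERVICE_URL,
        data=data,
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + auth},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_payload("VIC", "Melbourne", "3000")
```

Basic authentication like this only works in the trial landscape; production calls need OAuth 2.0 or a client certificate, as described in the next section.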
Authentication for calling the API
In the trial environment, you can use basic authentication, which means you can simply enter your S-user ID and password in the client tool to call the APIs. However, basic authentication is not supported in the production environment; there you can use OAuth 2.0 or client certificate authentication. Please check here for detailed information.
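For the OAuth 2.0 route, fetching a token with the standard client-credentials grant can be sketched as below. The token URL, client ID, and secret are placeholders, not real values; take the actual ones from the service’s OAuth configuration in your cockpit.

```python
import base64
import json
import urllib.parse
import urllib.request

# Placeholders -- use the token endpoint and client credentials from your subaccount
TOKEN_URL = "https://<your-subaccount>.authentication.<region>.hana.ondemand.com/oauth/token"
CLIENT_ID, CLIENT_SECRET = "my-client-id", "my-client-secret"

def build_token_request():
    """Build a client-credentials token request (RFC 6749, section 4.4)."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    auth = base64.b64encode(f"{CLIENT_ID}:{CLIENT_SECRET}".encode()).decode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Authorization": "Basic " + auth},
    )

def fetch_token():
    """POST the request and return the bearer token string."""
    with urllib.request.urlopen(build_token_request()) as resp:
        return json.load(resp)["access_token"]

req = build_token_request()
```

The returned access token then goes into an `Authorization: Bearer <token>` header on the address-cleanse call.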
My custom query address format
In my case, I only need to make sure that state, suburb, and postcode match each other. So this falls into the “Address Format 3: Location Without Delivery Address” category.
Use this combination of fields when your purpose is to cleanse city (locality), region, postcode, or country data, without sending an actual street level address.
Include locality, region, postcode, and country data. Do not include any mixed fields.
This format is limited in functionality due to such limited information being sent in the request. However, city and region names can be standardized and formatted as requested, and in some cases corrected when sent with a misspelling. When sending a city, the region may be appended when there is only one region that has a city by that name. When sending a city, the postcode may be appended when there is only one postcode for the city, but if there are multiple valid postcodes for the city then no postcode is appended. When sending a postcode, the city and region may be appended when there is only one city for that postcode. When sending only a country name, the other variations of the country may be appended; however the appended country name is returned in English and the address settings do not affect its format.
Based on the above description, I understood the relationship between the input and output values when you provide only city (locality), city (locality) + state (region), or city (locality) + state (region) + postcode. In my scenario, I provide all three values and compare them with the result returned from the service. This is explained in a later section.
Below is the query address format for my requirement. I have four input fields (country, region, locality, and postcode) and ten output fields. The first four output fields are the returned result from address cleansing (locality maps to std_addr_locality_full, region to std_addr_region_full, postcode to std_addr_postcode_full, and country to std_addr_country_name); the remaining six are flag fields that provide more insight into the address.
{
  "addressInput": {
    "country": "",
    "region": "",
    "locality": "",
    "postcode": ""
  },
  "outputFields": [
    "std_addr_locality_full",
    "std_addr_region_full",
    "std_addr_postcode_full",
    "std_addr_country_name",
    "addr_asmt_info",
    "addr_asmt_type",
    "addr_asmt_level",
    "addr_change_sig",
    "addr_info_code",
    "addr_info_code_msg"
  ]
}
- addr_asmt_info: Information on the validity of the address.
- addr_asmt_type: Type of address.
- addr_asmt_level: Level that the cleansing process was able to match the address to reference data. The assignment level varies from country to country, and may be different when country-specific reference data is used than when it is not used. The codes represent the following levels, in order of best to poorest.
- addr_change_sig: Indicates the significance of changes made to the address.
- addr_info_code: Generated only when an address is invalid or something suspect is identified by the cleansing process.
- addr_info_code_msg: Description for the addr_info_code.
Possible values for each field can be found here.
Below are two test cases, for correct and for wrong address data.
In this case, the input address data is a correct combination, so the same values appear in the returned fields.
Wrong data can be very tricky. In the test case below, the region does not fit the other two fields, and in the result from the service the region is corrected back to “VIC”. This follows the behavior described in the documentation quoted above. Wrong data could appear in the suburb, the postcode, or all of the fields. In my scenario, the check is relatively simple: the addr_info_code_msg field must be empty, and the values of the three input fields other than country must be identical to the output values.
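That check can be sketched outside CPI in a few lines of Python. The sample response below uses illustrative values only, not an actual service reply.

```python
def check_address(inp, resp):
    """Mirror the flow's check: report the service's info code message if any,
    otherwise compare each input field (case-insensitively) with the cleansed result."""
    def same(a, b):
        return (a or "").strip().lower() == (b or "").strip().lower()

    if resp.get("addr_info_code_msg"):
        return resp["addr_info_code_msg"]
    if (same(inp["region"], resp.get("std_addr_region_full"))
            and same(inp["locality"], resp.get("std_addr_locality_full"))
            and same(inp["postcode"], resp.get("std_addr_postcode_full"))):
        return "State, Suburb, and Postcode match."
    return "State, Suburb, and Postcode do not match."

# Illustrative response: the service corrected the region back to "VIC"
sample = {"std_addr_region_full": "VIC",
          "std_addr_locality_full": "MELBOURNE",
          "std_addr_postcode_full": "3000"}
wrong_region = {"region": "NSW", "locality": "Melbourne", "postcode": "3000"}
print(check_address(wrong_region, sample))  # reports that the fields do not match
```

The Groovy script later in this blog implements the same comparison inside the integration flow.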
Integrating DQM microservice in CPI integration flow
At this point, we have sorted out how to validate an address; calling it is just like invoking any other REST service outside of SCP from CPI. I use a content modifier step to prepare the input data, followed by a request-reply step with an HTTP channel to call the DQM microservice, and finally a script step to process the returned result and produce the output. Here is my integration flow.
A test SOAP message is sent to the flow. In the content modifier step, I retrieve the values by their XPath in the XML, put them into properties, and use them in the message body.
Below is the configuration setup in HTTP communication channel to call DQM microservice.
Below is the Groovy script in the step after the result is received from the service. I use JsonSlurper to convert the received result from a string to JSON, making it easier to access the fields and their values. I also retrieve the original input values that I defined in the first step, then use a few IF statements to compare the input values with the returned values and tell the caller whether the address data is correct.
import com.sap.gateway.ip.core.customdev.util.Message;
import java.util.HashMap;
import groovy.json.*;

def Message processData(Message message) {
    def messageLog = messageLogFactory.getMessageLog(message);
    def body = message.getBody(java.lang.String) as String;
    messageLog.setStringProperty("Body:", body);
    // Parse the service response so its fields can be accessed by name
    def json = new JsonSlurper().parseText(body);
    // Original input values stored as properties in the content modifier step
    def state = message.getProperty("State");
    def suburb = message.getProperty("Suburb");
    def postcode = message.getProperty("Postcode");
    if (!state || !suburb || !postcode
            || !state.trim().equalsIgnoreCase(json.std_addr_region_full)
            || !suburb.trim().equalsIgnoreCase(json.std_addr_locality_full)
            || !postcode.trim().equalsIgnoreCase(json.std_addr_postcode_full)
            || json.addr_info_code_msg) {
        if (json.addr_info_code) {
            // The service flagged the address -- return its code and message
            body = "{\"checkResult\": \"" + json.addr_info_code_msg + "\", \"checkCode\": \"" + json.addr_info_code + "\"}";
        } else {
            body = "{\"checkResult\": \"State, Suburb, and Postcode do not match.\"}";
        }
    } else {
        body = "{\"checkResult\": \"State, Suburb, and Postcode match.\"}";
    }
    body = "{\"InOperationResponse\":" + body + "}";
    message.setBody(body);
    return message;
}
In the test case below, the postcode is wrong; the flow detects it and returns the right message.
In another test case, the city is wrong; the flow detects it and returns the message coming back from the DQM microservice.
Summary
The DQM microservice can do more than what I use in my scenario. It can also be embedded in an interactive program to help users enter a correct address more quickly, or validate addresses via its API as in my integration flow. Thanks, and enjoy reading.