
Today the world shares constantly on social media: posting shopping recommendations, reviews and new trends, and liking or sharing posts. Enterprises can get hold of this information to gain insight into the market sentiment around their products, their customers' interests, trends and more. To achieve this, the user identities from the various social sources must be combined and linked with the user identity in the internal systems, so that we have a single version of the user entity that can be used for analysis.

Overall, this refers to the processes, policies and concepts used to gather and compile social media data sources (like Facebook, LinkedIn and Twitter) and internal sources (like CRM) into one master data store.

The diagram below gives a better view of this.

Social MDM OverView.png

We have customer information from various sources: social media (Facebook, Twitter and Google Plus) and internal systems (CRM). The system collects data from these sources, harmonizes it through various data processes, and brings it into one version of the truth for consumption.

This involves collecting massive data sets from social media and combining them with internal data through complex cleansing, matching and analytical operations. SAP HANA comes with many capabilities that can ease this complexity. It can analyze unstructured data, standardize and cleanse, match, and build extremely performant analytic models. Data Quality functionality like Standardize & Cleanse and Match is now available as AFL libraries. The interface to these modules may change in a future SAP HANA release. If you are more comfortable with SAP Data Services, you can leverage the Data Quality transforms available there. Here, we explore how this complex process can be achieved using the power of SAP HANA.

Let us look at a scenario: a business and leisure hotel chain, XYZ Corporation, would like to offer customized packages to its customers. Today it has no way to learn about its customers' interests unless it asks them explicitly. It also has no way to know what its customers are saying about its services and campaigns on social media. If it knew its customers' latest check-in locations, it could offer customized packages based on location and win better business.

If XYZ Corporation wants to solve these problems, it needs to collect customer information from social media and connect it with its internal data. First, it needs to build a social master table that consolidates user data from the various social sources. Then the data in the social master table must be matched against the internal records. This means the first challenge is getting the user identity information from these social sites and building the social master table.

Let's look deeper into the relevant user identities from social media:

  1. User ID from the social media channel.
  2. Customer's name, email ID, phone number and address. A sophisticated matching and cleansing algorithm must be applied to these records during consolidation, because a customer may not provide exactly the same identity on social media. For example, the internal records may have the name "Jane Duke", while on social media her name could be "Jane D", and she may provide a different email ID in each system.

Following are some of the fields we can collect:

Field | Challenge
----- | ---------
Name | The customer may not provide the full name they used in the internal system.
Email ID | Customers often have more than one email ID, and the one given in the internal system may not match the one on social media. Some customers also set security restrictions on access to their email ID.
Phone number | Same challenge as email, and retrieving phone numbers from a social media channel is even more restricted.
Address | The customer may provide their current living address, which can differ from the address in the internal system. Access to it may also be restricted.
Work/education history | Helpful only in specific business scenarios; can be extracted from Facebook or LinkedIn.
Locale/time zone | Helpful for narrowing down a match, but can be misleading if the user has configured wrong information.
Gender | Users do not always maintain this information.

The overall process can be divided into two areas:

  1. Collecting data from social media
  2. Building the social master table and matching against the internal data

XYZ Corporation can follow this implementation process to build its social intelligence system. We will discuss each of these in detail.

  1. Collecting data from social media:

We will elaborate in more detail on collecting user identity details from Facebook. The implementation for collecting identity information from other social media channels is largely similar.

Facebook provides options like the Graph API and FQL (Facebook Query Language, which will be retired by the end of 2016) through which we can collect the information. The challenge is that users don't expose complete profile details (like birthday, email address, etc.) to any public account unless a specific permission is granted by the end user. So the better strategy is to use a Facebook application.

Today most businesses have a social presence, such as a Facebook page or Facebook application. The purpose of the application would be to provide specific promotions, surveys, reward points, promotion codes and so on. There could be many applications created for the business for various purposes, so there is a good chance of customers directly visiting and using these applications. This improves the possibility of getting more accurate profile information from social media, and helps find matches between app users / page visitors and the internal contacts.

In the case of a Facebook application, getting the permissions from the end users can easily be achieved through app permissions, which are configured when the application is published. If you notice, when you access any new Facebook application for the first time (it is the same for other platforms like Twitter, Google+, etc.), it prompts with the list of permissions (like email ID, birthday, etc.) that must be granted by the end user / customer. Once the customer grants these permissions, the application can access the granted user details at any time using the APIs provided by the social platform, e.g. the Facebook Graph API.

Now XYZ Corporation can start publishing its Facebook application and building a connector to harvest the customer details.

a. Facebook application: This application should be configured to ask for the user permissions, which can be done at https://developers.facebook.com. As part of this, the relevant permissions must be chosen so that the application can access the relevant user identity information. The application can be created as a Facebook canvas application (the content is hosted elsewhere, and the canvas URL is linked to the hosted site URL), as a mobile application, or it can be used for managing authentication to the company website.

b. Connector to harvest the user data on behalf of the application:

Here we need to connect to the respective social site's API (in our discussion, Facebook's) and collect the data. The whole workflow can be implemented by calling the REST APIs provided by the Facebook Graph API, which means it can be achieved using any technology you are comfortable with (Java, Python, Ruby, HANA XSJS).

SAP HANA XSJS provides options to make outbound HTTP calls and fetch the details; refer to the SCN blog "Configuring outbound calls from HANA XSJS". For Facebook we need to change the host name to "graph.facebook.com" and configure the trust store.

To access any of the Facebook Graph APIs we need a valid access token. To get the access token we need the "App ID" and "App Secret" (available in the app's settings tab). If the connector is implemented in a technology like Java, there are many open-source libraries available to generate an access token from Facebook (or any other platform) using OAuth; all you need is to implement it using one such library, like scribe-java. If you don't want to use third-party libraries, or you are implementing in SAP HANA XSJS, refer to the Facebook API documentation and follow the steps to generate the access tokens.

Once we have a valid access token, we use the Facebook Graph API endpoint http://graph.facebook.com/{user-id} to get the user details. This returns the user profile information in JSON format.
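As a rough sketch of this part of the connector (in Python with only the standard library; the exact field names returned and the token-exchange query shape are assumptions to be checked against the Graph API documentation for your API version):

```python
import json
from urllib.parse import urlencode

# Hypothetical sketch: builds Graph API request URLs and parses the JSON
# reply. Which fields actually come back depends on the permissions the
# end user granted to the application.
GRAPH_BASE = "https://graph.facebook.com"

def app_token_url(app_id: str, app_secret: str) -> str:
    """URL that exchanges the App ID / App Secret for an app access token."""
    qs = urlencode({"client_id": app_id,
                    "client_secret": app_secret,
                    "grant_type": "client_credentials"})
    return f"{GRAPH_BASE}/oauth/access_token?{qs}"

def user_profile_url(user_id: str, token: str) -> str:
    """URL for GET /{user-id}; returns the profile fields granted to the app."""
    qs = urlencode({"access_token": token})
    return f"{GRAPH_BASE}/{user_id}?{qs}"

def parse_profile(payload: str) -> dict:
    """Keep only the identity fields we stage; ungranted fields become None."""
    data = json.loads(payload)
    keys = ("id", "name", "email", "birthday", "gender", "locale")
    return {k: data.get(k) for k in keys}
```

Any HTTP client (XSJS `$.net.http`, Java, or Python's `urllib.request`) would issue GET requests against these URLs and hand the response body to `parse_profile`.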

Now we have the next challenge: there is no direct API to get the list of application users. There are different approaches we can try:

a) Cache the user ID in the application's database. This is very applicable when the application manages authentication to the company website.

b) Whenever a user subscribes to the application, the user "likes" a specific post on the company's Facebook page.

c) Try to get the details of all users who interacted with the Facebook page. There is a good chance of reaching all the app users this way.

Let us discuss the second option. Whenever a user subscribes to the application, we ask for permission to post on the user's behalf, so that the application can implicitly "like" the specific post using the Graph API below:

An HTTP POST to "http://graph.facebook.com/{object-id}/likes", where object-id is the ID of the specific post on the Facebook page.

Whenever the user subscribes to the Facebook application, it automatically "likes" the predefined post in the background. Later, the connector can get the list of users with an HTTP GET on "http://graph.facebook.com/{object-id}/likes". The IDs collected from this API become the input for the user details API, which needs to be called for each user. If you need to harvest the users' likes, friends and so on (make sure you have valid permissions approved by Facebook), use the appropriate APIs to get the additional details.
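The harvesting loop itself can be sketched like this (Python, standard library only; the paging shape, a `data` array plus a `paging.next` cursor, matches the Graph API's usual JSON envelope but should be treated as an assumption):

```python
import json

def extract_like_page(payload: str):
    """From one page of GET /{object-id}/likes, pull the user IDs and the
    URL of the next page (None when there are no more pages)."""
    page = json.loads(payload)
    user_ids = [entry["id"] for entry in page.get("data", [])]
    next_url = page.get("paging", {}).get("next")
    return user_ids, next_url

def collect_all_user_ids(fetch, first_url: str) -> list:
    """Follow the paging cursor until exhausted. `fetch` is any callable
    that takes a URL and returns the response body, so the same loop works
    whether the HTTP layer is urllib, Java or XSJS."""
    url, ids = first_url, []
    while url:
        page_ids, url = extract_like_page(fetch(url))
        ids.extend(page_ids)
    return ids
```

Each collected ID is then fed to the user details call from the previous section.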

Along with the user details from the Graph API, we collected the following details as well:

1. Scraped user "about page" data (http://www.facebook.com/{user-id}/about). For this we used the HtmlUnit library (in Java) to simulate the workflow. But this may not be the right approach for production, as web scraping is not legal.

2. Each user's likes and interests, using the Graph API: http://graph.facebook.com/{user-id}/likes

If we choose the third option to get the list of users (i.e. users who posted, commented on, shared or liked any post on the Facebook page), there are Graph APIs available for this and it is straightforward. It could also be implemented using FQL.

Finally, the connector inserts all the collected user details into tables in the HANA system, which are used for further processing. Let's call these tables the "social staging tables".
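The hand-off into the staging tables could look like the sketch below. The table name, column list and the idea of executing through a HANA Python client such as hdbcli are all illustrative assumptions; the DDL would normally be created up front.

```python
# Sketch: turn parsed profiles into parameterized INSERT rows for a staging
# table. Execution would go through a HANA client, e.g. hdbcli with
# cursor.executemany(sql, rows). Names below are illustrative only.
STAGING_COLUMNS = ("SOURCE", "USER_ID", "NAME", "EMAIL", "PHONE", "ADDRESS")

def staging_insert_sql(table: str = "SOCIAL_STAGING") -> str:
    """Parameterized INSERT statement for the staging table."""
    placeholders = ", ".join("?" for _ in STAGING_COLUMNS)
    return (f'INSERT INTO "{table}" ({", ".join(STAGING_COLUMNS)}) '
            f"VALUES ({placeholders})")

def to_row(source: str, profile: dict) -> tuple:
    """Flatten one parsed profile into the column order above."""
    return (source, profile.get("id"), profile.get("name"),
            profile.get("email"), profile.get("phone"),
            profile.get("address"))
```

Keeping a SOURCE column per record makes it possible to trace every staged identity back to the social channel it came from.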

  2. Building the social master table and matching the internal data:

In this step we match the user details collected from the various social media channels to build the social master table, and finally match against the internal records.

Before we start the match process, we need to standardize and cleanse the data we got from social media. For this we used the SAP Data Quality (DQ) libraries that are part of HANA. The link below provides details on the various Data Quality offerings from SAP.

Note: make sure you have the latest versions of the following directories downloaded from the SAP Service Marketplace (Service Marketplace -> Software Downloads -> Address directories and reference data) and installed in your HANA system:

    1. Global address directories
    2. Country-specific directories, if needed
    3. Geocoder directories (for specific countries)

The following steps were executed on each of the user records.

  1. Remove duplicate user records within a batch load from the social staging table. If there is more than one record for the same entity, give preference to the latest record.

  2. User locations are usually not cleansed. To address this we used a combination of the Text Analysis (TA) engine available in HANA and the Global Address Cleanse feature of the DQ library in HANA. Text analysis is used to identify the location entities in the unstructured address.

e.g.: Assume the user address from social media is "Bangalore, India". The TA engine identifies the location entities, and the TA result would be:

Bangalore – LOCALITY
India – COUNTRY

This gives better input for the DQ Global Address Cleanse, whose output also provides the state as additional information. Using HANA DQ we get a cleaner, enriched address like:

Bangalore – LOCALITY
KARNATAKA – REGION
INDIA – COUNTRY
PIN CODE: 560066
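To make the data flow concrete (this illustrates the hand-off only, not the actual AFL interface, which ships its own signatures), the TA entities can be mapped onto the structured fields the address cleanse step expects:

```python
# Illustrative only: maps Text Analysis entity output onto the structured
# input fields of a Global Address Cleanse step. The entity type names
# (LOCALITY, REGION, COUNTRY) follow the example above.
TA_TO_ADDRESS_FIELD = {
    "LOCALITY": "locality",
    "REGION": "region",
    "COUNTRY": "country",
}

def ta_entities_to_address(entities):
    """entities: list of (token, entity_type) pairs from the TA engine."""
    address = {"locality": None, "region": None, "country": None}
    for token, entity_type in entities:
        field = TA_TO_ADDRESS_FIELD.get(entity_type)
        if field:
            address[field] = token
    return address
```

The cleanse step then fills the missing fields (region, postal code) from the address directories.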

  3. The geocoder, part of the SAP HANA Data Quality library as an AFL, is used to find the longitude and latitude of the location. The output from Global Address Cleanse is fed to the geocoder. Note that for some user records the Facebook Graph API itself returns longitude and latitude, which can be used directly.

  4. Standardize the other identities, like phone number and email ID.

  5. Load all the standardized basic identity information (user ID, name, email, phone, address, birthday, longitude/latitude) into a social master table.

  6. If user data was collected from several social sites, apply the matching rules and consolidate the records in the social master table. After consolidation, matched records share the same social master ID. The detailed matching logic is discussed below.

  7. For the matching logic we use the matching engine available in the SAP HANA Data Quality library. The following match rules are applied, each executed in sequence over the entire record set. If any one rule succeeds, the match is considered passed.

     1. Exact match on user ID: two records reference the same user ID for a social medium, e.g. the Twitter account linked in a Facebook profile.
     2. Exact match on email ID: if the email IDs of two social records match, the match passes.
     3. Name and phone number: last name must match 100%, first name 40%, and phone number 100%.
     4. Name, gender and birthday: first name 100%, last name 40%, gender 100% and birthday 100%.
     5. Name, gender and address: first name 100%, last name 60%, gender 100%, country 100%, locality and region 100%.
     6. Name and address, loose match: first name 100%, last name 60%, country 100%; locality and region are matched with the geo-proximity match rule (based on longitude and latitude) available in the SAP DQ match library.
     7. Name, gender and time zone: all fields must match 100% except last name, which must match 40%.

     If the match passes any of these rules, the records are placed in the same match group, and records in the same match group are assigned one social master ID.
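The HANA DQ match engine evaluates these rules natively. Purely to make the rule semantics concrete, here is a standalone sketch, with `difflib` ratios standing in for the engine's similarity scores (an assumption; they are not the same algorithm) and only two of the rules spelled out:

```python
from difflib import SequenceMatcher

def similarity(a, b) -> float:
    """0..1 similarity; a stand-in for the DQ engine's match score."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rule_email(r1, r2) -> bool:
    """Rule 2: exact match on email ID."""
    return bool(r1.get("email")) and r1.get("email") == r2.get("email")

def rule_name_phone(r1, r2) -> bool:
    """Rule 3: last name 100%, first name >= 40%, phone 100%."""
    return (similarity(r1.get("last"), r2.get("last")) == 1.0
            and similarity(r1.get("first"), r2.get("first")) >= 0.4
            and bool(r1.get("phone")) and r1.get("phone") == r2.get("phone"))

# Rules 1 and 4-7 follow the same shape with different fields/thresholds.
RULES = (rule_email, rule_name_phone)

def match_passed(r1, r2) -> bool:
    """Rules run in sequence; the first success makes the pair a match."""
    return any(rule(r1, r2) for rule in RULES)
```

With this shape, "Jane Duke" in the internal system and "J duke" with the same phone number on social media land in one match group via rule 3.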

  8. Load the internal records into the internal master table.

  9. Run the matching logic between the records of the social master table and the internal master table. The match rules can be the same ones used to consolidate the social master table in step 7.

  10. If the match passes any of the match rules, the ID from the social master table is written to the enterprise master table. Joining these tables (enterprise master, social master) then gives a 360-degree view of the customer.
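Step 10's join can be sketched as follows (plain Python standing in for the SQL join across the master tables; table and column names are illustrative):

```python
def customer_360(enterprise_rows, social_rows):
    """Join the enterprise master with the social master on
    SOCIAL_MASTER_ID, the key written back by the match step. Enterprise
    rows with no social match keep only their internal fields."""
    social_by_id = {row["SOCIAL_MASTER_ID"]: row for row in social_rows}
    view = []
    for ent in enterprise_rows:
        merged = dict(ent)
        social = social_by_id.get(ent.get("SOCIAL_MASTER_ID"))
        if social:
            # Social attributes enrich, never overwrite, the internal record.
            for key, value in social.items():
                merged.setdefault(key, value)
        view.append(merged)
    return view
```

In HANA this would typically be a calculation view or a LEFT OUTER JOIN over the two tables rather than application code.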


The diagram below summarizes the steps discussed above.


Social MDM.png


After realizing the steps discussed above, XYZ Corporation can now get hold of its customer information in real time, alongside its internal data. This can be used for building campaigns and promotions.

Along with identity information, the harvester can collect additional information about the customer. Each social platform offers different data elements; the table below summarizes some of them.

Facebook | Google Plus | Twitter | LinkedIn
-------- | ----------- | ------- | --------
Friends | Friends & circles | Followers | Educational background
Subscriptions | Subscriptions | Short description | Work history
Educational background | Educational background | Current location | Connections
Work history | Work history | Tweet locations | Groups
Photos & tags | Photos | People you follow | Contact info
Current location | Location | Location | Living address
Contact information | Interests | Likes | Check-ins
Contact information | Liked articles | About description | Interests
Public posts | Interests | Music, movies, books | Check-ins
Events | Pages you like | Birthday | Public post locations

Though XYZ Corporation can now collect details from social media, a few challenges remain:

  1. It may be difficult to get many users to subscribe to the Facebook application.
  2. There could be legal issues in persisting the user details.
  3. Many users do not keep their profile information up to date on the social websites.

Conclusion

As discussed in the challenges section, collecting user profiles from social media and using them for campaigns may have legal implications that need to be validated; since we asked for user consent, this may not be an issue. Legalities aside, SAP HANA provides all the power needed to realize social intelligence with its raw power and built-in capabilities.

Instead of using the Facebook/Twitter APIs directly, we could also look into the possibility of getting the user profile information from data aggregators.

If you are interested in knowing more:

    > Facebook Query Language: https://developers.facebook.com/docs/reference/fql/
    > Facebook Graph API: https://developers.facebook.com/docs/graph-api/quickstart/v2.1
    > Twitter API: https://dev.twitter.com/docs/api

