Disclaimer
This tutorial is intended as a guide for creating demo/test data only. The sample script provided is not intended for use in a production system.
Purpose
The following tutorial explains one way of harvesting Twitter data through GNIP. The Python interpreter that ships with the SAP HANA client is used to execute a Python script from SAP HANA Studio. The script harvests data from GNIP, extracts the relevant fields, and stores them in the Business Suite Foundation database tables SOCIALDATA and SOCIALUSERINFO. As provided, the script runs indefinitely; to stop harvesting, stop the execution of the script in SAP HANA Studio. You can, however, modify the script to run for a specific period of time. To run the script, you also need to make a few customizing and configuration settings so that you can use the PyDev plugin in SAP HANA Studio.
Prerequisites
Make sure that the following prerequisites are met before you start:
• Installation of SAP HANA Studio and SAP HANA Client
Install SAP HANA Studio and SAP HANA Client, and request a HANA user with read, write and update authorization for the foundation database tables SOCIALDATA and SOCIALUSERINFO.
• Creation of a GNIP account
• Data stream configuration in your GNIP account
Create a data stream for a source (such as Twitter or Facebook) in your GNIP account. A data stream harvests data from a single source only, so you need a separate data stream for each data source. After creating a data stream, define rules in the ‘Rules’ tab to filter the data that GNIP delivers. For writing rules, refer to: http://support.gnip.com/apis/powertrack/rules.html
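The rule values you define here come back with every activity: the harvesting script later reads `jrec['gnip']['matching_rules'][0]['value']` and stores it as SOCIALPOSTSEARCHTERMTEXT. The sketch below shows that lookup on an abbreviated, hypothetical activity (only the fields relevant here; real GNIP payloads carry many more), with `matched_rule_values` as an illustrative helper name.

```python
import json

# Abbreviated, hypothetical activity; real GNIP activities contain many more fields.
SAMPLE_ACTIVITY = '''{
  "id": "tag:search.twitter.com,2005:123456789",
  "verb": "post",
  "gnip": {"matching_rules": [{"value": "sap hana", "tag": null}]}
}'''


def matched_rule_values(raw_record):
    """Return the values of all PowerTrack rules that one activity matched."""
    activity = json.loads(raw_record)
    rules = activity.get("gnip", {}).get("matching_rules", [])
    return [rule["value"] for rule in rules]


if __name__ == "__main__":
    # The harvesting script keeps only the first matched value.
    print(matched_rule_values(SAMPLE_ACTIVITY)[0])
```

If an activity matches several rules, only the first value ends up in the table, so keeping rules distinct and descriptive makes the stored search term more useful.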

Setup
1. Configuring Python in the SAP HANA Client

Python 2.6 is already embedded in the SAP HANA client, so you do not need to install Python separately. To make the SAP HANA Python database API (dbapi) available to this interpreter, proceed as follows.
        
1. Copy the following files from C:\Program Files\SAP\hdbclient\hdbcli to C:\Program Files\SAP\hdbclient\Python\Lib:
                a. __init__.py
                b. dbapi.py
                c. resultrow.py

2. Copy the following files from C:\Program Files\SAP\hdbclient to C:\Program Files\SAP\hdbclient\Python\Lib:
                a. pyhdbcli.pdb
                b. pyhdbcli.pyd
          
Note:

On Windows, the default installation path for a 64-bit installation of SAP HANA Studio and the SAP HANA client is C:\Program Files\SAP\..

For a 32-bit installation, the default path is C:\Program Files (x86)\SAP\..
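The two copy steps can also be scripted. The following is a minimal sketch, assuming the default folder layout described above; `copy_client_files` is a hypothetical helper, and since Program Files is normally write-protected it has to run with administrator rights.

```python
import os
import shutil

# File lists taken from copy steps 1 and 2 above.
HDBCLI_FILES = ["__init__.py", "dbapi.py", "resultrow.py"]
CLIENT_FILES = ["pyhdbcli.pdb", "pyhdbcli.pyd"]


def copy_client_files(client_dir):
    """Copy the hdbcli modules into the bundled interpreter's Lib folder.

    client_dir is the SAP HANA client folder, for example
    C:\\Program Files\\SAP\\hdbclient (64 bit) or
    C:\\Program Files (x86)\\SAP\\hdbclient (32 bit).
    Returns the list of destination paths that were written.
    """
    target = os.path.join(client_dir, "Python", "Lib")
    sources = [(os.path.join(client_dir, "hdbcli"), HDBCLI_FILES),
               (client_dir, CLIENT_FILES)]
    copied = []
    for source_dir, names in sources:
        for name in names:
            destination = os.path.join(target, name)
            # copy2 also preserves file timestamps
            shutil.copy2(os.path.join(source_dir, name), destination)
            copied.append(destination)
    return copied


if __name__ == "__main__":
    for path in copy_client_files(r"C:\Program Files\SAP\hdbclient"):
        print(path)
```

A missing source file raises an error instead of being skipped silently, which makes a wrong client path obvious.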


2. Setting up the Editor to Run the Script
2.1. Install the PyDev plugin (Python IDE for Eclipse)

The preferred method is to use SAP HANA Studio, which is based on Eclipse, as the IDE. To be able to run the Python script, you first need to install the PyDev plugin in SAP HANA Studio.

     a. Open SAP HANA Studio. In the menu bar, click Help and select Install New Software.
     b. Click Add and enter the following information:
        [Screenshot: gnip1_549429.jpg]
        Name: pydev
        Location: http://pydev.org/updates
     c. Select the settings as shown in this screenshot:
        [Screenshot: gnip2_549477.jpg]
     d. Press Next twice.
     e. Accept the license agreements, then press Finish.
     f. Restart SAP HANA Studio.

2.2. Configure the Python Interpreter

In SAP HANA Studio, carry out the following steps:
     a. Select the menu entries Window -> Preferences.
     b. Select PyDev -> Interpreters -> Python Interpreter.
     c. Click New and type in an interpreter name. In the Interpreter Executable field, enter the executable C:\Program Files\SAP\hdbclient\Python\Python.exe. Press OK twice.

2.3. Create a Python project

In SAP HANA Studio, carry out the following steps:
     a. Click File -> New -> Project, then select PyDev Project.
     b. Type in a project name, then press Finish.
     c. Right-click your project, click New -> File, type in a file name ending in .py, then press Finish.
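Before pasting the harvesting script, a short smoke test in the new file can confirm that PyDev uses the HANA client's interpreter and that the copied dbapi files are found. This is a sketch under those assumptions; `module_available` is a hypothetical helper name.

```python
import sys


def module_available(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Expect 2.6 when the interpreter bundled with the HANA client is used.
    print("Python %d.%d" % sys.version_info[:2])
    # dbapi is only importable after the files from the setup steps were copied.
    print("dbapi importable: %s" % module_available("dbapi"))
```

If the version is not 2.6 or dbapi is reported as not importable, recheck the interpreter path and the copy steps.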

Customizing and Running the Script

1. Customizing the Python script

Copy the code below into the newly created Python file and enter values for the following parameters:
     a. URL – unique URL of the data stream you created in your GNIP account
          (for example: ‘https://stream.gnip.com/accounts/<GNIP_USERNAME>/publishers/<STREAM>/streams/track/dev.json’)
     b. username_gnip – your GNIP account username
     c. password_gnip – your GNIP account password
     d. server – HANA server host name (for example: lddbq7d.wdf.sap.corp)
     e. port – HANA server SQL port (a number, not a string)
     f. username_hana – HANA user name
     g. password_hana – HANA user password
     h. schema – schema that contains the tables SOCIALDATA and SOCIALUSERINFO
     i. client – client number
     j. socialmediachannel – social media channel code stored with every record

If your network uses a different proxy, or none, also adapt the proxy address http://proxy:8080 in the function getStream().
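The GNIP credentials (username_gnip, password_gnip) are sent as an HTTP Basic Authorization header. Note that base64.encodestring appends a newline to its output, so a header value built with it carries a stray line break. As a standalone check, separate from the harvesting script that follows, a newline-free variant using only the standard library could look like this; `basic_auth_header` is a hypothetical helper.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    # b64encode never inserts line breaks, unlike base64.encodestring,
    # which appends "\n" (and wraps long input every 76 characters).
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))
```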


import urllib2
import base64
import zlib
import threading
from threading import Lock
import sys
import ssl
import json
from datetime import datetime
import calendar
import dbapi
from wsgiref.handlers import format_date_time
from time import mktime
CHUNKSIZE = 4*1024
GNIPKEEPALIVE = 30
NEWLINE = '\r\n'
URL = ''
username_gnip = ''
password_gnip = ''
HEADERS = { 'Accept': 'application/json',
            'Connection': 'Keep-Alive',
            'Accept-Encoding' : 'gzip',
            # b64encode, unlike encodestring, adds no trailing newline to the header value
            'Authorization' : 'Basic %s' % base64.b64encode('%s:%s' % (username_gnip, password_gnip))  }
server = ''
port = 0  # replace with the SQL port of your HANA server (an integer)
username_hana = ''
password_hana = ''
schema = ''
client = ''
socialmediachannel = ''
print_lock = Lock()
err_lock = Lock()
class procEntry(threading.Thread):
    def __init__(self, buf):
        self.buf = buf
        threading.Thread.__init__(self)
    def unicodeToAscii(self, word):
        return word.encode('ascii', 'ignore')
    def run(self):
        for rec in [x.strip() for x in self.buf.split(NEWLINE) if x.strip() != '']:
            try:
                jrec = json.loads(rec.strip())
                with print_lock:
                    verb = jrec['verb']
                    verb = self.unicodeToAscii(verb)
               
                    # SOCIALUSERINFO DETAILS
                    socialUser = jrec['actor']['id'].split(':')[2]
                    socialUser = self.unicodeToAscii(socialUser)
                    socialUserProfileLink = jrec['actor']['link']
                    socialUserProfileLink = self.unicodeToAscii(socialUserProfileLink)
                    socialUserAccount = jrec['actor']['preferredUsername']
                    socialUserAccount = self.unicodeToAscii(socialUserAccount)
                    friendsCount = jrec['actor']['friendsCount']
                    followersCount = jrec['actor']['followersCount']
                    postedTime = jrec['postedTime']
                    postedTime = self.unicodeToAscii(postedTime)
                    displayName = jrec['actor']['displayName']
                    displayName = self.unicodeToAscii(displayName)
                    image = jrec['actor']['image']
                    image = self.unicodeToAscii(image)
               
                    # SOCIALDATA DETAILS
                    socialpost = jrec['id'].split(':')[2]
                    socialpost = self.unicodeToAscii(socialpost)
                    createdbyuser = socialUser
                    creationdatetime = postedTime
                    socialpostlink = jrec['link']
                    creationusername = displayName
                    socialpostsearchtermtext = jrec['gnip']['matching_rules'][0]['value']
                    socialpostsearchtermtext = self.unicodeToAscii(socialpostsearchtermtext)
               
                    d = datetime.utcnow()
                    time = d.strftime("%Y%m%d%H%M%S")
               
                    creationdatetime_utc = datetime.strptime(postedTime[:-5], "%Y-%m-%dT%H:%M:%S")
                    creationdatetime_utc = creationdatetime_utc.strftime(("%Y%m%d%H%M%S"))
               
                    stamp = calendar.timegm(datetime.strptime(creationdatetime[:-5], "%Y-%m-%dT%H:%M:%S").timetuple())
                    creationdatetime = format_date_time(stamp)
                    creationdatetime = creationdatetime[:-4] + ' +0000'
               
                    if verb == 'post':
                        socialdatauuid = jrec['object']['id'].split(':')[2]
                        socialdatauuid = self.unicodeToAscii(socialdatauuid)
                   
                   
                        socialposttext = jrec['object']['summary']
                        socialposttext = self.unicodeToAscii(socialposttext)
                   
                        res = client + '\t' + socialmediachannel + '\t' + socialUser + '\t'  + socialUserAccount + '\t' + str(friendsCount) + '\t' + str(followersCount) + '\t' + postedTime + '\t' + displayName + '\t' + displayName.upper() + '\t' + socialUserProfileLink + '\t' +image
                   
                    elif verb == 'share':
                        socialdatauuid = jrec['object']['object']['id'].split(':')[2]
                        socialdatauuid = self.unicodeToAscii(socialdatauuid)
                   
                        socialposttext = jrec['object']['object']['summary']
                        socialposttext = self.unicodeToAscii(socialposttext)
                   
                        res = client + '\t' + socialmediachannel + '\t' + socialUser + '\t'  + socialUserAccount + '\t' + str(friendsCount) + '\t' + str(followersCount) + '\t' + postedTime + '\t' + displayName + '\t' + displayName.upper() + '\t' + socialUserProfileLink + '\t' +image

                    else:
                        # activities with any other verb carry no post text; skip them
                        continue

                    print(res)
                    hdb_target = dbapi.connect(server, port, username_hana, password_hana)
                    try:
                        cursor_target = hdb_target.cursor()

                        # friendsCount is stored as NUMBEROFSOCIALUSERCONTACTS, followersCount as SOCIALUSERINFLUENCESCOREVALUE
                        sql = ('upsert ' + schema + '.SOCIALUSERINFO(CLIENT, SOCIALMEDIACHANNEL, SOCIALUSER, SOCIALUSERPROFILELINK, SOCIALUSERACCOUNT, '
                               'NUMBEROFSOCIALUSERCONTACTS, SOCIALUSERINFLUENCESCOREVALUE, CREATIONDATETIME, SOCIALUSERNAME, SOCIALUSERNAME_UC, '
                               'SOCIALUSERIMAGELINK, CREATEDAT) values (?,?,?,?,?,?,?,?,?,?,?,?) with primary key')
                        cursor_target.execute(sql, (client, socialmediachannel, socialUser, socialUserProfileLink, socialUserAccount, friendsCount,
                                                    followersCount, creationdatetime, displayName, displayName.upper(), image, time))
                        hdb_target.commit()

                        sql = ('upsert ' + schema + '.SOCIALDATA(CLIENT, SOCIALDATAUUID, SOCIALPOST, SOCIALMEDIACHANNEL, CREATEDBYUSER, CREATIONDATETIME, '
                               'SOCIALPOSTLINK, CREATIONUSERNAME, SOCIALPOSTSEARCHTERMTEXT, SOCIALPOSTTEXT, CREATEDAT, CREATIONDATETIME_UTC) '
                               'VALUES(?,?,?,?,?,?,?,?,?,?,?,?) WITH PRIMARY KEY')
                        cursor_target.execute(sql, (client, socialdatauuid, socialpost, socialmediachannel, createdbyuser, creationdatetime, socialpostlink,
                                                    creationusername, socialpostsearchtermtext, socialposttext, time, creationdatetime_utc))
                        hdb_target.commit()
                    finally:
                        # a new connection is opened per record, so close it again
                        hdb_target.close()
            except (ValueError, KeyError, IndexError) as e:
                # malformed JSON or activities lacking an expected field are skipped
                with err_lock:
                    sys.stderr.write("Error processing record: %s (%s)\n"%(str(e), rec))
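def getStream():
    # adapt (or remove) the proxy settings to match your network
    proxy = urllib2.ProxyHandler({'http': 'http://proxy:8080', 'https': 'https://proxy:8080'})
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)
    req = urllib2.Request(URL, headers=HEADERS)
    response = urllib2.urlopen(req, timeout=(1+GNIPKEEPALIVE))
    # 16 + MAX_WBITS: the stream is gzip-compressed (header and trailer included)
    decompressor = zlib.decompressobj(16+zlib.MAX_WBITS)
    remainder = ''
    while True:
        chunk = response.read(CHUNKSIZE)
        if chunk == '':
            # the server closed the connection; the caller reconnects
            return
        buf = remainder + decompressor.decompress(chunk)
        if NEWLINE not in buf:
            # no complete record yet; wait for more data
            remainder = buf
            continue
        # hand all complete records to a worker thread, keep the unfinished tail
        records, remainder = buf.rsplit(NEWLINE, 1)
        procEntry(records).start()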
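if __name__ == "__main__":
    from time import sleep
    print('Started...')
    while True:
        try:
            getStream()
        except (ssl.SSLError, urllib2.URLError) as e:
            with err_lock:
                sys.stderr.write("Connection failed: %s\n"%(str(e)))
            # short pause so that repeated failures do not hammer GNIP
            sleep(5)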
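The framing logic in getStream() (gzip decompression, splitting on the CRLF line separator, and keeping the unfinished tail for the next read) is the part most worth testing in isolation. Below is a generator-based sketch of the same idea that runs without a network connection; `iter_records` is a hypothetical helper, not part of the script above.

```python
import zlib

NEWLINE = b"\r\n"


def _split_complete(buf):
    """Split buf into (CRLF-terminated lines, trailing incomplete fragment)."""
    lines = buf.split(NEWLINE)
    return lines[:-1], lines[-1]


def iter_records(chunks):
    """Yield complete records from an iterable of gzip-compressed chunks.

    Empty lines (such as keep-alive CRLFs) are skipped.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    remainder = b""
    for chunk in chunks:
        if not chunk:
            break  # an empty read means the server closed the stream
        complete, remainder = _split_complete(remainder + decompressor.decompress(chunk))
        for line in complete:
            if line.strip():
                yield line
    # Flush what zlib still buffers; a fragment without CRLF is dropped.
    complete, _ = _split_complete(remainder + decompressor.flush())
    for line in complete:
        if line.strip():
            yield line
```

With a live response, the chunks could be supplied as `iter(lambda: response.read(CHUNKSIZE), b'')`. Keeping the tail after the last CRLF is what prevents a record that straddles two reads from being cut in half.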



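2. Running the script

Run the script from your editor, for example by right-clicking the file and selecting Run As -> Python Run. The script prints Started... and then one tab-separated line per stored activity in the Console view. To stop harvesting, terminate the run from the Console view.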
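As mentioned in the Purpose section, the script can be changed to run for a limited time instead of indefinitely. One way is a deadline check around the reconnect loop, sketched below; `run_for` and its `clock` parameter are hypothetical names. Because getStream() blocks for as long as the stream stays open, the deadline is only checked when it returns; for a hard limit, the same check would have to go into its read loop.

```python
import time


def run_for(seconds, step, clock=time.time):
    """Call step() repeatedly until `seconds` have elapsed; return the call count.

    step stands in for one connect-and-read cycle such as getStream().
    clock can be replaced for testing.
    """
    deadline = clock() + seconds
    calls = 0
    while clock() < deadline:
        step()
        calls += 1
    return calls
```

In the main block, `run_for(3600, getStream)` would replace the `while True:` loop, with the exception handling moved into a small wrapper function passed as `step`.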


3. Checking the results

Check the harvested data in the database tables SOCIALDATA and SOCIALUSERINFO.
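Besides browsing the tables in SAP HANA Studio, the row counts can be checked from Python. The sketch below assumes a DB-API style connection with `?` placeholders, such as the one the script obtains from dbapi.connect(server, port, username_hana, password_hana); `count_harvested` is a hypothetical helper.

```python
def count_harvested(connection, schema, client):
    """Return (SOCIALDATA rows, SOCIALUSERINFO rows) stored for one client."""
    counts = []
    cursor = connection.cursor()
    try:
        for table in ("SOCIALDATA", "SOCIALUSERINFO"):
            # Only values can be bound as parameters, not schema or table names.
            sql = "SELECT COUNT(*) FROM %s.%s WHERE CLIENT = ?" % (schema, table)
            cursor.execute(sql, (client,))
            counts.append(cursor.fetchone()[0])
    finally:
        cursor.close()
    return tuple(counts)
```

Running it twice a few minutes apart while the script is active should show the counts growing. SOCIALUSERINFO usually grows more slowly, because the upsert updates an existing row when the same user posts again.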


Other blog posts on connecting Social Channels: 

Twitter connector to harvest tweets into Social Intelligence tables using Python script

http://scn.sap.com/docs/DOC-53824

Historical data harvesting from GNIP using Python scripts

http://scn.sap.com/community/crm/marketing/blog/2014/10/16/historical-data-harvesting-from-gnip-using-python-scripts

Demo Social and Sentiment data generation using Python script

http://scn.sap.com/community/crm/marketing/blog/2015/01/12/demo-social-and-sentiment-data-generation-using-python-script

(If you find any mistakes or have any questions about this blog, please leave a comment.)
