# Predicting My Next Twitter Follower with SAP HANA PAL

I am very lazy when it comes to social networks. I would love to have thousands of followers on Twitter, but I don’t have the will to tweet often enough to grow my follower count. Even so, I regularly check my Twitter account, hoping that some new follower has magically come my way, and when one does, I feel like I accomplished something. I know it’s silly, but I can’t help it.

Anyway, I wondered whether I could use the SAP HANA Predictive Analysis Library (PAL) to see who my next follower will be. SAP introduced many new features with the release of SPS06, and one of them is the Link Prediction algorithm in PAL. Predicting links in social networks is nothing new; it has been around for many years. The algorithm tries to answer the following question: given a snapshot of a social network, can we predict which new interactions among its members are likely to occur in the near future? This is commonly known as the link prediction problem, and there are multiple approaches to it, based on measures of the “proximity” of the different nodes in a network.

When we say social networks, we don’t only mean Twitter or Facebook; the same idea applies, for example, to the employees of a company. The algorithm is also often used in fraud prevention to detect missing nodes (fraudsters) in criminal networks.

As I said, there are multiple ways to approach the link prediction problem. PAL specifically implements four methods that compute the distance between any two existing nodes from the existing links in a network:

• Common Neighbours
• Jaccard’s Coefficient
• Adamic/Adar
• Katz
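To give a feel for what such proximity measures look like, here is a minimal sketch of two of the methods listed above, using their textbook definitions. It is not how PAL implements them (PAL may normalise the scores differently), links are treated as undirected, and the toy network and all function names are made up for illustration.

```python
from itertools import combinations


def build_neighbours(edges):
    """Map each node to the set of nodes it shares a link with (direction ignored)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


def common_neighbours(neighbours, x, y):
    """Number of nodes linked to both x and y."""
    return len(neighbours[x] & neighbours[y])


def jaccard(neighbours, x, y):
    """Shared neighbours divided by all neighbours of x or y."""
    union = neighbours[x] | neighbours[y]
    return len(neighbours[x] & neighbours[y]) / len(union) if union else 0.0


def score_missing_links(edges, score):
    """Score every pair of nodes that is not linked yet."""
    neighbours = build_neighbours(edges)
    scores = {}
    for x, y in combinations(sorted(neighbours), 2):
        if y not in neighbours[x]:
            scores[(x, y)] = score(neighbours, x, y)
    return scores


# Hypothetical toy network of (follower, following) pairs
toy_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
cn_scores = score_missing_links(toy_edges, common_neighbours)
jc_scores = score_missing_links(toy_edges, jaccard)

# The highest-scoring unlinked pair is the most likely new link
best_pair = max(cn_scores, key=cn_scores.get)
print(best_pair, cn_scores[best_pair], round(jc_scores[best_pair], 3))
```

In this toy network, nodes 1 and 4 are not linked but share two neighbours, so both measures rank that pair first.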

I’m not going to get into the details of how the different methods work; for that you can take a look at the PAL User Guide. Instead, I’m going to get my hands dirty ;).

I want to predict my next Twitter follower, so the first thing I need to do is download data from Twitter that I can use to train the algorithm. For that I’m going to use Python, more specifically a Python library called Tweepy, which is basically a wrapper around the Twitter API.

First we need to set up Python so that it can connect to HANA. If you don’t know how to do this, you can take a look at this wonderful post by Blag that shows how to do it.

Now that we are all set, we can start downloading data from Twitter. I’m going to create a Column Table in HANA to store the data.

```sql
CREATE COLUMN TABLE LINK_PREDICT(
    FOLLOWER  INTEGER,
    FOLLOWING INTEGER
);
```

First I’m going to download my list of followers by running the following Python script. I don’t have a lot of followers, so this will only take a couple of seconds.

```python
import tweepy
import dbapi

# Twitter API credentials (fill in your own)
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
con = dbapi.connect('hana_host', 30015, 'SYSTEM', 'password')
cur = con.cursor()

# Save one (follower, me) row per follower; replace 'userid' with your numeric Twitter user ID
for user in tweepy.Cursor(api.followers_ids, screen_name="LukiSpa").items():
    cur.execute("INSERT INTO LINK_PREDICT VALUES(?,?)", (user, 'userid'))
```

Next, I would like to get the followers of my followers, using the Python script below. Beware that Twitter limits the number of requests you can make to the API. To avoid exceeding that limit and getting an error message, I wait 60 seconds before making each new call to the API. That means this code can run for quite a long time, so I suggest running it overnight.

```python
import tweepy
import dbapi
import time

# Twitter API credentials (fill in your own)
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

con = dbapi.connect('hana_host', 30015, 'SYSTEM', 'password')
cur = con.cursor()

# Read the followers that are already stored in HANA
cur.execute("SELECT FOLLOWER FROM LINK_PREDICT")
rows = cur.fetchall()

for row in rows:
    ids = []
    # Fetch the followers of this follower page by page,
    # pausing after each page to stay under the API rate limit
    for page in tweepy.Cursor(api.followers_ids, id=row[0]).pages():
        ids.extend(page)
        time.sleep(60)
    # Store one (follower, following) row per follower found
    for user in ids:
        cur.execute("INSERT INTO LINK_PREDICT VALUES(?,?)", (user, row[0]))
```
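The waiting logic in the script above can be looked at on its own. Below is a small sketch of that pattern with the Twitter call taken out, so it can be tried without credentials. The helper names `throttled` and `collect_ids` and the fake pages are hypothetical and only stand in for what a Tweepy cursor would return; the 60 second default simply mirrors the script.

```python
import time


def throttled(pages, delay_seconds=60, sleep=time.sleep):
    """Yield each page, then pause before the next one is requested."""
    for page in pages:
        yield page
        sleep(delay_seconds)


def collect_ids(pages, delay_seconds=60, sleep=time.sleep):
    """Flatten all pages into one list of IDs, pausing after every page."""
    ids = []
    for page in throttled(pages, delay_seconds, sleep):
        ids.extend(page)
    return ids


# Hypothetical stand-in for the pages of follower IDs returned by the API
fake_pages = [[11, 12], [13], [14, 15, 16]]

# Record the pauses instead of really sleeping, so the example runs instantly
pauses = []
all_ids = collect_ids(fake_pages, delay_seconds=60, sleep=pauses.append)
print(all_ids, pauses)
```

Passing the sleep function in as a parameter is just a convenience for trying the logic out quickly; in the real script `time.sleep` does the waiting.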

And finally, I want to download my followings plus the followings of my followers (besides me):

```python
import tweepy
import dbapi
import time

# Twitter API credentials (fill in your own)
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

con = dbapi.connect('hana_host', 30015, 'SYSTEM', 'password')
cur = con.cursor()

# Every account that is followed in the data so far (me and my followers)
cur.execute("SELECT DISTINCT FOLLOWING FROM LINK_PREDICT")
rows = cur.fetchall()

for row in rows:
    ids = []
    # Fetch the accounts this user follows page by page,
    # pausing after each page to stay under the API rate limit
    for page in tweepy.Cursor(api.friends_ids, id=row[0]).pages():
        ids.extend(page)
        time.sleep(60)
    # Here the stored user is the follower and each fetched ID is who they follow
    for user in ids:
        cur.execute("INSERT INTO LINK_PREDICT VALUES(?,?)", (row[0], user))
```

Now I’m ready to run the Link Prediction algorithm. I wanted to run it using the AFM (Application Function Modeler), but for some reason this algorithm is not available in the tools palette. I’m not sure whether this is a bug or something wrong with my PAL implementation (any comments here will be much appreciated), so I will need to do it the old way.

First I create the procedure by calling the AFL Wrapper Generator:

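```sql
SET SCHEMA MYSCHEMA;

-- Table types for the input data, the result and the control parameters
-- (the DROP statements report an error on the first run, when the objects do not exist yet)
DROP TYPE PAL_LP_DATA_T;
CREATE TYPE PAL_LP_DATA_T AS TABLE("FOLLOWER" INTEGER, "FOLLOWING" INTEGER);
DROP TYPE PAL_LP_RESULT_T;
CREATE TYPE PAL_LP_RESULT_T AS TABLE("FOLLOWER" INTEGER, "FOLLOWING" INTEGER, "SCORE" DOUBLE);
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR(100), "INT_ARGS" INTEGER, "DOUBLE_ARGS" DOUBLE, "STRING_ARGS" VARCHAR(100));

-- Signature table: maps each procedure parameter to its table type
DROP TABLE PAL_LP_PDATA_TBL;
CREATE COLUMN TABLE PAL_LP_PDATA_TBL("ID" INTEGER, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PAL_LP_PDATA_TBL VALUES (1, 'MYSCHEMA.PAL_LP_DATA_T', 'in');
INSERT INTO PAL_LP_PDATA_TBL VALUES (2, 'MYSCHEMA.PAL_CONTROL_T', 'in');
INSERT INTO PAL_LP_PDATA_TBL VALUES (3, 'MYSCHEMA.PAL_LP_RESULT_T', 'out');

-- The generator call itself is missing from this listing. The line below is an assumption,
-- reconstructed from the procedure name used further down and the usual SPS06 syntax.
CALL SYSTEM.AFL_WRAPPER_GENERATOR('PREDICT_FOLLOWER', 'AFLPAL', 'LINKPREDICTION', PAL_LP_PDATA_TBL);
```

Then I set the control parameters and run the algorithm:

```sql
SET SCHEMA MYSCHEMA;

-- Control table with the algorithm parameters
DROP TABLE #PAL_CONTROL_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL LIKE PAL_CONTROL_T;
INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('METHOD', 1, null, null);    -- see the PAL User Guide for the method codes
INSERT INTO #PAL_CONTROL_TBL VALUES ('BETA', null, 0.005, null);  -- parameter of the Katz method

-- Result table, filled by the generated procedure
DROP TABLE LP_RESULT;
CREATE COLUMN TABLE LP_RESULT LIKE PAL_LP_RESULT_T;
CALL _SYS_AFL.PREDICT_FOLLOWER(LINK_PREDICT, #PAL_CONTROL_TBL, LP_RESULT) WITH OVERVIEW;
```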

Let’s take a look at the results.

Hmmm, it seems like I will have one new follower. Let’s see on tweeterid.com who he/she is.

@atul_vaikul, I have no idea who you are but I’m here waiting mate! 🙂

We went through all this trouble to find my next follower, but that’s not all: the results also tell me who I should be following.

@SAPCommNet is the Twitter account of SCN, and I was surprised that I didn’t already follow it. The same goes for @SAPinMemory, almost a no-brainer to follow, and @JohannesSchnatz is in fact blogging a lot about SAP and SAP HANA. I don’t really share his interest in SAP HCM (and fishing), but we are both guitar players, as it seems!
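Reading the result table this way comes down to two filters: rows where I am in the FOLLOWING column suggest new followers, and rows where I am in the FOLLOWER column suggest accounts to follow. Here is a rough sketch of that split. It assumes the result keeps the direction of the input columns, which I have not verified against the PAL documentation, and the user IDs, scores and function name are invented for the example.

```python
def split_predictions(rows, my_id):
    """Split predicted links into likely new followers and accounts to follow.

    rows are (follower, following, score) tuples shaped like LP_RESULT.
    """
    new_followers = [(follower, score) for follower, following, score in rows
                     if following == my_id]
    to_follow = [(following, score) for follower, following, score in rows
                 if follower == my_id]

    def by_score(item):
        return item[1]

    # Highest score first in both lists
    return (sorted(new_followers, key=by_score, reverse=True),
            sorted(to_follow, key=by_score, reverse=True))


# Hypothetical result rows; 100 plays the role of my own user ID
sample_rows = [
    (501, 100, 0.42),
    (100, 702, 0.31),
    (100, 703, 0.57),
    (501, 702, 0.90),  # a predicted link that does not involve me
]
next_followers, suggestions = split_predictions(sample_rows, my_id=100)
print(next_followers)
print(suggestions)
```

The same split could of course be done with two SELECT statements on LP_RESULT; the Python version is only here to make the reading of the columns explicit.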

Hope you liked it!

Information in Spanish about SAP HANA™:

www.HablemosHANA.com


This is great! 😉

Former Member
Blog Post Author

Thanks! 🙂

You got that one off. I was your next follower. 😛

Former Member
Blog Post Author

hahaha...I will need to improve the accuracy of the algorithm...

Thanks, very nice.

I'll definitely be trying your example out.  🙂

Hahaha!! You made my day with this blog! This is fantastic combo of science and good humor 😀

Former Member
Blog Post Author

Thanks Vitaliy! 🙂

I think this is a much better approach to getting new followers than the algorithm itself.

Former Member
Blog Post Author

That is so true! I already have 4 new followers. I haven't heard from @atul_vaikul yet, but I'm not losing hope.

And... have you seen what happens when you get new followers? I mean, with this publication, you have influenced the "prediction", has this prediction changed?

Former Member
Blog Post Author

Didn't try it yet, but that should be a nice test...

lol

Came here after reading Jose's blog on SP. This makes it even more interesting. Thanks.

"Tweepy" 🙁

Hi all, this is indeed a nice article. I have used twitter4j and Java to load tweets into HANA in one of my POCs.

This time, however, I am trying the Python Twitter API.

When I finally run my Python script, I end up with this error:

"ImportError: No module named tweepy"

I already searched the forums but found nothing.

Any help will be really appreciated.

Thanks...

Hi, that's a pretty common Python error. Python doesn't know the path where you downloaded tweepy. You may be able to move tweepy to a subdirectory of your main Python script file, or add tweepy's location to your Python sys.path.

http://stackoverflow.com/questions/7587457/importerror-no-module-named-python
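As a rough illustration of the second suggestion, the sketch below checks whether Python can find a module and, if not, adds a folder to the search path. It assumes Python 3 (for `importlib.util`); the helper names are made up, and the folder you pass in would be wherever tweepy was actually unpacked.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate the top-level module on its search path."""
    return importlib.util.find_spec(module_name) is not None


def add_search_path(directory):
    """Append a directory to sys.path once, so modules stored there can be imported."""
    if directory not in sys.path:
        sys.path.append(directory)


if not is_importable("tweepy"):
    # Either install tweepy or call add_search_path() with the folder that contains it
    print("tweepy not found on sys.path")
```

Installing the library with a package manager usually makes this unnecessary, since it places the module on the search path for you.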

Awesome, I have to try this!

This is one of those blogs that inspire you to look into a technology, as it presents an end-to-end use case succinctly. Thank you again Lucas (@LukiSpa)!

Former Member
Blog Post Author

Thank you very much! I really appreciate it 🙂

Genius!!!

The potential of such logic is imperative to an amazing (and mind-boggling!) future.

Reading your post inspired me to get into deep thinking about random scenarios and all the factors affecting each; and about how SAP HANA could be used to make related predictive analyses.

I apologize if my comment appears to be unstructured, but... the power of HANA is simply overwhelming, and you did a great job illustrating a part of its real-world uses!

Tharindu Fernando

Former Member
Blog Post Author

Thank you very much for your comment! I'm really glad this post is inspiring possible new use cases for SAP HANA.... 🙂

Very interesting, thanks for the information! 🙂