Opinion mining (sometimes known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.
In general, sentiment analysis aims to determine the opinion of a speaker, writer, or other subject with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event.
We have seen a few traditional methods for performing text analysis. In this article, I would like to bring SAP Data Hub to life with a simple use case: sentiment and text analysis on live feeds from Twitter, with the flexibility to use your own Python libraries as well as custom sentiment categorization with JavaScript.
Modern landscape challenges:
- Lack of enterprise readiness
- Challenges for traditional architectures due to multi-structured data and large data volumes
- Improve the value of data instantly for analytics or operational programs
- Drive real business impact across the enterprise with trusted information
Difficulty of operationalizing data science:
- Companies' understanding of their customers and products has declined because data is inaccessible
- Accelerate business efficiency with trusted data.
There is a MISSING LINK between Big Data and Enterprise Data that hasn’t been easily solved yet.
- Data is kept in silos (files, Hadoop, data warehouses) across the enterprise.
- User groups can't access and work with data according to their needs.
- Organizational boundaries between IT (big data vs. enterprise data), as well as LoB.
SAP Data Hub offers one modelling experience for data pipelines, workflows, and data transformations. It provides a flow-based programming model called the Pipeline Engine, which allows you to develop and execute your scripts in a containerized environment that is managed by Kubernetes. This means you can use your language of choice, such as R or Python, either with a predefined operator or to develop a custom one. You can also install the required libraries through a Dockerfile and tag your operator so that it executes in that environment.
In short, the Modeler is a graphical editor for building data transformation processes:
- Extensible, Standard connectors
- Wrap custom code
- Dockerized operators
- Production-ready with Manage, Schedule and Observe
Here is the scenario which we have built:
1) Text Data – Big data using the Twitter API
- Extract live Twitter feeds using the APIs from a developer account.
2) Sentiment Extraction
- Apply the Tweepy and TextBlob Python libraries to capture the sentiment score.
- Tokenize the tweets.
- Apply Sentiment Classifier.
- Ingest the sentiments into SAP HANA for analytics.
- Send the scores downstream into a histogram that shows the distribution of polarity, as an estimate of the probability distribution of sentiments.
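The tokenize, score, and classify steps above can be sketched in a few lines of plain Python. This is only an illustration, not the operator's actual script: the tiny `LEXICON`, the function names, and the `threshold` value are all assumptions made up for this example. In the real pipeline the score would come from TextBlob (`TextBlob(text).sentiment.polarity`, a value between -1 and 1) instead of the toy lexicon used here.

```python
import re

# Toy word list standing in for a real sentiment model (assumption for this
# sketch; TextBlob ships its own, much larger lexicon).
LEXICON = {
    "good": 0.7, "great": 0.8, "love": 0.5,
    "bad": -0.7, "terrible": -1.0, "hate": -0.8,
}


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text):
    """Average the lexicon scores of the known words; 0.0 if none match."""
    scores = [LEXICON[token] for token in tokenize(text) if token in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def classify(score, threshold=0.1):
    """Map a polarity score to a custom sentiment category."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def analyze(tweets):
    """Return (tweet, score, category) rows, e.g. for loading into a table."""
    rows = []
    for tweet in tweets:
        score = polarity(tweet)
        rows.append((tweet, score, classify(score)))
    return rows


if __name__ == "__main__":
    for row in analyze(["I love this great phone", "Bad service", "The sky"]):
        print(row)
```

The categorization is kept in its own function so that the thresholds can be changed without touching the scoring step, which is the kind of custom categorization the scenario refers to.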
Extract live twitter feeds and perform sentiment analysis:
In this article you will learn to create and execute a pipeline that extracts live Twitter feeds from a Twitter application, performs sentiment analysis on the tweets using Python, and sends the results to SAP HANA for further analysis. We will also send the same results downstream into a histogram that plots an estimate of the distribution of the polarity, that is, the sentiment score value.
Create a pipeline similar to the one below.
Use the “Tweet Stream” operator from the list of operators. This operator allows you to connect to a Twitter developer account using its APIs, and it has a built-in Python script with all the required libraries.
Tweepy is one of the Python libraries used in this script; it handles the API part, that is, the connection to the Twitter developer account via the Twitter API. The rest of the Python script fetches the Twitter feeds as a stream of messages and handles all the exceptions. Part of the script also separates the words from the text so that the sentiments can be categorized.
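To make the histogram step concrete, here is a minimal sketch of how polarity scores could be binned and normalized. It is not the Data Hub histogram operator itself; the function names and the choice of four equal-width bins over the range -1 to 1 are assumptions for illustration only.

```python
def polarity_histogram(scores, bins=4):
    """Count scores per equal-width bin over the range [-1, 1]."""
    width = 2.0 / bins
    counts = [0] * bins
    for score in scores:
        clamped = max(-1.0, min(1.0, score))      # keep out-of-range values
        index = int((clamped + 1.0) / width)      # 0-based bin index
        counts[min(index, bins - 1)] += 1         # a score of 1.0 joins the last bin
    return counts


def relative_frequencies(counts):
    """Turn bin counts into shares that sum to 1 (empty input gives zeros)."""
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


if __name__ == "__main__":
    counts = polarity_histogram([-1.0, -0.2, 0.0, 0.3, 1.0])
    print(counts)
    print(relative_frequencies(counts))
```

The relative frequencies are what gives the rough estimate of the probability distribution of sentiments: each value is the share of tweets whose polarity fell into that bin.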
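The word-separation and exception-handling parts of such a script might look roughly like the sketch below. It does not reproduce the operator's built-in code and it does not call Tweepy; it only assumes that each incoming message is a JSON string with a `text` field, and the helper names `tweet_words` and `words_from_stream` are hypothetical.

```python
import json
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z0-9']+")


def tweet_words(text):
    """Strip links, @mentions and '#' signs, then return lowercase words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return WORD_RE.findall(text.lower())


def words_from_stream(raw_messages):
    """Yield the word list of each message, skipping malformed ones.

    Assumes (for this sketch) that every message is a JSON object with
    a "text" field; anything else is dropped instead of stopping the stream.
    """
    for raw in raw_messages:
        try:
            payload = json.loads(raw)
            yield tweet_words(payload["text"])
        except (ValueError, KeyError, TypeError):
            continue


if __name__ == "__main__":
    messages = ['{"text": "Loving the new #SAP release! https://t.co/abc @sap"}',
                "not json",
                '{"id": 1}']
    for words in words_from_stream(messages):
        print(words)
```

Skipping a bad message rather than raising keeps a long-running stream alive, which matters more for a live feed than for a one-off batch job.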
The blog below helps you download and set up Docker and the SAP Data Hub 2.3 developer edition on your Windows or Linux notebook/desktop.
I hope the above information helps you scale text analysis use cases on unstructured data. SAP has added more integration services to its Data Hub solution in the latest editions, 2.4 and 2.5.
I will cover SAP Business Warehouse integration with raw/text data in my next blog.