Text Analysis of IPL Match using Twitter Data (Part 2)
Hi Friends,
I am back with a continuation of my earlier blog on this topic.
In this part, we will focus on custom dictionaries.
Recap:
The screenshot below shows that when the SQL query is executed for
TA_TYPE = 'PERSON', Virat Kohli and Ashwin are repeated several times in separate rows.
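To make the recap concrete, here is a minimal sketch of this kind of query. It uses Python's built-in sqlite3 module as a stand-in for HANA. The table name ta_results and the sample rows are hypothetical, and the TA_TOKEN column is assumed to hold the extracted text next to TA_TYPE.

```python
import sqlite3

# Hypothetical stand-in for the text analysis result table.
# The table name and the sample rows are illustrative, not real tweet data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ta_results (TA_TOKEN TEXT, TA_TYPE TEXT)")
rows = [
    ("Virat Kohli", "PERSON"),
    ("Kohli", "PERSON"),
    ("Virat", "PERSON"),
    ("Ashwin", "PERSON"),
    ("R Ashwin", "PERSON"),
    ("Chennai", "LOCALITY"),
]
conn.executemany("INSERT INTO ta_results VALUES (?, ?)", rows)


def person_counts(connection):
    """Count mentions per extracted token, restricted to TA_TYPE = 'PERSON'."""
    query = """
        SELECT TA_TOKEN, COUNT(*) AS mentions
        FROM ta_results
        WHERE TA_TYPE = 'PERSON'
        GROUP BY TA_TOKEN
        ORDER BY TA_TOKEN
    """
    return connection.execute(query).fetchall()


# Each spelling of a name lands in its own row, which is the problem described above.
for token, mentions in person_counts(conn):
    print(token, mentions)
```

Because the grouping is done on the raw token, every spelling variant of a player's name is counted separately.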
Why this happens:
When different users post comments on Twitter, how the data is entered depends on
the individual.
It is common for cricketer names to be entered in different ways.
To standardize the names and simplify analysis, we need to create custom dictionaries so that the system returns a uniform name when the SQL is executed.
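The idea behind a custom dictionary can be sketched in a few lines of Python. This is only an illustration of mapping name variants to one standard form, not how HANA implements it. The variants listed here are hypothetical examples, and the real entries depend on how names appear in the tweets.

```python
from collections import Counter

# Hypothetical variant -> standard form mapping (illustrative entries only).
STANDARD_FORMS = {
    "virat kohli": "Virat Kohli",
    "kohli": "Virat Kohli",
    "virat": "Virat Kohli",
    "ashwin": "Ashwin",
    "r ashwin": "Ashwin",
}


def standardize(name):
    """Return the standard form of a name, or the name unchanged if it is unknown."""
    key = " ".join(name.lower().split())  # ignore case and extra whitespace
    return STANDARD_FORMS.get(key, name)


def count_mentions(names):
    """Count mentions after mapping every variant to its standard form."""
    return Counter(standardize(n) for n in names)


# Hypothetical extracted names, as they might come back from the PERSON query.
extracted = ["Virat Kohli", "kohli", "Virat", "Ashwin", "R Ashwin", "Dhoni"]
counts = count_mentions(extracted)
print(counts)
```

With the mapping in place, all variants of a player's name fall into one row, which is the effect we want from the custom dictionary.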
Now let’s see how this can be achieved:
Create a custom HANA Text Analysis configuration file
In HANA Studio, create a workspace, then create and share a project.
Under this project, create a new file with the extension “hdbtextconfig”.
Copy all the contents of one of the predefined configurations delivered by SAP. They are located in the HANA repository
package “sap.hana.ta.config”.
For this exercise, let’s copy the contents of the configuration file “EXTRACTION_CORE_VOICEOFCUSTOMER”.
For details, see “Creating a Text Analysis Configuration”, section 10.1.3.2.1 of the
HANA Developer Guide SPS07: http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf
In the next document, I will show how to create a custom dictionary and add it to the custom configuration we just created, so that analysis of the Twitter data no longer returns repeated names when the SQL is run.
Please let me know if you have any open questions; I will be happy to answer them.
Hi Rahul
Your post is really helpful. I'm a beginner at HANA. I have a URL that returns JSON containing all my LinkedIn connections, and I want to add this data to my HANA tables. I noticed that you imported an Excel file, but that is not possible for me. Can you suggest any steps?