Text Analysis of IPL Match using Twitter Data (Part 3)
Welcome to the last part of this series on Twitter analysis. So far we have covered:
- HANA studio overview
- Basic SQL
- How tables are created
- Custom Configuration
Previous blogs link on same topic:
Let’s see how to create a Custom Dictionary:
Follow the same path we used in the previous blog for creating the Custom Configuration, and create the Custom Dictionary there.
–How to name the Dictionary–
(XYZ).hdbtextdict “Replace XYZ with a name that follows your naming convention”
Copy the code below into the Custom Dictionary we just created:
<dictionary xmlns="http://www.sap.com/ta/4.0">
  <entity_category name="PERSON">
    <entity_name standard_form="Virat Kohli">
      <variant name="virat" />
      <variant name="Virat" />
      <variant name="virat kohli" />
      <variant name="Virat Kohli" />
      <variant_generation type="standard" language="english" />
    </entity_name>
  </entity_category>
</dictionary>
What did we achieve with the above code? We listed a few of the ways users might type Virat Kohli's name, while keeping the output uniform through <entity_name standard_form="Virat Kohli">.
So when we run the SQL with TA_TYPE = 'PERSON', we no longer get a separate row for each spelling of Virat Kohli; they all roll up into a single row
(refer to Blog 1 for the original query).
We can keep modifying the Custom Dictionary whenever and wherever needed.
Let’s include this in the Custom Configuration now
1. Open the Custom Configuration we created in Blog 2.
2. Search for <!-- List of Text Analysis extraction dictionaries for
Scroll down and, just before </property> </configuration>, add the entry that references the Custom Dictionary.
Let’s create the index now
CREATE FULLTEXT INDEX ipl ON "IPL"."IPL Match_Twitter Data" ("tweetContent")
TEXT ANALYSIS ON
CONFIGURATION 'CONFIGURATION_NAME';
Note: For CONFIGURATION_NAME, pass the name of the Configuration we created in Blog 2.
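If a full-text index named ipl is still around from Blog 1, creating it again will most likely fail because the name is already taken. A minimal sketch of clearing it first, assuming the index lives in the "IPL" schema next to the table and was created with an unquoted name (so HANA stores it as IPL):

```sql
-- Assumption: an index called ipl already exists in schema "IPL".
-- Dropping it also removes its text analysis results, which are
-- rebuilt when the index is created again with the new configuration.
DROP FULLTEXT INDEX "IPL"."IPL";
```

Run this only if the old index exists; otherwise the statement itself returns an error.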
Run the SQL now; the results we saw in the Blog 1 screenshot should change.
Try it and see the difference.
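One way to check the effect is to count PERSON entities per normalized form. This is only a sketch: it assumes the index is named ipl, so that HANA writes the text analysis results to a table called "$TA_IPL" in the same schema as the source table, and that the dictionary's standard form shows up in the TA_NORMALIZED column. Adjust the names if your setup differs.

```sql
-- Assumption: text analysis results are in "IPL"."$TA_IPL".
-- Count how often each person is mentioned, grouped by normalized form.
SELECT TA_NORMALIZED, COUNT(*) AS MENTIONS
FROM "IPL"."$TA_IPL"
WHERE TA_TYPE = 'PERSON'
GROUP BY TA_NORMALIZED
ORDER BY MENTIONS DESC;
```

If the Custom Dictionary is picked up, the different spellings of Virat Kohli should be counted together under one normalized value rather than listed separately.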
Bye for now; see you soon with something new.