HXE and Fuzzy Search
Three blogs so far, the first in the series was about setting up my Intel Nuc and downloading my SAP HANA, express edition of choice. The second was about installing SAP HANA, express edition (HXE). Now in the 3rd post we connected SAP HANA Studio and created a project that we shared with the server.
So far so good, and we’ve done our development. However, what if I don’t want to create a project and all of that but what if I just want to put some data in and actually run some SQL on it? Well this post is all about that.
I decided to hunt the net for some movie review data, something like in Amazon or IMDB. I found some but then with some work I managed to piece together a CSV file with a decent amount of records.
I then opened up my SAP HANA Studio and made sure my HXE system was running. I then opened an SQL console on my catalog repository.
There I entered my first SQL command and that was to generate a new SCHEMA, you can of course use whichever existing schema you like or even the one assigned to your user. I prefer to separate.
create schema "SDFM";
Then I needed to create a table.
CREATE COLUMN TABLE "SDFM"."sap.devs.demo.fuzzymovie.data::movies.Review"
("ID" INTEGER CS_INT NOT NULL , "RTEXT" NCLOB MEMORY THRESHOLD 1000, PRIMARY KEY ("ID"))
UNLOAD PRIORITY 5 AUTO MERGE ;
Once I had my table, knowing I want to do Fuzzy search I needed to set a full text index.
CREATE FULLTEXT INDEX "_ESH_FULLTEXT_733192#733194"
ON "SDFM"."sap.devs.demo.fuzzymovie.data::movies.Review" ("RTEXT") ASYNC PHRASE
INDEX RATIO 0.000000 FUZZY SEARCH INDEX ON SEARCH ONLY OFF FAST PREPROCESS
ON TEXT MINING OFF TEXT ANALYSIS OFF TOKEN SEPARATORS '\/;,.:-_()[]<>!?*@+{}="&#$~|'
Now at this point I had my schema and table and I was ready to do some data inserts.
To insert data you can run a SQL command, or as in the previous post you can do a CSV file import.
INSERT INTO "SDFM"."sap.devs.demo.fuzzymovie.data::movies.Review" VALUES(
/*ID <INTEGER>*/,
''/*RTEXT <NCLOB>*/
);
I cheated and did a CSV file import simply because of the size of the data that I wanted in the system, but if I was to build the data slowly over time I could see using the individual SQL statement or even going back to the project route and designing an input mask via HTML for it.
The final step to see if it works was to run a SQL statement using the Fuzzy Search functions.
SELECT SCORE() AS score, RTEXT
FROM "SDFM"."sap.devs.demo.fuzzymovie.data::movies.Review"
WHERE CONTAINS(RTEXT,'blody', FUZZY(0.7) )
ORDER BY score DESC
As it was movie reviews and since it’s close to Halloween I decided to go with horror movies and therefore searching for the word “bloody” but I am a horrible speller so I mispelled it as “blody” but with the Fuzzy search it should still find relevant results for me.
As you can see relevant results with the words “bloody” and “blood” did show up!
Hello Craig,
Thanks for your blog. Do you know if Data Profiling and other functionalities from Smart Data Quality are available for Express Edition?
Thanks.
Hi Ivan
We did not include SDQ in the Express Edition Package for a variety of reasons - some of them relates to the out of the box footprint requirements, which we are trying to keep to a minimum.
I would encourage you to look at this blog from Frank Schuler, though, which provides a way to get SDI configured - a similar method would probably work! It does require you to have a separate license for SDQ.
https://blogs.sap.com/2016/10/22/enable-smart-data-integration-hana-express-edition/