Skip to Content
Author's profile photo Former Member

SAP Sybase IQ UDA: Full Text Search by Examples

The Unstructured Data Analytics (UDA) option extent the capabilities of SAP Sybase IQ to perform Data Mining.  Here you will find examples of queries that are possible using the UDA option.  You will need a UDA license to run these queries.

 

For these examples I created a table as follow:

 

CREATE TABLE DBA.MyReport  (

          MyRptID       integer NOT NULL,

          MyRptText      CLOB NULL,

          PRIMARY KEY(MyRptID)

  );

   

The table was populated with 1.7 millions of records, the MyRptText column was populated with clinical laboratory results saved in text format.  The database as created with CASE RESPECT.

 

Two (2) indexes and one(1) text configuration object were created, a text index using the default_char configuration object, and a NGRAM text index using a new text configuration object.

    

Create text index RptTextIdx on DBA.MyReport  

(MyRptText) configuration default_char;

     

— Create a new configuration object

  

CREATE TEXT CONFIGURATION myFuzzyTextConfig FROM default_char;

— Change to NGRAM and set maxlength to 5

 

ALTER TEXT CONFIGURATION myFuzzyTextConfig

  TERM BREAKER NGRAM;

 

ALTER TEXT CONFIGURATION myFuzzyTextConfig

  MAXIMUM TERM LENGTH 5;

CREATE TEXT INDEX myFuzzyTextIdx

ON DBA.MyReport (MyRptText) CONFIGURATION myFuzzyTextConfig;

    

All the queries were successfully executed, the results are not included in this document.  No performance data is provided, but all of the queries executed in 1 minute or less.

    

Use Case 1: Find the 10 most relevant records relative to a term using the calculated score from the CONTAINS clause.

 

select MyRptID, ct.score, MyRptText

  FROM MyReport CONTAINS (MyReport.MyRptText, ‘Difficile | DIFFICILE’) as ct

  where ct.score > 10

  order by ct.score desc;

    

Use Case 2: Search records containing a phrase (can also be done with LIKE)

 

select count(*)

  FROM MyReport

        CONTAINS (MyReport.MyRptText, ‘Clostridium difficile toxin A/B DETECTED’ ) as ct;

  

select MyRptID, MyRptText

  FROM MyReport

        CONTAINS (MyReport.MyRptText, ‘Clostridium difficile toxin A/B DETECTED’ ) as ct;

Use Case 3: Search records using Boolean expression

  

select *

  FROM MyReport

        CONTAINS (MyReport.MyRptText, ‘”Clostridium difficile toxin A/B DETECTED” and “ER”‘ ) as ct;

select *  

  FROM MyReport CONTAINS (MyReport.MyRptText,‘(Clostridium | difficile | DIFFICILLE) AND DETECTED’) as ct;

  

select *  

  FROM MyReport CONTAINS (MyReport.MyRptText,‘((Clostridium | difficile | DIFFICILLE) AND DETECTED) AND NOT (” by PCR”)’) as ct;

  

select *  

  FROM MyReport CONTAINS (MyReport.MyRptText,‘((Clostridium | difficile | DIFFICILLE) AND DETECTED) AND (” by PCR”)’) as ct;

    

Use case 4: Find terms in proximity

 

select

  FROM MyReport CONTAINS (MyReport.MyRptText,‘difficile NEAR blood’) as ct;

select

  FROM MyReport CONTAINS (MyReport.MyRptText,‘difficile NEAR negative’) as ct;

select

  FROM MyReport CONTAINS (MyReport.MyRptText,‘cholesterol’) as ct

  order by ct.score desc;

select

  FROM MyReport CONTAINS (MyReport.MyRptText,‘gallbladder BEFORE cholesterol’) as ct

  order by ct.score desc;

  

select MyRptID, ct.score, MyRptText 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘cholesterol NEAR ultrasound’) as ct

order by ct.score desc;

  

select MyRptID, ct.score, MyRptText 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘cholesterol NEAR[2,5] ultrasound’) as ct

  order by ct.score desc;

   

Use case 5: Find terms using prefixes

 

select count(*) 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘Urin*’) as ct;

select count(*) 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘Urin* AND NOT Nega*’) as ct;

  

select count(*) 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘Urin* AND Aty* AND NOT Nega* ‘) as ct;

select count(*) 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘Urin* AND Aty* AND NOT Nega* AND blood ‘) as ct;

select MyRptID, ct.score, MyRptText 

  FROM MyReport CONTAINS (MyReport.MyRptText,‘Urin* AND Aty* AND NOT Nega* AND blood ‘) as ct

  order by ct.score desc;

Use case 6: Search for term similar to a given term (FUZZY SEARCH)

select count(*)   FROM MyReport  

        CONTAINS (MyReport.MyRptText, ‘FUZZY “difficile”‘ ) as ct;

select count(*)   FROM MyReport  

        CONTAINS (MyReport.MyRptText, ‘difficile AND FUZZY “DETECTED”‘ ) as ct;

  

select count(*)   FROM MyReport  

        CONTAINS (MyReport.MyRptText, ‘difficile AND FUZZY “DETECTED” AND NOT DETECTED’ ) as ct;

select top 100 FROM MyReport  

        CONTAINS (MyReport.MyRptText, ‘difficile AND FUZZY “DETECTED” AND NOT DETECTED’ ) as ct

  order by ct.score desc;

    

Use case 7:  Find the number of indexed values in which a term appears

 

select * from sa_text_index_vocab(‘RptTextIdx’,‘MyReport’,‘DBA’)

   where term like ‘%[Cc]holesterol%’;

  

select * from sa_text_index_vocab(‘RptTextIdx’,‘MyReport’,‘DBA’)

   where term like ‘%[Cc][Hh][Oo][Ll][Ee][Ss][Tt][Ee][Rr][Oo][Ll]%’;

select * from sa_text_index_vocab(‘RptTextIdx’,‘MyReport’,‘DBA’)

   where term like ‘%[dD][Ii][Ff][Ff][Ii][Cc][Ii][Ll][Ee]%’;

   

These are only a few examples of the type of queries that can be used on a CLOB column using the UDA option; This is not a complete list of possibilities.

Assigned Tags

      3 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Mark Mumy
      Mark Mumy

      Great write-up on this topic!

      Author's profile photo Former Member
      Former Member
      Blog Post Author

      Thanks Mark.

      Author's profile photo Former Member
      Former Member

      Hi Guys!

      I've been having problems with the fuzzy search, or any CONTAINS use.

      I have a table with a VARCHAR(256) column, every CONTAINS search return "CONTAINS has NULL search term.". I tried the code written here, it worked with most CONTAINS examples used, but not the fuzzy. Looks like the NGRAM breaker is the problem.

      Do you know why is it?