ta
, which acts as the root package of the demo project, with 3 subpackages:db
for anything related to the physical database structureservice
for the OData service, that exposes the database to our web-applicationweb
for the HTML5/UI5 application - i.e. the stuff that is served from HANA to run inside the client's web-browserta
package also contains these 2 files, which are necessary to expose it as a web application:.xsapp
- an empty file to make the contents of the package available via the XS webserver..xsaccess
- A configuration file for managing access and authorizations for the web application.CT_FILE.hdbdd
file in the db
package. This allows you to maintain the table definition as a repository object, which makes it transportable. CT_FILE.hdbdd
file has the following contents:namespace "system-local"."public"."rbouman"."ta"."db";
@Schema: 'RBOUMAN'
@Catalog.tableType: #COLUMN
Entity CT_FILE {
Key "FILE_NAME" : String(256) not null;
"FILE_TYPE" : String(256) not null;
"FILE_LAST_MODIFIED" : UTCTimestamp not null;
"FILE_SIZE" : Integer not null;
"FILE_CONTENT" : LargeBinary not null;
"FILE_LAST_UPLOADED" : UTCTimestamp not null ;
}
technical configuration {
FULLTEXT INDEX "FT_IDX_CT_FILE" ON ("FILE_CONTENT")
ASYNCHRONOUS
LANGUAGE DETECTION ('en')
MIME TYPE COLUMN "FILE_TYPE"
FUZZY SEARCH INDEX off
PHRASE INDEX RATIO 0.721
SEARCH ONLY OFF
FAST PREPROCESS OFF
TEXT ANALYSIS ON
CONFIGURATION 'GRAMMATICAL_ROLE_ANALYSIS';
};
FILE_CONTENT
column as a LargeBinary
, and the FULLTEXT INDEX
definition on that column..hdbdd
table definition is described SAP HANA Core Data Services (CDS) Reference, whereas the actual options that are applicable for FULLTEXT INDEX
es are described in the SAP HANA SQL and System Views Reference. Finally, guidance on the meaning and functionality of the text analysis configurations are described in the SAP HANA Text Analysis Developer Guide.TEXT_CONTENT
column will be analyzed automatically. The results of the analysis will be stored in a sepearate $TA_
table called $TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE
. This table is created and maintained by the HANA system. The structure of this $TA_
table is described here.ASYNCHRONOUS
$TA_
table automatically. While this is true, the analysis does not occur immediately. This is because the FULLTEXT INDEX
is created with the ASYNCHRONOUS
option. This allows HANA to store documents in the CT_FILE
table without having to wait for the text analysis process to finish.ASYNCHRONOUS
option and whether it would make more sense to specify SYNCHRONOUS
instead, or leave the option out alltogether (in which case SYNCHRONOUS
would be implied). However there is a very simple reason why it is currently specified as ASYNCHRONOUS
: if a FULLTEXT INDEX
specifies a CONFIGURATION
option, then it must be specified as SYNCHRONOUS
, or else the following error occurs upon activation: CONFIGURATION not supported with synchronous processing
CONFIGURATION
option, as it offers all the truly interesting properties of text analysis. So, it seems there's just no way around it - text analysis results are collected in the background, and finish at some point after our document is stored in the table. And there seems to be no way of finding out whether the analysis is finished, or even if it is still busy. For instance, this makes it impossible to determine whether a recently uploaded document is still being analyzed, or whether the document was not eligible for text analysis at all: in both cases, the analysis results will remain absent.FULLTEXT INDEX
is specified as ASYNCHRONOUS
HANA will let you specify when the analysis results should be updated. At least, according to the SAP HANA SQL and System Views Reference, it is possible to specify a FLUSH [QUEUE] <flush_queue_elem>
-clause right after the ASYNCHRONOUS
option, with <flush_queue_elem>
indicating either a time interval (expressed as a number of minutes) or a number of documents. So, in theory, it would be possible to write:ASYNCHRONOUS FLUSH QUEUE AFTER 1 DOCUMENTS
which would indicate that update of the analysis would kick in as soon as a new document has been loaded.Flush based on documents/minutes not yet supported
ASYNCHRONOUS FLUSH QUEUE EVERY 1 MINUTES
instead.CT_FILE
table. For this very simple demo application, we chose to make only the FILE_NAME
column the primary key of the table. The choice of key will depend on what kind of application you're building. In many practical cases you might not care at all about the fysical name of the uploaded file at all, and a name given by the uploader might be a better choice. Or maybe you don't care about names at all, only about whether the content of the document may be considered unique, in which case some hash of the file contents may be a suitable choice.FULLTEXT INDEX
ed table, the column definitions that make up the key are copied to the corresponding $TA_
table in order to maintain the relationship between the analysis result and the original source document of the analysis, thus using those columns as a foreign key. Note however that HANA does not automatically create a FOREIGN KEY
constraint to enforce referential integrity. But you may add such a constraint yourself. This may be useful in particular to cascade deletes on the document table to the text analysis results table. $TA_
table consists of the key columns from the document table, plus to additional columns that identify an individual analysis result: TA_RULE
and TA_COUNTER
, where TA_RULE
is the type of analysis that yielded the result, and TA_COUNTER
is an integer that identifies, in order, each analysis result within a document and within a particular analysis type..xsodata
Service DefinitionCT_TEXT
and the $TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE
tables via an OData service. The OData service is created by creating a .xsodata
service definition file called ta.xsodata
in the service
subpackage.ta.xsodata
service definition file are shown below:service {
entity "system-local.public.rbouman.ta.db::CT_FILE" as "Files";
entity "RBOUMAN"."$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE" as "TextAnalysis";
}
annotations {
enable OData4SAP;
}
settings {
support null;
}
CT_FILE
and the $TA_
table to the OData EntitySets Files
and TextAnalysis
respectively. The OData service will be available at a url of which the path corresponds to the fully qualified package name of the .xsodata
file.CT_FILE
, it is the package name containing the table's .hdbdd
file, followed by 2 colons, and then followed by the local table name.$TA_
table, it is the (quoted) database schema name, followed by a dot, and them followed by the quoted table name.CT_FILE
table as a repository object. There is no corresponding repository object for the $TA_
file, since HANA creates that autonomously as a result of the full text index on CT_FILE
. Since the $TA_
table is created automatically we can assume the entire thing is transportable as it is, as long as we make sure we maintain our document table as a repository object, and refer to it in the .xsodata
file using it's repository object name..hdbdd
file before attempting to activate the .xsodata
file, because the .xsodata
file is dependent upon the existence of the tables in the database catalog.http://yourhanahost:yourxsport/path/to/your/package/service/ta.xsodata
http://yourhanahost:yourxsport/path/to/your/package/service/ta.xsodata/$metadata
yourhanahost
is the hostname or ipadress of your SAP HANA systemyourxsport
is the portname of your xsengine. This is normally 80 followed by the HANA Instance number. For example, if the instance number is 10, the port will be 8010path/to/your/package
is the path you get when you take the package identifier where you put the db
, service
and web
subpackages and replace the dot (.) that separates the individual package names with a slash (/).$metadata
document of the service$metadata
document should look something like this:
<edmx:Edmx
xmlns:edmx="http://schemas.microsoft.com/ado/2007/06/edmx"
xmlns:sap="http://www.sap.com/Protocols/SAPData"
Version="1.0"
>
<edmx:DataServices
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
m:DataServiceVersion="2.0"
>
<Schema
xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
xmlns="http://schemas.microsoft.com/ado/2008/09/edm"
Namespace="system-local.public.rbouman.ta.service.ta"
>
<EntityType Name="FilesType">
<Key>
<PropertyRef Name="FILE_NAME"/>
</Key>
<Property Name="FILE_NAME" Type="Edm.String" Nullable="false" MaxLength="256"/>
<Property Name="FILE_TYPE" Type="Edm.String" Nullable="false" MaxLength="256"/>
<Property Name="FILE_LAST_MODIFIED" Type="Edm.DateTime" Nullable="false"/>
<Property Name="FILE_SIZE" Type="Edm.Int32" Nullable="false"/>
<Property Name="FILE_CONTENT" Type="Edm.Binary" Nullable="false"/>
<Property Name="FILE_LAST_UPLOADED" Type="Edm.DateTime" Nullable="false"/>
</EntityType>
<EntityType Name="TextAnalysisType">
<Key>
<PropertyRef Name="FILE_NAME"/>
<PropertyRef Name="TA_RULE"/>
<PropertyRef Name="TA_COUNTER"/>
</Key>
<Property Name="FILE_NAME" Type="Edm.String" Nullable="false" MaxLength="256"/>
<Property Name="TA_RULE" Type="Edm.String" Nullable="false" MaxLength="200"/>
<Property Name="TA_COUNTER" Type="Edm.Int64" Nullable="false"/>
<Property Name="TA_TOKEN" Type="Edm.String" MaxLength="5000"/>
<Property Name="TA_LANGUAGE" Type="Edm.String" MaxLength="2"/>
<Property Name="TA_TYPE" Type="Edm.String" MaxLength="100"/>
<Property Name="TA_NORMALIZED" Type="Edm.String" MaxLength="5000"/>
<Property Name="TA_STEM" Type="Edm.String" MaxLength="5000"/>
<Property Name="TA_PARAGRAPH" Type="Edm.Int32"/>
<Property Name="TA_SENTENCE" Type="Edm.Int32"/>
<Property Name="TA_CREATED_AT" Type="Edm.DateTime"/>
<Property Name="TA_OFFSET" Type="Edm.Int64"/>
<Property Name="TA_PARENT" Type="Edm.Int64"/>
</EntityType>
<EntityContainer
Name="ta"
m:IsDefaultEntityContainer="true"
>
<EntitySet Name="Files" EntityType="system-local.public.rbouman.ta.service.ta.FilesType"/>
<EntitySet Name="TextAnalysis" EntityType="system-local.public.rbouman.ta.service.ta.TextAnalysisType"/>
</EntityContainer>
</Schema>
</edmx:DataServices>
</edmx:Edmx>
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
11 | |
9 | |
7 | |
6 | |
4 | |
4 | |
3 | |
3 | |
3 | |
3 |