Improved Search Performance in SAP BTP Core, SAP C...

Arley · ‎08-18-2021

Introduction

In SAP Cloud Application Programming Model CDS version 5, we’ve enabled a new search capability in the Node.js Runtime that:

Allows you to tailor the search behavior to your needs via the new supported @cds.search annotation

Improve the performance of search requests by favoring performance over convenience — only elements typed as String are searchable by default.

Layout the foundation for deep search — search on entity level and substructures — and fuzzy search

In addition, the following improvements were made:

Refactored and modularized the search implementation to decouple database-specific processing from the OData service layer

Caching of metadata

Improved performance of search queries for localized data.

Tailored Search with `@CDS.search` Annotation

The new search capability supports the @cds.search entity-level annotation to allow you to include or exclude elements from the set of searchable elements in search queries. The annotation can contain paths to elements, including elements of an associated entity using the path expression syntax. Just annotate those elements, which shall be searched, and which not.

In contrast to cds version 4 — to favor performance over convenience — only elements typed as String are searchable by default unless you decide to override this default. The reasoning behind this design decision is that most of the time, you’re searching for elements typed as String rather than for elements typed as UUID, Date, Time, DateTime, or Timestamp.

The new mantra is that you’ve to be explicit about what is searchable so that we can simplify and optimize the generated SQL queries for the underlying database, thus improving performance.

Common Usages

The `@CDS.search` annotation looks as follows:

@cds.search: {

    element1,

    element2: true,

    element3: false,

    element4.elementA

}

entity MyEntity ...

Here element1, ..., element4 are the element names of an annotated entity. To exclude an element from the search, annotate it with false.
To make an element (typed other than String) searchable via the $search query option, the element's entity must be explicitly annotated with the @CDS.search annotation, for example:

Search by `ID` and `title`:

@cds.search: { ID, title }

entity Books {

    key ID: UUID;

    title: String;

}

Search by all elements of type `String`, `title`, `descr`, and `isbn` — default, no annotation required:



entity Books {

  key ID: UUID;

  title: String;

  descr: String;

  isbn: String;

  author: Association to Authors;

}

Search by `title`:



@cds.search: { title }

entity Books { ... }



@cds.search: { title: true }

entity Books { ... }

Search by `title` and `descr`:



@cds.search: { title, descr }

entity Books {

  key ID: UUID;

  title: String;

  descr: String;

  isbn: String;

  author: Association to Authors;

}

Search by `title`, `descr`, and `isbn`:



@cds.search: { title, descr, isbn }

entity Books {

  key ID: UUID;

  title: String;

  descr: String;

  isbn: String;

  author: Association to Authors;

}

Search by all elements of type `String` excluding the element `isbn` (search by `title` and `descr`😞



@cds.search: { isbn: false }

entity Books {

  key ID: UUID;

  title: String;

  descr: String;

  isbn: String;

  author: Association to Authors;

}

Search by `ID` and `title`:



@cds.search: {

  ID: true,

  title: true,

  descr: false // superfluo (just for demostration purposes)

}

entity Books {

  key ID: UUID;

  title: String;

  descr: String;

  isbn: String;

  author: Association to Authors;

}

Note: the element-level annotation @Search.defaultSearchElement is deprecated and will be removed in future versions. In the meantime, @CDS.search supersedes @Search.defaultSearchElement. That is, if both annotations are defined, @CDS.search wins.

Metadata Caching

The metadata information that defines which elements are searchable is based on Core Data Services. We use the SAP Cloud Application Programming Model compact representation of data models known as Core Schema Notation to determine which elements are searchable for an entity at runtime.
Based on the premise that Core Schema Notation is a compact static data structure representation of CDS models. We implemented a caching mechanism that lazily computes the searchable elements and caches the result so that future search requests for that entity data are served up faster.

OData Service Layer Refactored

There are scenarios in which search requests are forwarded to external services and therefore don’t require database-specific processing on the primary service that processes and forwards the search request. In CDS version 4, some parts of the database handling were tightly coupled to the OData service layer.
As an undesired side effect, in the worst-case scenario, the runtime couldn’t fulfill search requests because of unexpected conditions, or the exact database processing was done twice (once for the primary service and once for the external service).

In CDS version 5, we refactored and modularized the $search system query option to decouple database-specific processing from the OData service layer, thus resulting in better performance when forwarding search requests to external services and fewer runtime exceptions.

Improved Search Performance for Localized Data

In the SAP Cloud Application Programming Model, there are two approaches for resolving translated texts:

Via localized views (former approach)

At runtime

Resolving Localized Data Via Localized Views

In the former approach, when a data model is compiled to SQL format, localized views are generated, and search queries are redirected to those views based on the user locale.
The user locale is advertised by HTTP clients, for example, via the Accept-Language header in HTTP requests or via the sap-language | x-sap-request-language URL’s parameters. Once the Node.js Runtime by content negotiation selects a user locale proposal, it uses the corresponding ahead-of-time compiled localized view with the user locale to resolve the localized texts.

For example, given the following service definition:

service BookshopService {



  // search by title (default)

  @cds.search: { title }

  entity Books {

    key ID : UUID;

    title  : localized String;

  }

}

After compiling the BookshopService service definition to SQL format, the BookshopService_Books, BookshopService_Books_texts database tables are generated.
Notice that the generated BookshopService_Books_texts table doesn’t have a counterpart in the service definition. By convention, when an entity contains localized elements, the compiler generates an additional table with the same name suffixed with _texts, including the following columns:

An ID column

A locale column

A column for every localized element defined in the root entity (Books in this case)

In addition, the localized_BookshopService_Books view is generated in SQL DDL to read localized data with fallback easily.

╰─$ cds compile srv/books.cds --to hdbtable

COLUMN TABLE BookshopService_Books (

  ID NVARCHAR(36) NOT NULL,

  title NVARCHAR(5000),

  PRIMARY KEY(ID)

) WITH ASSOCIATIONS (

  MANY TO MANY JOIN BookshopService_Books_texts AS texts ON (texts.ID = ID),

  MANY TO ONE JOIN BookshopService_Books_texts AS localized ON (localized.ID = ID

  AND localized.locale = SESSION_CONTEXT('LOCALE'))

)



COLUMN TABLE BookshopService_Books_texts (

  locale NVARCHAR(14) NOT NULL,

  ID NVARCHAR(36) NOT NULL,

  title NVARCHAR(5000),

  PRIMARY KEY(locale, ID)

)



VIEW localized_BookshopService_Books AS SELECT

  L_0.ID,

  coalesce(localized_1.title, L_0.title) AS title

FROM (

  BookshopService_Books AS L_0

  LEFT JOIN BookshopService_Books_texts AS localized_1

  ON localized_1.ID = L_0.ID

  AND localized_1.locale = SESSION_CONTEXT('LOCALE')

)

WITH ASSOCIATIONS (

  MANY TO MANY JOIN BookshopService_Books_texts AS texts ON (texts.ID = ID),

  MANY TO ONE JOIN BookshopService_Books_texts AS localized ON (localized.ID = ID

  AND localized.locale = SESSION_CONTEXT('LOCALE'))

)

Notice that the generated SQL DDL used to read localized texts conveniently uses the coalesce function to filter the localized title field.

According to research from the colleagues of the SAP Cloud Application Programming Model Java Runtime and robinjayasinghe, the coalesce function prevents push-down of filter operations to the SAP HANA database column store layer. As a result, search queries over large data sets show suboptimal performance and might lead to out-of-memory issues in some high-load scenarios.

Although the approach with the localized views is convenient, it’s limited on SQLite, and on SAP HANA database shows suboptimal performance with large data sets and last but not least prevents fuzzy search capabilities.
Thus, the alternative is to resolve localized data at runtime.

Resolve Localized Data At Runtime

To improve the performance of search queries over localized data, we implemented a mechanism to resolve localized data at runtime rather than over the localized view.

Note: This is an experimental feature toggle that can be enabled by setting the environment variable cds.env.features.optimized_search to true.

We use the localized association in schema notation format to generate SQL statements that resolve the localized data in a way that is optimized for the underlying database. This is, however, only possible if the localized association is present, with is always the case when at least one localized element is defined in the target entity.

Compiling a sample `Books` model to schema notation:

╰─$ cds compile srv/books.cds

{

  "definitions": {

    "BookshopService": {

      "kind": "service"

    },



    "BookshopService.Books": {

      "kind": "entity",

      "@cds.search.title": true,



      "elements": {

        "ID": {

          "key": true,

          "type": "cds.UUID"

        },

        "title": {

          "localized": true,

          "type": "cds.String"

        },



        "texts": { // points to all translated texts for the given entity

          "type": "cds.Composition",

          "cardinality": {

            "max": "*"

          },

          "target": "BookshopService.Books.texts",

          "on": [

            {

              "ref": [

                "texts",

                "ID"

              ]

            },

            "=",

            {

              "ref": [

                "ID"

              ]

            }

          ]

        },



        "localized": { // association points to the translated texts and is narrowed to the request’s locale

          "type": "cds.Association",

          "target": "BookshopService.Books.texts",

          "on": [

            {

              "ref": [

                "localized",

                "ID"

              ]

            },

            "=",

            {

              "ref": [

                "ID"

              ]

            },

            "and",

            {

              "ref": [

                "localized",

                "locale"

              ]

            },

            "=",

            {

              "ref": [

                "$user",

                "locale"

              ]

            }

          ]

        }





      }

    },



    "BookshopService.Books.texts": {

      "kind": "entity",

      "@odata.draft.enabled": false,

      "elements": {

        "locale": {

          "key": true,

          "type": "cds.String",

          "length": 14

        },

        "ID": {

          "@odata.containment.ignore": true,

          "key": true,

          "type": "cds.UUID"

        },

        "title": {

          "localized": false,

          "type": "cds.String"

        }

      }

    }



  },

  "meta": {

    "creator": "CDS Compiler v2.2.0",

    "flavor": "inferred"

  },

  "$version": "2.0"

}

Notice that the localized association points to the translated texts and is additionally narrowed to the request’s locale. In addition, the texts composition points to all translated texts for the Books entity.
Utilizing these schema data structures and the user locale, we join the BookshopService_Books and BookshopService_Books_texts database tables at runtime, avoiding the performance overhead of the coalesce function when filtering localized data.

Search Queries Based on the `CONTAINS` Predicate

In contrast to the LIKE predicate, you can use the CONTAINS predicate in the WHERE clause of a SELECT statement to search for one or multiple columns tables.

The syntax of the CONTAINS predicate looks as follows:

CONTAINS (<contains_columns>, <search_string> [, <search_specifier>])

Similar to an Internet search engine, you can search for multiple terms in the same query, for example:

SELECT * FROM <table_name>

WHERE CONTAINS ((<column1>, <column2>, <column3>), <search_string>)

According to our research, the LIKE clause might cause a high CPU load on SAP HANA servers in some scenarios because the database optimizer can’t push down the LIKE expression to the column storage layer.
As a result, search queries over large data set are slower than search queries based on CONTAINS.

To further improve the performance of search queries, we decided to base the $search implementation on CONTAINS. Unfortunately, search queries based on CONTAINS aren’t strictly comparable to those based on LIKE (see CONTAINS limitations). So, we implemented a branch prediction algorithm to determine whether to use CONTAINS at runtime for search queries instead of LIKE.

Note: This is an experimental feature toggle that can be enabled by setting the environment variable cds.env.features.optimized_search to true.

Key Takeaways

In SAP Cloud Application Programming Model CDS version 5, we improved the performance of search queries (some optimizations are enabled by default, others can be enabled by setting the environment variable cds.env.features.optimized_search to true) and provided support for the @cds.search entity-level annotation.

To ask questions about the SAP Cloud Application Programming Model in SAP Community, you can use this Q&A tag link.