Many people have noticed that the Platform Search application works differently in Business Intelligence Platform (BI) 4.x compared to previous releases. The architecture of Platform Search changed significantly in BI 4.0. It provides a scalable and flexible indexing and search infrastructure for the different proprietary BOE content types. It can be set to real-time indexing, so that users do not have to restart indexing every time they want the latest content indexed: when documents are published, modified, or deleted in the repository, the application identifies them and indexes them. Alternatively, it can be set to schedule-based indexing, which triggers indexing at the scheduled time. Either way, users can search in BI Launchpad while indexing is in progress. Platform Search also supports load balancing and failover for both indexing and searching in a clustered environment.

  The Platform Search service runs in the Adaptive Processing Server and contains the logic to index and search BOE content. It uses Apache Lucene, a free, open-source information retrieval library from the Apache Software Foundation. The version of Apache Lucene currently used by BI 4.0 and BI 4.1 is 2.4.1.

  The functionality of the Platform Search service can be divided into indexing and searching. Content must be indexed before it becomes searchable. In a large system with a large number of infoobjects, getting all of them fully indexed for the first time can be time-consuming because indexing involves several sequential tasks. This blog describes the indexing process.

Indexing Process

Indexing is a continuous process that involves the following sequential tasks:

1.     Use the crawling mechanism to poll the CMS repository and identify objects that have been published, modified, or deleted. Crawling can be done in two ways: continuous and scheduled.

2.     Use the extracting mechanism to call the extractors based on the document type. There is a dedicated extractor for every document type available in the repository. The extractors are:

        • Metadata Extractor
        • Crystal Reports Extractor
        • Web Intelligence Extractor
        • Universe Extractor
        • BI Workspace
        • Agnostic Extractor (Microsoft Word/Excel/PPT, Text, RTF, PDF)

3.      Use the indexing mechanism to index all the extracted content through the third-party library, Apache Lucene. The time required for indexing varies with the number of objects in the system and the size and type of the documents. Indexing involves the following steps:

    1. The extracted content is stored in the local file system (<BI 4 Install folder>\Data\PlatformSearchData\workplace\Temporary Surrogate Files) in an XML format. These files are called surrogate files.
    2. The surrogate files are uploaded to the Input File Repository Server (FRS) and removed from the local file system.
    3. The content of the surrogate files is read and indexed by the specific index engine into a temporary location called the Delta Indexing Area (<BI 4 Install folder>\Data\PlatformSearchData\workplace\DeltaIndexes).
    4. The delta index is uploaded to the Input FRS and deleted from the local file system.
    5. The delta index is read and merged into the Master Index Area (<BI 4 Install folder>\Data\PlatformSearchData\Lucene Index Engine\index), which is the final index location in the local file system.

         For indexing to run successfully, the following servers must be running and enabled:

        • InputFileRepositoryServer
        • OutputFileRepositoryServer
        • CentralManagementServer
        • AdaptiveProcessingServer with the Platform Search service running
        • AdaptiveJobServer (for scheduled crawling)
        • WebIntelligenceProcessingServer (if the Web Intelligence content type is selected)
        • CrystalReportApplicationServer (if the Crystal Reports content type is selected)

4.      Generating Content Store and Speller/Suggestions

         After the indexing task completes, the following are generated:

        • Content Store: The content store contains information such as id, cuid, name, kind, and instance, extracted from the master index into a format that can be read easily. This speeds up searching. Each AdaptiveProcessingServer creates its own content store (<BI 4 Install folder>\Data\PlatformSearchData\workplace\<NodeName>.AdaptiveProcessingServer\ContentStores).

        • Speller/Suggestions: Similar words are generated from the master index data and indexed. The speller folder is created under the “Lucene Index Engine” folder (<BI 4 Install folder>\Data\PlatformSearchData\Lucene Index Engine\speller).
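
The folders mentioned in the steps above can be gathered in one place, for example when checking disk usage on a node. The following Python sketch only assembles the paths named in this post; the function name and the sample values "D:\BI4" and "NodeA" are placeholders for illustration, not product defaults.

```python
from pathlib import PureWindowsPath


def platform_search_paths(install_root: str, node_name: str) -> dict:
    """Build the Platform Search working folders named in this post.

    install_root stands in for "<BI 4 Install folder>" and node_name for
    "<NodeName>"; both are placeholders supplied by the caller.
    """
    data = PureWindowsPath(install_root) / "Data" / "PlatformSearchData"
    workplace = data / "workplace"
    lucene = data / "Lucene Index Engine"
    return {
        "surrogate_files": workplace / "Temporary Surrogate Files",
        "delta_indexes": workplace / "DeltaIndexes",
        "master_index": lucene / "index",
        "speller": lucene / "speller",
        "content_stores": workplace
        / f"{node_name}.AdaptiveProcessingServer"
        / "ContentStores",
    }


# Hypothetical install root and node name, for illustration only.
for label, path in platform_search_paths(r"D:\BI4", "NodeA").items():
    print(f"{label:16} {path}")
```

PureWindowsPath is used so that the backslash-separated paths are built the same way regardless of the operating system the script runs on.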

Platform Search Queues

  Internally, the sequential indexing tasks above are handled by Platform Search queues. Once indexing starts, an infoobject passes through the following queues in this order:

To Be Extracted > Under Extraction > To Be Indexed > Indexing > Delta Index To Be Merged > Content Store Merge

If multiple Platform Search services exist, all nodes share a single To Be Extracted, To Be Indexed, Delta Index To Be Merged, and Content Store Merge queue, but each Platform Search service has its own Under Extraction queue and Indexing queue. Only one Platform Search service is designated as the master service that merges delta indexes into the master index.
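
As a way to keep the queue order and the shared versus per-service distinction straight, here is a small Python sketch. It is a plain data model of the description above, not code from the product; the names QUEUE_ORDER, PER_SERVICE_QUEUES, next_queue, and is_shared are invented for this example.

```python
from typing import Optional

# Queue order as listed in this post.
QUEUE_ORDER = [
    "To Be Extracted",
    "Under Extraction",
    "To Be Indexed",
    "Indexing",
    "Delta Index To Be Merged",
    "Content Store Merge",
]

# Queues that exist once per Platform Search service; the rest are shared.
PER_SERVICE_QUEUES = {"Under Extraction", "Indexing"}


def next_queue(current: str) -> Optional[str]:
    """Return the queue that follows `current`, or None after the last one."""
    position = QUEUE_ORDER.index(current)  # ValueError for unknown names
    if position + 1 < len(QUEUE_ORDER):
        return QUEUE_ORDER[position + 1]
    return None


def is_shared(queue: str) -> bool:
    """True if a single instance of the queue serves all nodes."""
    if queue not in QUEUE_ORDER:
        raise ValueError(f"unknown queue: {queue}")
    return queue not in PER_SERVICE_QUEUES
```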

  Each Platform Search queue is itself an infoobject. The status of each queue can be retrieved by running the following query in the Query Builder:

SELECT * FROM CI_INFOOBJECTS,CI_APPOBJECTS,CI_SYSTEMOBJECTS WHERE SI_KIND = 'PlatformSearchQueue'

It will return the results with the following SI_NAMEs:

  • Platform Search (Delta Index To Be Merged) Queue
  • Platform Search (To Be Indexed) Queue
  • Platform Search (To Be Extracted) Queue
  • Platform Search (Exclude Documents) Queue
  • Platform Search (Include Documents) Queue
  • Platform Search Content Store Merge Queue
  • Platform Search (Under Extraction – Enity – AcpzqPRw1thIk_GYPiEETF8)
  • Platform Search (Indexing – Enity – AcpzqPRw1thIk_GYPiEETF8)

Each queue has a property called SI_PLATFORM_SEARCH_OBJECTS, which shows the number of objects being processed in that queue. If the SI_TOTAL of that property is 0, the queue is empty.
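
If you copy the query results into a script, the queue depths can be summarized in a few lines. The Python sketch below assumes each result row has already been converted into a dictionary with SI_NAME and a nested SI_PLATFORM_SEARCH_OBJECTS dictionary holding SI_TOTAL. That input shape and the sample numbers are assumptions for illustration; Query Builder itself returns an HTML page.

```python
def queue_totals(rows):
    """Map each queue's SI_NAME to the SI_TOTAL of SI_PLATFORM_SEARCH_OBJECTS.

    `rows` is an assumed, hand-prepared list of dictionaries; a missing
    property or a missing SI_TOTAL is treated as 0.
    """
    totals = {}
    for row in rows:
        objects = row.get("SI_PLATFORM_SEARCH_OBJECTS") or {}
        totals[row["SI_NAME"]] = int(objects.get("SI_TOTAL", 0))
    return totals


def non_empty_queues(rows):
    """Return the names of queues that still hold objects."""
    return [name for name, total in queue_totals(rows).items() if total > 0]


# Made-up sample values, for illustration only.
sample_rows = [
    {"SI_NAME": "Platform Search (To Be Extracted) Queue",
     "SI_PLATFORM_SEARCH_OBJECTS": {"SI_TOTAL": 120}},
    {"SI_NAME": "Platform Search (To Be Indexed) Queue",
     "SI_PLATFORM_SEARCH_OBJECTS": {"SI_TOTAL": 0}},
    {"SI_NAME": "Platform Search Content Store Merge Queue"},
]
print(non_empty_queues(sample_rows))
```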

  Exclude Documents and Include Documents are two special queues for handling excluded documents. When you add documents in CMC > Applications > Platform Search Application > Properties > Documents Excluded from Indexing, they are added to the Platform Search (Exclude Documents) Queue. When infoobjects are extracted, the documents in this queue are excluded.

  When you remove documents from CMC > Applications > Platform Search Application > Properties > Documents Excluded from Indexing, they are removed from the Exclude Documents queue and added to the Platform Search (Include Documents) Queue. Crawling adds an infoobject to the To Be Extracted queue only if it is new or if the infoobject or its content has been modified. Infoobjects that were removed from the exclusion list are neither new nor modified, so crawling would not pick them up. They are placed in this special queue so that they are added to the To Be Extracted queue.
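
The decision described above can be written down as a small rule. The Python sketch below is a simplified model of that description, not the product's implementation: the CrawlCandidate class, its fields, and the function name are invented here, and the exclusion check is applied up front although the post says exclusion takes effect at extraction time.

```python
from dataclasses import dataclass


@dataclass
class CrawlCandidate:
    """Hypothetical view of one infoobject as seen by the crawler."""
    si_id: int
    is_new: bool = False
    is_modified: bool = False
    in_exclude_queue: bool = False
    in_include_queue: bool = False


def should_queue_for_extraction(candidate: CrawlCandidate) -> bool:
    """Decide whether the infoobject should reach the To Be Extracted queue."""
    if candidate.in_exclude_queue:
        # Documents listed under "Documents Excluded from Indexing".
        return False
    # New or modified objects are found by normal crawling; objects taken
    # off the exclusion list arrive through the Include Documents queue.
    return candidate.is_new or candidate.is_modified or candidate.in_include_queue
```

The third condition is the reason the Include Documents queue exists: without it, an unchanged document that was once excluded would never be indexed.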

  In the Platform Search queue results, you can see that the Under Extraction and Indexing queues are each associated with the SI_CUID of a Platform Search service session, because each Platform Search service has its own Under Extraction queue and Indexing queue. Information about Platform Search service sessions can be retrieved by running the following query in the Query Builder:

SELECT * FROM CI_INFOOBJECTS,CI_APPOBJECTS,CI_SYSTEMOBJECTS WHERE SI_KIND = 'PlatformSearchServiceSession'

Each Platform Search service should have one session. If the heartbeat (SI_PLATFORM_SEARCH_HEARTBEAT_TIMESTAMP) of a session isn’t updated regularly, another search service will try to return the hung service’s objects to the previous queue and take over the unfinished work.
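
To spot a session whose heartbeat has stopped, you can compare the timestamps against the current time. The Python sketch below assumes the SI_PLATFORM_SEARCH_HEARTBEAT_TIMESTAMP values have already been parsed into datetime objects and keyed by session name. The allowed age is a value you choose yourself; this post does not state the interval the product uses internally.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_sessions(
    heartbeats: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the sessions whose last heartbeat is older than max_age.

    `heartbeats` maps a session name to its last heartbeat time. The
    threshold is the caller's choice, not a documented product setting.
    """
    return [
        name
        for name, last_beat in heartbeats.items()
        if now - last_beat > max_age
    ]
```

For example, calling stale_sessions(heartbeats, datetime.now(), timedelta(minutes=10)) lists every session that has not reported for more than ten minutes, where ten minutes is an arbitrary example value.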

Here are some other useful queries you can run to get information regarding Platform Search Application.

Retrieving the general information about Platform Search Application

SELECT * FROM CI_INFOOBJECTS,CI_APPOBJECTS,CI_SYSTEMOBJECTS WHERE SI_KIND = 'PlatformSearchApplication'

The property SI_PLATFORM_SEARCH_SERVICE_CONTEXT_ACTION shows whether indexing is running: 0 means indexing is not running, and 1 means indexing is running.

Retrieving the information of Platform Search Application Status

SELECT * FROM CI_INFOOBJECTS,CI_APPOBJECTS,CI_SYSTEMOBJECTS WHERE SI_KIND = 'PlatformSearchApplicationStatus'

For example, you can check the following properties:

  • SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_DAILY_MAX_OBJECT_ID
  • SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_ID
  • SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_MAX_ID
  • SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_MAX_FOLDER_ID
  • SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_UNIVERSE_ID
  • SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_TIMESTAMP

SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_MAX_ID represents the SI_ID of the last infoobject that was added to the To Be Extracted queue. Infoobjects are added to the To Be Extracted queue in batches, so if a batch of 100 infoobjects is added to the queue, this field holds the highest SI_ID among those infoobjects.

SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_ID represents the SI_ID of the last infoobject that was added to the To Be Indexed queue. When indexing starts, this field has the same value as SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_MAX_ID. If some infoobjects are not added to the To Be Indexed queue during indexing, this field is updated with the highest SI_ID of the infoobjects that actually were added, while SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_MAX_ID keeps its original value. Neither field includes the SI_IDs of folders.
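
Following that description, a difference between the two fields suggests that part of a batch did not reach the To Be Indexed queue. The Python sketch below makes that comparison on a dictionary of status properties. The dictionary shape, the function name, and the sample IDs are assumptions for illustration, and the check says nothing about how many infoobjects are affected.

```python
MAX_ID = "SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_MAX_ID"
LAST_ID = "SI_PLATFORM_SEARCH_LAST_TO_BE_EXTRACTED_ID"


def ids_diverged(status: dict) -> bool:
    """True if the two fields differ.

    Per the description above, the fields start out equal and only
    LAST_ID changes when some infoobjects of a batch are not added to
    the To Be Indexed queue.
    """
    return int(status[LAST_ID]) != int(status[MAX_ID])


# Made-up sample values, for illustration only.
status = {MAX_ID: 48210, LAST_ID: 48155}
if ids_diverged(status):
    print("Some infoobjects up to SI_ID", status[MAX_ID], "may not have been queued for indexing.")
```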

  For definitions of the Platform Search properties above, please use the latest release of the SAP BI Platform Support Tool. A new report option has been added to the tool that provides detailed information on Platform Search and how it is performing.

[Image: BISupportTool.png]

  I hope this blog helps you to understand how Platform Search Indexing works.


16 Comments


  1. Jay Riddle

    Lei Liu,

    In a clustered environment where all the necessary services are on Server A and Server B, and both A and B share a Master Index located on a network share, and both Server A and B have their workplace folders stored on the local disk —

    If the DeltaIndexes folder on Server A is hovering at 5GB, but the DeltaIndexes folder on Server B never appears to get above 0GB; can the files under the “DeltaIndexes” folder be safely deleted on Server A as part of local disk space maintenance activities, or will this cause an issue?  And which is the likely node to be serving as the master indexer in this scenario?

    It’s a bit tricky to try and find the best way to get my point across, but essentially I’m curious if it’s safe to purge the local DeltaIndexes folder on a server in a clustered environment where it doesn’t appear to be offloading onto the Input FRS any longer.

    Cheers,

    Jay

    1. Lei Liu Post author

      Hi Jay,

      If the files under the “DeltaIndexes” folder have old timestamps and never get uploaded to the FRS, then I think the indexing was interrupted for some reason. You can delete those files, but I also suggest that you rebuild the index because some infoobjects might not have been added to the master index location.

      And the master node which merges the DeltaIndex files is determined internally. If you have high level trace turned on for the PlatformSearchService AdaptiveProcessingServer, then we can tell from the trace log.

      Hope this helps.

      Thanks & regards,

      Lei

  2. Adnan Fida

    This is very helpful information. Thanks for sharing. In our 4.1 SP5 cluster we have two search servers configured.

    We are seeing very slow search performance for CMC Personal Folders when searching folders by username. After turning off the search servers momentarily, it appeared that CMC search continued to work (even though it is slow). Does this mean CMC does not use the search application? Where should we be looking to tune the CMC search performance for areas such as Personal Folders, Instance Manager, etc.?

    Thanks.

    Adnan

    1. Lei Liu Post author

      Hi Adnan,

      Searching in CMC does not use the same engine as the Platform Search in BI Launchpad. A search in CMC does not use the indexes; instead, a query is sent to the CMS and then on to the CMS database. So the search performance in CMC depends on your CMS database performance and the size of the CMS database. Please check Knowledge Base article 1985200 to understand the difference between CMC search and Platform Search.

      Hope this helps.

      Thanks,

      Lei

  3. Nawale EL MAAZI

    Hi everyone,

    I have a question about the Search Engine.

    We have a user with the login “Toto” (AD account). She recently got married and changed her login to “Titi” (another AD account). She had many reports with SI_OWNER and SI_AUTHOR set to “Toto”.

    With an additional product, we were able to change the SI_OWNER of those reports to “Titi”.

    After switching the owner, we removed the user “Toto” from BO (from the AD group mapped into BO).

    Then, we rebuilt the index.

    After that, when we search for “Toto” in BILaunchPad, no reports are displayed. However, when we search for the name of an older report created by “Toto”, the search engine indicates under the search bar that the author is “Toto”.

    When we launch the query Select * from CI_INFOOBJECTS where SI_ID=<ID Document>, SI_AUTHOR is set to “Toto” and SI_OWNER to “Titi”.

    Is the search engine based on IDs or on parsing the infoobjects?

    Thank you for your help.

    1. Lei Liu Post author

      Hi,

      This is expected behavior. After you delete the user, you shouldn’t find any results when you search by that user name. But the author was indexed along with the document, so you will see the author information when you search for the document.

      Thanks,

      Lei

  4. Nathan Hardman

    Great article.

    We are running BI 4.1 SP06

    We have selected Level of indexing = “Platform Metadata” and Content Type = “Crystal” and “Web Intelligence”

    My question is: Does the indexing process include report history (instances) or does it only index the report templates?

    1. Shreejith Nair

      Hi Nathan,

      -It includes the instances. Whenever you search for a particular report, it shows the actual report and also gives you the option ‘show instances’.

      -Click on the option and you can see the instances.

      Regards,

      Shreejith

    2. Lei Liu Post author

      Thanks Shreejith. In addition to Shreejith’s reply, now we have a new feature to skip report instances from indexing if you apply the latest patch. Please check SAP note 2183804.

      Also, the filter you selected in Content Types does not apply to Platform Metadata indexing; it only applies to Document Metadata and Document Content. We now also have a new feature to exclude infoobjects from indexing based on SI_KIND, which applies to Platform Metadata as well. For details, please check SAP note 2235653. Users and user groups can also be excluded with this new feature.

      Thanks & regards,

      Lei

  5. Minh La

    Does anyone know where I can find more information about this topic?  We have a number of issues with Platform Search that I can’t figure out.

    TiA
