Technology Blogs by Members
Former Member

Indexing basics

Explorer can index Information Spaces built on four types of data source: Universes, Excel files, BW Accelerator (BWA) indexes and HANA views. Information Spaces sourced from Universes or Excel files are considered non-accelerated: their indexes are stored on disk, local to the Explorer servers, and contain all the data for the objects selected in the Information Space. For accelerated sources such as BWA or HANA, the index local to the Explorer servers contains only metadata, while the actual index data is held in memory in the connected system.

The Explorer indexing process is sequential, meaning that an Indexing server runs only one job at any given time. If the source data changes often and there are many Information Spaces to index, staggered scheduling is recommended. If some parallel processing is desired, additional Indexing servers can be created, provided the host has enough resources to accommodate them.
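To illustrate why staggering matters, here is a minimal Python sketch of sequential scheduling. It assumes you have an estimated duration for each indexing job; the `IndexJob` class, the `stagger` helper, the job names and the five-minute gap are all hypothetical and not part of Explorer. Each Indexing server is modelled as running one job at a time, so a job starts only when a server frees up, and passing `servers=2` shows the effect of adding a second Indexing server.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IndexJob:
    name: str                # Information Space name (hypothetical)
    est_duration: timedelta  # estimated indexing time, e.g. from past runs


def stagger(jobs, start, servers=1, gap=timedelta(minutes=5)):
    """Plan start/end times for indexing jobs.

    Each Indexing server runs one job at a time, so a job is given to
    whichever server becomes free first, plus a safety gap after the
    previous job on that server.
    Returns a list of (name, server_index, start, end).
    """
    # Min-heap of (time the server is next free, server index)
    free_at = [(start, i) for i in range(servers)]
    heapq.heapify(free_at)
    plan = []
    for job in jobs:
        t, server = heapq.heappop(free_at)
        end = t + job.est_duration
        plan.append((job.name, server, t, end))
        heapq.heappush(free_at, (end + gap, server))
    return plan


if __name__ == "__main__":
    jobs = [IndexJob("Sales", timedelta(minutes=20)),
            IndexJob("Inventory", timedelta(minutes=10)),
            IndexJob("HR", timedelta(minutes=15))]
    for servers in (1, 2):
        print(f"{servers} Indexing server(s):")
        for name, srv, s, e in stagger(jobs, datetime(2024, 1, 1, 1, 0), servers):
            print(f"  {name:<10} server {srv}  {s:%H:%M} -> {e:%H:%M}")
```

With one server the three example jobs run back to back; with two servers the first two start together. The actual schedules are still configured in Explorer; a sketch like this only helps plan the start times.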

Location of Indexes

In the CMC, the Properties page of any producing server (Indexing) or consuming server (Exploration or Search) shows the default parent folder for the index location: “%DefaultDataDir%/Polestar/index”.

By default, the “%DefaultDataDir%” placeholder resolves to:

“C:/Program Files (x86)/SAP BusinessObjects/SAP BusinessObjects Enterprise XI 4.0/Data/”

To see the actual value on your system, go to CMC -> Servers -> Nodes, right-click the node name and select Placeholders. The %DefaultDataDir% entry there shows the absolute path of the default data directory.
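Scripts that read index files directly need the placeholder expanded. A minimal sketch, assuming you copy the %DefaultDataDir% value from the Placeholders page into a Python dict; the `resolve` helper is hypothetical and not an SAP tool.

```python
import re

PLACEHOLDER = re.compile(r"%(\w+)%")


def resolve(path: str, placeholders: dict) -> str:
    """Expand %Name% tokens in a CMC-style path.

    Values have trailing separators stripped because the template already
    supplies one (".../Data/" + "/Polestar" would otherwise double up).
    Unknown placeholders are left as-is so they stand out.
    """
    def substitute(match):
        value = placeholders.get(match.group(1))
        return match.group(0) if value is None else value.rstrip("/\\")

    return PLACEHOLDER.sub(substitute, path)


if __name__ == "__main__":
    # Value copied from CMC -> Servers -> Nodes -> Placeholders (example default)
    table = {"DefaultDataDir": "C:/Program Files (x86)/SAP BusinessObjects/"
                               "SAP BusinessObjects Enterprise XI 4.0/Data/"}
    # Prints C:/Program Files (x86)/SAP BusinessObjects/SAP BusinessObjects Enterprise XI 4.0/Data/Polestar/index
    print(resolve("%DefaultDataDir%/Polestar/index", table))
```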


Each of the three Explorer servers (Indexing, Exploration and Search) has its own subdirectory, named after the server's original name at install time, for example <SIA_NAME>.ExplorerIndexingServer. If we create an additional server, its directory is given the original server name followed by a number, regardless of the name we give the new server. For example, if we create a new Indexing server called <SIA_NAME>.NewIndexingServer, its folder will be called <SIA_NAME>.ExplorerIndexingServer1.
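The naming rule can be expressed as a small function for predicting folder names. This is a hypothetical helper, and it assumes that further servers continue the sequence (1, 2, 3, ...), which the example above only shows for the first additional server. "MySIA" is a made-up SIA name.

```python
def server_folder(sia_name: str, original_server: str, existing: set) -> str:
    """Predict the folder name for a newly created Explorer server.

    The name comes from the ORIGINAL install-time server name; the name you
    give the new server in the CMC is deliberately not an input. Assumption:
    extra servers are numbered 1, 2, 3, ... using the first free number.
    """
    base = f"{sia_name}.{original_server}"
    if base not in existing:
        return base
    n = 1
    while f"{base}{n}" in existing:
        n += 1
    return f"{base}{n}"


if __name__ == "__main__":
    existing = {"MySIA.ExplorerIndexingServer"}
    # A new Indexing server named "MySIA.NewIndexingServer" still gets:
    print(server_folder("MySIA", "ExplorerIndexingServer", existing))
    # -> MySIA.ExplorerIndexingServer1
```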

When indexing of an Information Space is started from Explorer, either manually or on a schedule, the index is built under the “ExplorerIndexing/InProgress/ExplorationIndexes” subdirectory while the job is in progress. When the job completes, the index is moved to the “ExplorerIndexing/Published/ExplorationIndexes” folder. It is also copied automatically to the “Published/ExplorationIndexes” subdirectory of the Exploration and Search servers, which consume it from there. In a clustered environment with more than one Indexing server, the new index is replicated to all other nodes regardless of which Indexing server processed the job, and copied to each Indexing server's local “ExplorerIndexing/Published/ExplorationIndexes” folder.
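The build, publish and distribute steps can be simulated on a scratch directory to see how the folders change. This Python sketch is a simplification: the consumer folder names (`ExplorerExploration`, `ExplorerSearch`) and the `placeholder.dat` file are hypothetical, real server folders follow the <SIA_NAME>.ExplorerIndexingServer pattern, and cluster replication between nodes is not modelled.

```python
import shutil
import tempfile
from pathlib import Path


def publish_index(root: Path, index_id: str, consumers: list) -> list:
    """Simulate build -> publish -> distribute for one index.

    root      : scratch directory standing in for %DefaultDataDir%/Polestar/index
    index_id  : the index ID, used as the folder name
    consumers : hypothetical folder names of the consuming servers
    Returns the paths of the copies made for the consumers.
    """
    base = root / "ExplorerIndexing"
    in_progress = base / "InProgress" / "ExplorationIndexes" / index_id
    published = base / "Published" / "ExplorationIndexes" / index_id

    # 1. While the job runs, the index is built under InProgress.
    in_progress.mkdir(parents=True)
    (in_progress / "placeholder.dat").write_text("index data")

    # 2. When the job completes, the whole folder is moved to Published.
    published.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(in_progress), str(published))

    # 3. The published index is copied to each consuming server's folder.
    copies = []
    for server in consumers:
        target = root / server / "Published" / "ExplorationIndexes" / index_id
        shutil.copytree(published, target)
        copies.append(target)
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in publish_index(Path(tmp), "demo-index-id",
                                  ["ExplorerExploration", "ExplorerSearch"]):
            print(path.relative_to(tmp))
```

Moving rather than copying in step 2 mirrors the description above: once published, nothing is left under InProgress.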

Index Information

Each index is also given its own ID, which is used as the name of the folder created under each of the “ExplorationIndexes” directories; the index itself sits in a timestamped subfolder within it. The Information Space definition is stored in the CMS database and has its own unique identifier (CUID), which can be seen in BI Launchpad by opening the Properties of the Information Space. This CUID is not the folder name used to store the index on disk. More information about an index can be found in its “DataSourceDescriptor” and “ExplorationSpaceDescriptor” files; if the index is based on BWA or HANA, only the latter file is generated.

For more detail about an Information Space, and as another way to find out which index folder on disk belongs to which Information Space, we can use Query Builder to query the CMS database. It is accessible at: http://<server>:<port>/AdminTools
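On the disk side, a short script can summarise what sits under an “ExplorationIndexes” directory. The sketch below assumes the layout described above (index ID folders, each holding timestamped subfolders), treats the most recently modified subfolder as the current one, and searches anywhere under the ID folder for the descriptor files because their exact location and extension are not stated here. `describe_indexes` is a hypothetical helper.

```python
import sys
from pathlib import Path

DESCRIPTORS = ("DataSourceDescriptor", "ExplorationSpaceDescriptor")


def describe_indexes(exploration_indexes: Path) -> dict:
    """Summarise each index ID folder under an ExplorationIndexes directory."""
    report = {}
    for id_dir in sorted(p for p in exploration_indexes.iterdir() if p.is_dir()):
        # Timestamped subfolders; assume the most recently modified is current.
        stamps = [p for p in id_dir.iterdir() if p.is_dir()]
        latest = max(stamps, key=lambda p: p.stat().st_mtime, default=None)
        # Descriptor location/extension is an assumption, so search
        # everywhere under the ID folder by name prefix.
        found = {d for d in DESCRIPTORS if any(id_dir.rglob(d + "*"))}
        report[id_dir.name] = {
            "latest": latest.name if latest else None,
            "descriptors": sorted(found),
            # Only ExplorationSpaceDescriptor present suggests a BWA/HANA source.
            "accelerated": found == {"ExplorationSpaceDescriptor"},
        }
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_indexes.py <path to ExplorationIndexes>
    for index_id, info in describe_indexes(Path(sys.argv[1])).items():
        print(index_id, info)
```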

To find out which index folder belongs to which Information Space, run this query in Query Builder:

SELECT SI_ID, SI_NAME, SI_CONTENT FROM CI_INFOOBJECTS WHERE SI_KIND = 'DataDiscovery'

SI_NAME contains the name of the Information Space.

SI_CONTENT shows the properties of the Information Space. The property we are interested in here is “id”, for example id="6a911b29-ac69-4008-a971-780e37222cd2", which maps to the name of the index folder created for this particular Information Space.
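If you copy an SI_CONTENT value out of Query Builder, the id can be extracted with a regular expression and turned into a folder path. This is a sketch: the sample SI_CONTENT string is invented for illustration, and the only assumption about the real format is that it contains an id="<GUID>" attribute like the one above. `index_folder_id` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# id="<GUID>"; \b stops matches inside longer names such as cuid="..."
ID_ATTR = re.compile(
    r'\bid="\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s*"'
)


def index_folder_id(si_content: str) -> Optional[str]:
    """Return the index ID (folder name) found in an SI_CONTENT value."""
    match = ID_ATTR.search(si_content)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Invented sample; real SI_CONTENT contains more properties.
    sample = '<space name="Sales" cuid="AbC123" id="6a911b29-ac69-4008-a971-780e37222cd2"/>'
    folder = index_folder_id(sample)
    print(Path("ExplorationIndexes") / folder)
```

The pattern tolerates stray whitespace inside the quotes, which can creep in when values are copied from a browser.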

LUKE

Since the indexes are based on Apache Lucene, Luke (the Lucene Index Toolbox) is a useful tool for inspecting them in more detail. It is a self-contained JAR file and can be downloaded from:

http://code.google.com/p/luke/downloads/list

If JRE 1.6.x is installed as the default Java runtime on your system, the tool can be started simply by double-clicking the JAR file. You will be prompted for the path to the index directory; you can also specify it later from File -> Open Lucene index. Once the path is set, the Overview tab shows information about the index, such as the number of documents, terms and fields, along with some statistics.
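The Overview counts are easier to interpret with a model of what a Lucene index stores: documents made of named fields, and terms that are (field, text) pairs pointing back to the documents containing them. The toy Python inverted index below only illustrates that idea; it is not Lucene's implementation or API, and its whitespace tokenizer and the sample field names are assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Toy inverted index mapping (field, term) -> set of document numbers.

    Each document is a dict of field name -> text. Tokenising is just
    lower-casing and splitting on whitespace, far simpler than a Lucene analyzer.
    """
    postings = defaultdict(set)
    for doc_no, doc in enumerate(docs):
        for field, text in doc.items():
            for term in text.lower().split():
                postings[(field, term)].add(doc_no)
    return postings


def overview(docs, postings):
    """Counts comparable to the ones on Luke's Overview tab."""
    return {
        "docs": len(docs),
        "fields": len({field for field, _ in postings}),
        "terms": len(postings),  # a term is a (field, text) pair
    }


if __name__ == "__main__":
    docs = [{"Region": "North East", "Product": "Shoes"},
            {"Region": "North", "Product": "Hats"}]
    print(overview(docs, build_index(docs)))
    # -> {'docs': 2, 'fields': 2, 'terms': 4}
```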

To load an Explorer index, point Luke at the date-stamped directory under the index ID folder identified above.

Stepping through document IDs in the Documents tab shows more detail about the indexed content.

This tool can also be used to inspect the Platform Search Index created under:

%DefaultDataDir%\PlatformSearchData\Lucene Index Engine\index
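Whether it is an Explorer index or the Platform Search index, the folder to open in Luke is the one holding the Lucene segments file. A heuristic sketch, assuming the index uses the `segments_N` naming of Lucene 2.1 and later (or a plain `segments` file in older releases); `find_lucene_indexes` is a hypothetical helper.

```python
import sys
from pathlib import Path


def find_lucene_indexes(root: Path) -> list:
    """Return directories under root that look like Lucene index directories.

    Heuristic: the directory contains a segments file, named "segments_N"
    (Lucene 2.1 and later) or plain "segments" (older releases). Helper
    files such as "segments.gen" are ignored on their own.
    """
    hits = set()
    for f in root.rglob("segments*"):
        if f.is_file() and (f.name == "segments" or f.name.startswith("segments_")):
            hits.add(f.parent)
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_indexes.py "<resolved %DefaultDataDir%>"
    for directory in find_lucene_indexes(Path(sys.argv[1])):
        print(directory)
```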

The actions you can perform with Luke include:

  • browse by document number, or by term
  • view documents / copy to clipboard
  • retrieve a ranked list of most frequent terms
  • execute a search, and browse the results
  • analyze search results
  • selectively delete documents from the index
  • reconstruct the original document fields, edit them and re-insert to the index
  • optimize indexes
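Two of the actions listed, ranking the most frequent terms and running a search, can be mimicked on toy data to show the kind of result Luke reports. This sketch ranks terms by document frequency (the number of documents containing a term); treat it as an illustration of the idea rather than a reproduction of Luke's output. The data and helper names are hypothetical.

```python
def top_terms(docs, field, n=10):
    """Rank terms in one field by document frequency (docs containing the term).

    Ties are broken alphabetically so the output is deterministic.
    """
    df = {}
    for doc in docs:
        for term in set(doc.get(field, "").lower().split()):
            df[term] = df.get(term, 0) + 1
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def search(docs, field, term):
    """Single-term query: numbers of the documents whose field contains term."""
    term = term.lower()
    return [i for i, doc in enumerate(docs)
            if term in doc.get(field, "").lower().split()]


if __name__ == "__main__":
    docs = [{"Region": "North East"}, {"Region": "North"}, {"Region": "South East"}]
    print(top_terms(docs, "Region"))       # -> [('east', 2), ('north', 2), ('south', 1)]
    print(search(docs, "Region", "East"))  # -> [0, 2]
```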

More information on Luke can be found at:

http://www.getopt.org/luke/

http://www.ezdia.com/epad/lucene-luke-search-tutorial-indexing/1503/

More information on Lucene technology can be found at:

http://lucene.apache.org/
