Indexing basics
Explorer can index Information Spaces from four types of data source: Universes, Excel files, BW Accelerator (BWA) indexes and HANA views. Information Spaces sourced from a Universe or an Excel file are considered non-accelerated: their indexes are local to the Explorer servers, are stored on disk, and contain all the data for the objects selected in the Information Space. For accelerated sources (BWA or HANA), the index local to the Explorer servers contains only the metadata; the actual index data is loaded in memory on the connected system.
The Explorer indexing process is sequential, meaning that only one job runs at any given time. Staggered scheduling is recommended if source data changes often and there are many Information Spaces to index. If some parallel processing is desired, more than one Indexing server can be created, provided the host server can accommodate it.
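Because jobs run one at a time, a staggered schedule amounts to giving each Information Space a start time after the previous job is expected to finish. The following is only a planning sketch: the function name, the Information Space names, the estimated durations and the buffer are all hypothetical, and the resulting times would still have to be entered in the Explorer scheduling options by hand.

```python
from datetime import datetime, timedelta


def staggered_starts(first_start, estimated_minutes, buffer_minutes=5):
    """Return a start time per Information Space so that jobs do not overlap.

    estimated_minutes maps a (hypothetical) Information Space name to its
    expected indexing duration in minutes; insertion order is the run order.
    """
    starts = {}
    cursor = first_start
    for name, minutes in estimated_minutes.items():
        starts[name] = cursor
        # The next job starts once this one is expected to be done, plus a buffer.
        cursor += timedelta(minutes=minutes + buffer_minutes)
    return starts


if __name__ == "__main__":
    plan = staggered_starts(datetime(2024, 1, 1, 1, 0), {"Sales": 30, "HR": 10})
    for space, start in plan.items():
        print(space, start.strftime("%H:%M"))
```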
Location of Indexes
In the CMC, the Properties of any of the producing (Indexing) or consuming (Exploration or Search) servers show the default parent folder for the index location: “%DefaultDataDir%/Polestar/index”
The Placeholder for “%DefaultDataDir%” is by default:
“C:/Program Files (x86)/SAP BusinessObjects/SAP BusinessObjects Enterprise XI 4.0/Data/”
More accurately, this information can be obtained from CMC -> Servers -> Nodes: right-click the node name and select Placeholders. The default data directory is listed there with its absolute path.
Each of the three Explorer servers (Indexing, Exploration and Search) has its own subdirectory, named after the server's initial name at install time, e.g. <SIA_NAME>.ExplorerIndexingServer. If we create a secondary server, its directory is given the original server name with a number appended, regardless of the new server's name. For example, if we create a new Indexing server named <SIA_NAME>.NewIndexingServer, its folder will be called <SIA_NAME>.ExplorerIndexingServer1.
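As a rough illustration of how these pieces combine into a path on disk, the sketch below expands the placeholder and appends the server folder name. The helper names are made up for this example. It assumes that the server folder sits directly under the Polestar/index parent folder and that further secondary servers continue the numbering (2, 3, ...); the text above only confirms the suffix 1.

```python
# Default value quoted above; the real value should be read from
# CMC -> Servers -> Nodes -> Placeholders.
DEFAULT_PLACEHOLDERS = {
    "DefaultDataDir": "C:/Program Files (x86)/SAP BusinessObjects/"
                      "SAP BusinessObjects Enterprise XI 4.0/Data/",
}


def expand_placeholders(template, placeholders):
    """Replace every %Name% token in template with its configured value."""
    result = template
    for name, value in placeholders.items():
        result = result.replace(f"%{name}%", value.rstrip("/"))
    return result


def server_folder_name(sia_name, original_server_name, secondary_number=None):
    """Folder name for a server; secondary servers get a numeric suffix."""
    suffix = "" if secondary_number is None else str(secondary_number)
    return f"{sia_name}.{original_server_name}{suffix}"


def server_index_root(sia_name, original_server_name, placeholders,
                      secondary_number=None):
    """Assumed location of a server's folder under the index parent folder."""
    parent = expand_placeholders("%DefaultDataDir%/Polestar/index", placeholders)
    folder = server_folder_name(sia_name, original_server_name, secondary_number)
    return f"{parent}/{folder}"
```

For a hypothetical SIA called MYSIA, server_index_root("MYSIA", "ExplorerIndexingServer", DEFAULT_PLACEHOLDERS, 1) would give the folder expected for a second Indexing server under the default data directory.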
When Information Space indexing is kicked off from Explorer, manually or via a schedule, the index is built under the “ExplorerIndexing/InProgress/ExplorationIndexes” subdirectory while in progress, and is moved to the “ExplorerIndexing/Published/ExplorationIndexes” folder when done. It is also automatically copied to the “Published/ExplorationIndexes” subdirectory of the Exploration and Search servers for consumption by these servers. In a clustered environment with more than one Indexing server, the new index is replicated to all other nodes regardless of which Indexing server processed the job, and is automatically copied to each Indexing server's local “ExplorerIndexing/Published/ExplorationIndexes” folder.
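One way to check that a published index has reached every server is to compare the folder names found in each server's Published/ExplorationIndexes directory. The sketch below is not an SAP tool. It assumes each index appears as one subfolder of that directory, and the function names and the labels passed in are hypothetical.

```python
from pathlib import Path


def index_ids(exploration_indexes_dir):
    """Return the names of the index folders found in one ExplorationIndexes directory."""
    root = Path(exploration_indexes_dir)
    if not root.is_dir():
        return set()
    return {entry.name for entry in root.iterdir() if entry.is_dir()}


def missing_copies(published_dirs):
    """Report, per server label, the index folders present elsewhere but absent there.

    published_dirs maps a free-form label (e.g. "indexing", "search") to the
    path of that server's Published/ExplorationIndexes directory.
    """
    found = {label: index_ids(path) for label, path in published_dirs.items()}
    everything = set().union(*found.values())
    return {
        label: everything - ids
        for label, ids in found.items()
        if everything - ids
    }
```

An empty result means every listed server holds the same set of index folders. A non-empty result only says a folder is missing, not why, so an index that is still being copied would also show up.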
Index Information
Each index is given a specific ID, which is the name of the folder created for it under each of the “ExplorationIndexes” directories; the index data sits in a timestamped folder beneath it. The Information Space details are stored in the CMS database, where the Information Space has its own unique identifier (CUID), visible in BI Launchpad under the Properties of the Information Space. This CUID is not used as the name of the folder where the index is stored on disk. More information can be obtained from the “DataSourceDescriptor” and “ExplorationSpaceDescriptor” files. If the index is based on BWA or HANA, only the latter of the two files is generated. To get more detail about an Information Space, and to find out which Information Space index corresponds to which folder name on disk, we can use Query Builder to query the CMS database. It is accessible via: http://<server>:<port>/AdminTools
To find out, we can run this query:
SELECT SI_ID, SI_NAME, SI_CONTENT FROM CI_INFOOBJECTS WHERE SI_KIND = 'DataDiscovery'
SI_NAME = the name of the Information Space
SI_CONTENT = the properties of the Information Space. The part we are interested in is the “id” property, in this case id=" 6a911b29-ac69-4008-a971-780e37222cd2", which maps to the index folder name created for this particular Information Space.
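Once the SI_CONTENT value has been copied out of Query Builder, the id can be pulled out with a regular expression and joined to an index directory. This is a minimal sketch: the exact layout of SI_CONTENT is not documented here, so it only assumes that the text contains id="..." with a GUID-shaped value, and it tolerates stray whitespace inside the quotes. The function names are illustrative.

```python
import re
from pathlib import PurePosixPath

# GUID-shaped value inside id="..."; \b keeps attributes such as cuid= from matching.
_ID_PATTERN = re.compile(
    r'\bid\s*=\s*"\s*('
    r'[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-'
    r'[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
    r')\s*"'
)


def extract_index_id(si_content):
    """Return the index id found in an SI_CONTENT string, or None if absent."""
    match = _ID_PATTERN.search(si_content)
    return match.group(1) if match else None


def index_folder(exploration_indexes_dir, si_content):
    """Build the expected index folder path for an Information Space."""
    index_id = extract_index_id(si_content)
    if index_id is None:
        raise ValueError("no id property found in SI_CONTENT")
    return PurePosixPath(exploration_indexes_dir) / index_id
```

The returned path is the folder named after the id; the timestamped folder inside it still has to be picked by hand.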
LUKE
Since the indexes are based on Apache Lucene technology, LUKE is a useful tool for inspecting them more closely. It is a self-contained jar file and can be downloaded from:
http://code.google.com/p/luke/downloads/list
If a default installation of JRE 1.6.x exists on your system, the tool can be started by simply double-clicking the jar file. You will be prompted to specify the path to the index directory; the path can also be specified from File -> Open Lucene index. After the path is specified, the Overview tab presents information about the index, such as the number of docs, terms and fields, along with some statistics.
To load an Explorer index, point the location to the date stamp directory under the index folder ID discovered above.
By stepping through various DOC ids in the Documents tab we can obtain more information on the index.
This tool can also be used to inspect the Platform Search Index created under:
%DefaultDataDir%\PlatformSearchData\Lucene Index Engine\index
The various actions you can perform with this tool are as outlined.
More information on this tool can be found at:
http://www.ezdia.com/epad/lucene-luke-search-tutorial-indexing/1503/
More information on Lucene technology can be found at: