
Indexing basics

Explorer can index Information Spaces based on four types of data source: Universes, Excel files, BW Accelerator (BWA) indexes and HANA views. Information Spaces sourced from Universes or Excel files are considered non-accelerated: their indexes are local to the Explorer servers, stored on disk, and contain all the data for the Information Space's object selection. For accelerated sources such as BWA or HANA, the index local to the Explorer servers contains only the metadata, while the actual data is held in memory in the connected system. The Explorer indexing process is sequential, meaning that only one indexing job runs at a time on a given Indexing server. Staggered scheduling is therefore recommended if source data changes often and there are many Information Spaces to index. If some parallel processing is desired, more than one Indexing server can be created, provided the host server has the capacity.
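Staggering simply means spacing start times so that each job has a chance to finish before the next one is due. The sketch below is a hypothetical helper, not something Explorer provides: it assumes you have your own estimate of how long each Information Space takes to index, and computes start times that respect the one-job-at-a-time behaviour described above.

```python
from datetime import datetime, timedelta


def staggered_schedule(spaces, first_start, buffer=timedelta(minutes=5)):
    """Return (name, start_time) pairs spaced so indexing jobs do not pile up.

    spaces      : list of (name, estimated_duration) tuples; the durations are
                  your own estimates (hypothetical, not read from Explorer)
    first_start : datetime at which the first job should start
    buffer      : extra slack between the estimated end of one job and the
                  start of the next
    """
    schedule = []
    next_start = first_start
    for name, duration in spaces:
        schedule.append((name, next_start))
        # One Indexing server runs one job at a time, so the next job should
        # not be scheduled before this one is expected to finish.
        next_start = next_start + duration + buffer
    return schedule


if __name__ == "__main__":
    plan = staggered_schedule(
        [("Sales", timedelta(minutes=20)),
         ("Inventory", timedelta(minutes=10)),
         ("HR", timedelta(minutes=5))],
        datetime(2012, 7, 1, 1, 0),
    )
    for name, start in plan:
        print(f"{start:%H:%M}  {name}")  # 01:00 Sales, 01:25 Inventory, 01:40 HR
```

The resulting times would still be entered as schedules in Explorer itself; the helper only makes the spacing explicit.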

Location of Indexes

In the CMC, the Properties of any of the producing (Indexing) or consuming (Exploration or Search) servers show the default parent folder for the index location: “%DefaultDataDir%/Polestar/index”

[Screenshot: server Properties in the CMC showing the index location]

By default, the “%DefaultDataDir%” placeholder resolves to:

“C:/Program Files (x86)/SAP BusinessObjects/SAP BusinessObjects Enterprise XI 4.0/Data/”

To get the exact value for your installation, go to CMC -> Servers -> Nodes, right-click the node name and select Placeholders. The default data directory is shown there as an absolute path:

[Screenshot: Placeholders dialog showing the absolute path of %DefaultDataDir%]
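Resolving the placeholder is plain string substitution. A minimal Python sketch, assuming the default value quoted above (your node's Placeholders dialog is the authority), that turns the CMC template into an absolute index root:

```python
from pathlib import PureWindowsPath

# Default value quoted above; check the Placeholders dialog for your own node.
DEFAULT_DATA_DIR = ("C:/Program Files (x86)/SAP BusinessObjects/"
                    "SAP BusinessObjects Enterprise XI 4.0/Data/")


def expand_placeholders(template, placeholders):
    """Replace %Name% tokens in a CMC-style path with their values."""
    for name, value in placeholders.items():
        # Drop trailing separators so "%X%/sub" does not produce "//".
        template = template.replace(f"%{name}%", value.rstrip("/\\"))
    # PureWindowsPath accepts both / and \ and renders with \.
    return PureWindowsPath(template)


index_root = expand_placeholders("%DefaultDataDir%/Polestar/index",
                                 {"DefaultDataDir": DEFAULT_DATA_DIR})
print(index_root)
# C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0\Data\Polestar\index
```

PureWindowsPath is used so the sketch produces Windows-style paths even when run on another OS; it never touches the file system.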

Each of the three Explorer servers (Indexing, Exploration and Search) has its own subdirectory, named after the server's initial name at install time, for example <SIA_NAME>.ExplorerIndexingServer. If we create an additional server, its directory is given the original server name followed by a number, regardless of the name we choose. For example, if we create a new Indexing server called <SIA_NAME>.NewIndexingServer, its folder will be called <SIA_NAME>.ExplorerIndexingServer1.
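The naming rule can be written down as a small function. This is a sketch of the rule as described above; the SIA name "MYSIA" is hypothetical, and the assumption that later servers continue as 2, 3, ... goes beyond the single example given.

```python
def index_folder_name(sia_name, original_server_name, instance=0):
    """Folder name for an Explorer server's subdirectory.

    instance 0 is the server created at install time; 1, 2, ... are servers
    added later (numbering beyond 1 is an assumption). The display name given
    to an added server is deliberately not a parameter: per the text above,
    it does not influence the folder name.
    """
    suffix = str(instance) if instance else ""
    return f"{sia_name}.{original_server_name}{suffix}"


print(index_folder_name("MYSIA", "ExplorerIndexingServer"))     # MYSIA.ExplorerIndexingServer
print(index_folder_name("MYSIA", "ExplorerIndexingServer", 1))  # MYSIA.ExplorerIndexingServer1
```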

When Information Space indexing is started from Explorer, manually or via a schedule, the index is built under the “ExplorerIndexing/InProgress/ExplorationIndexes” subdirectory while in progress and, when done, is moved to the “ExplorerIndexing/Published/ExplorationIndexes” folder. It is also automatically copied to the “Published/ExplorationIndexes” subdirectory of the Exploration and Search servers for consumption by those servers. In a clustered environment with more than one Indexing server, regardless of which Indexing server processed an indexing job, the new index is replicated to all other nodes and automatically copied to each Indexing server's local “ExplorerIndexing/Published/ExplorationIndexes” folder.
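To make the directory flow concrete, the following Python model replays the move-then-copy sequence on a scratch directory. It is an illustration only, under the assumption that each server folder contains the subfolders named above: Explorer performs these steps itself, and the function (a hypothetical name) should not be pointed at a live installation. It also simply replaces any previous copy, which may not match how the real servers handle older index versions.

```python
import shutil
from pathlib import Path


def publish_index(indexing_root, consumer_roots, index_id):
    """Model of the publish step described above, for a scratch directory only.

    indexing_root  : the Indexing server's "ExplorerIndexing" folder
    consumer_roots : folders of the Exploration and Search servers
    index_id       : the index ID folder name
    """
    src = Path(indexing_root) / "InProgress" / "ExplorationIndexes" / index_id
    published = Path(indexing_root) / "Published" / "ExplorationIndexes" / index_id
    published.parent.mkdir(parents=True, exist_ok=True)
    if published.exists():
        shutil.rmtree(published)  # this model just replaces an older copy
    # Step 1: the finished index moves from InProgress to Published.
    shutil.move(str(src), str(published))
    # Step 2: a copy lands in each consumer's Published/ExplorationIndexes.
    for root in consumer_roots:
        target = Path(root) / "Published" / "ExplorationIndexes" / index_id
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(published, target)  # creates missing parent folders
    return published
```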

Index Information

Each index is given a specific ID, which is used as the name of the folder created under each of the “ExplorationIndexes” directories; the index itself sits in a timestamped subfolder of that folder. The Information Space definition is stored in the CMS database and has its own unique identifier (CUID), visible in BI Launch pad under the Properties of the Information Space. This CUID is not used as the name of the folder where the index is stored on disk. More information can be obtained by viewing the “DataSourceDescriptor” and “ExplorationSpaceDescriptor” files. If the index is based on BWA or HANA, only the latter of the two files is generated. To get more detail about an Information Space, and as another way to find out which folder on disk holds a given Information Space's index, we can use Query Builder to query the CMS database. It is accessible at: http://<server>:<port>/AdminTools

Run this query:

SELECT SI_ID, SI_NAME, SI_CONTENT FROM CI_INFOOBJECTS WHERE SI_KIND = 'DataDiscovery'

[Screenshot: the query entered in Query Builder]

SI_NAME: the name of the Information Space.

SI_CONTENT: the properties of the Information Space. What we are interested in here is the “id” property, for example id="6a911b29-ac69-4008-a971-780e37222cd2", which maps to the name of the index folder created for this particular Information Space.

[Screenshot: Query Builder output showing SI_NAME and SI_CONTENT with the id property]
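If you copy the SI_CONTENT text out of Query Builder, the id can be pulled out with a regular expression and joined onto an “ExplorationIndexes” path. A minimal sketch, assuming the id is a GUID written as id="..." as in the example value above; the exact SI_CONTENT layout is an assumption, and if other GUID-valued id attributes appear this returns only the first one.

```python
import re
from pathlib import Path

# Matches id="<GUID>" and tolerates stray spaces inside the quotes.
# The SI_CONTENT layout is assumed from the example value in the text.
_ID_PATTERN = re.compile(
    r'\bid\s*=\s*"\s*([0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})\s*"'
)


def index_id_from_si_content(si_content):
    """Return the first GUID-valued id attribute, or None if there is none."""
    match = _ID_PATTERN.search(si_content)
    return match.group(1) if match else None


def index_folder(exploration_indexes_dir, si_content):
    """Path of the index ID folder under an ExplorationIndexes directory."""
    index_id = index_id_from_si_content(si_content)
    return Path(exploration_indexes_dir) / index_id if index_id else None
```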

LUKE

Since the indexes are based on Apache Lucene, Luke (the Lucene Index Toolbox) is a useful tool for inspecting them in more detail. It is a self-contained JAR file and can be downloaded from:

http://code.google.com/p/luke/downloads/list

If a default installation of JRE 1.6.x exists on your system, the tool can be started by simply double-clicking the JAR file. You will be prompted for the path to the index directory; you can also specify it via File -> Open Lucene index. Once the path is set, the Overview tab presents information about the index, such as the number of documents, terms and fields, along with some statistics.

Here an index has been loaded by pointing Luke at the date-stamp directory under the index ID folder found with Query Builder above.

[Screenshot: Luke Overview tab for a loaded Explorer index]
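An index ID folder can contain more than one date-stamp subfolder. A small sketch for picking the most recent one before opening it in Luke; the function name is hypothetical, and it compares modification times rather than parsing folder names because the date-stamp format is not documented here.

```python
from pathlib import Path


def latest_version_dir(index_id_dir):
    """Return the most recently modified subdirectory of an index ID folder.

    Uses modification time instead of parsing the folder name, since the
    exact date-stamp format is an unknown here. Returns None if empty.
    """
    subdirs = [p for p in Path(index_id_dir).iterdir() if p.is_dir()]
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)
```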

Stepping through document IDs in the Documents tab gives more information about the index:

[Screenshot: Luke Documents tab]

This tool can also be used to inspect the Platform Search Index created under:

%DefaultDataDir%\PlatformSearchData\Lucene Index Engine\index
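Luke needs the directory that directly contains the index files, so it can help to check a candidate folder first. A Lucene index always has a commit file: Lucene 2.1 and later write segments_N files (N is a generation number), and very old versions wrote a single file named "segments". The sketch below, a hypothetical helper, looks for either; segments.gen, which some older versions also write, is not a commit point by itself and is ignored.

```python
from pathlib import Path


def looks_like_lucene_index(directory):
    """True if the directory holds a Lucene commit file (segments_N or segments)."""
    for p in Path(directory).iterdir():
        if p.is_file() and (p.name == "segments" or p.name.startswith("segments_")):
            return True
    return False
```

Pointing Luke at a parent folder such as the index ID folder itself, rather than its date-stamp subfolder, would fail this check.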

The actions you can perform with this tool include:

  • browse by document number, or by term
  • view documents / copy to clipboard
  • retrieve a ranked list of most frequent terms
  • execute a search, and browse the results
  • analyze search results
  • selectively delete documents from the index
  • reconstruct the original document fields, edit them and re-insert them into the index
  • optimize indexes

More information on this tool can be found at:

http://www.getopt.org/luke/

http://www.ezdia.com/epad/lucene-luke-search-tutorial-indexing/1503/

More information on Lucene technology can be found at:

http://lucene.apache.org/


11 Comments


  1. Patrick Delage

    So I suppose the technical way of creating these indexes on the server is totally different from indexing in a HANA server, right? It’s a question 🙂

    1. S Kaur

      It is a nice article on Information Space indexes.

      We have found performance issues with our Information Spaces when displaying the data. Building the indexes is very fast, but when we open the Information Space to view the data it takes more than a minute. Any ideas on what could cause this?

    2. George Pertea Post author

      I apologize for the late reply; I was away for a couple of months. The way we index HANA sources and other sources is the same. The only difference is that for BWA or HANA we store only the metadata in the Explorer index, so indexing is very fast. For HANA and BWA sources we read the data in real time from these systems (not from the local index) when exploring the Info Space. You may see “delays” with a HANA source because we initially dispatch two queries when loading the Space, one for the top panel (facets) and a second for the visualization panel (chart + table area), and they are sequential: the top query must finish before the chart query runs. Every subsequent user action dispatches new queries, so everything is real time with HANA and BWA. The delay you are seeing is probably HANA aggregating results over billions of rows. There is caching in HANA, but it is disabled by default for Explorer because these systems are real time, meaning the data can change every few seconds in systems such as those used in Retail.

  2. Deepak Chodha

    Hi George,

    Thanks for sharing good information on indexing of explorer infospaces.

    I have a question that I hope you may be able to answer:

    We are currently facing Explorer performance issue due to huge number of information spaces that we create in explorer.

    Problem: we usually create around 10-20 Information Spaces for every internal HANA revision candidate as part of our testing. Since this is a repetitive task, we have a lot of Information Spaces on the Explorer server. Because of this, about 99% of the time when we test on our server, the response is a timeout.

    To avoid this, I guess we could delete some of the Information Spaces, but we don’t have any shortcut for that. We usually select the HANA models one by one in ‘Manage Spaces’ and delete them. Is there any way to delete a set of Information Spaces in bulk from the backend, so that we can do this regularly to maintain Explorer performance?

    What I have done is go to “%DefaultDataDir%/Polestar/index/” and, in the Search, Indexing and Exploration server folders, delete the index folders last modified in 2012 (thinking that this would reduce the load of fetching all the Information Spaces). But this didn’t help either.

    Can you suggest something, if you are aware of a solution?

    HAPPY HANA 🙂

    Deepak Chodha.

    1. George Pertea Post author

      Hi Deepak,

      You can delete the indexes from the Explorer Search server’s Published directory (which is responsible for the Info Space listing on the Home page), but this will not delete the Info Space definitions shown in Manage Spaces, as those are stored in the CMS database. We cannot currently bulk delete these from the Explorer UI. There is probably a way to do it via the SDK, but I don’t recommend it and it is not supported.

      But this would make a great Explorer enhancement request; please log it in Idea Place under Explorer via: https://ideas.sap.com

      Thanks,

      George

  3. Abdul Haseeb

    Hi George,

    I’m having an indexing issue when I try to index a UNX (source: BW). The indexing fails within seconds with an error. The main problem is that the error doesn’t say whether the data was found or whether it timed out. It just says “WavBow01.IndexingServer generated the following message: Index creation failed”.

    The funny thing is that it wasn’t working yesterday, and after I restarted the servers it worked. Today it doesn’t work even if I restart the node.

    I can’t seem to figure out what the root cause is, let alone the resolution. Any ideas?

    Regards

    1. George Pertea Post author

      Hi Abdul,

      In this scenario we use the Data Federation Service of the Adaptive Processing Server, so you may want to split the APS as described in KB 1694041.

      I would start by building a Webi report based on the same UNX and the same object selection, and see if Webi gives you more information on the error.

      You can also enable tracing for the Indexing server in the CMC and check the logs.

  4. Aggarwal Himanshu

    Hey George

    This is a wonderful article you have written on how BO Explorer processes indexes. Thanks a lot for sharing it.

    Could you please also share some information on how Explorer retrieves data from the source database using the index files it creates? What kind of queries does it run to get the data visible in an exploration view? And what mechanism does it use to update the visualizations (charts and tabular data) so quickly in response to user actions in an exploration view?

    Thanks!!

