
Best experience: Creation of HANA Indexes part II

last updated: 2023-07-26 08:45 CET

This is the second part of the blog series “Best experience: Creation of HANA Indexes”. The first part can be found here.


  1. Starting with indexes (part I)
  2. Limitations (part I)
  3. SAP recommendations (part I)
  4. Evaluate expensive statements (part I)
  5. Analyzing the system
  6. Performance comparison
  7. Tools
  8. Recommendations

5. Analyzing the system

If you don’t have any performance issues and no complaints from business users, stop optimizing a system that doesn’t need it. Every index also has side effects, which should be evaluated carefully.

To start with, check the current primary key and the existing indexes of your table.

This can be done via scripts from the SQL collection (note 1969700).

If you only have the table name, use the following statement:


( SELECT                                      /* Modification section */
          '%' HOST,
          '%' PORT,
          '%' SERVICE_NAME,
          '%' SCHEMA_NAME,
          '<insert_table_name>' TABLE_NAME,
          '%' DATA_TYPE,
          '%' COMPRESSION_TYPE,
          '%' LOADED,
          '%' PAGEABLE,
          ' ' EXCLUDE_PK_AND_UNIQUE,
          'SIZE' ORDER_BY,                          /* SIZE, TABLE */
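
If you don’t have the SQL collection at hand, a minimal sketch of the same check against the system views (assuming the VBEP example used later; add a SCHEMA_NAME filter as needed) could look like this:

SELECT i.INDEX_NAME, i.INDEX_TYPE, ic.COLUMN_NAME, ic.POSITION
  FROM INDEXES i
  JOIN INDEX_COLUMNS ic
    ON  i.SCHEMA_NAME = ic.SCHEMA_NAME
    AND i.TABLE_NAME  = ic.TABLE_NAME
    AND i.INDEX_NAME  = ic.INDEX_NAME
 WHERE i.TABLE_NAME = 'VBEP'          -- assumption: example table
 ORDER BY i.INDEX_NAME, ic.POSITION;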

Please check the compression type (see part I – sparse / prefixed) for the possibly affected index columns.
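
A quick sketch for checking the compression types of the candidate columns directly (monitoring view M_CS_COLUMNS; for partitioned tables you get one row per partition):

SELECT COLUMN_NAME, COMPRESSION_TYPE, LOADED
  FROM M_CS_COLUMNS
 WHERE TABLE_NAME = 'VBEP'                          -- assumption: example table
   AND COLUMN_NAME IN ('MANDT', 'VBELN', 'EDATU');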

If you already have the statement hash, use HANA_SQL_StatementHash_DataCollector* (depending on your revision), which also includes the CS columns:

( SELECT                /* Modification section */
      '1000/10/18 07:58:00' BEGIN_TIME,                  /* YYYY/MM/DD HH24:MI:SS timestamp, C, C-S<seconds>, C-M<minutes>, C-H<hours>, C-D<days>, C-W<weeks>, E-S<seconds>, E-M<minutes>, E-H<hours>, E-D<days>, E-W<weeks>, MIN */
      '9999/10/18 08:05:00' END_TIME,                    /* YYYY/MM/DD HH24:MI:SS timestamp, C, C-S<seconds>, C-M<minutes>, C-H<hours>, C-D<days>, C-W<weeks>, B+S<seconds>, B+M<minutes>, B+H<hours>, B+D<days>, B+W<weeks>, MAX */
      '<insert_hash>' STATEMENT_HASH,
      -1 PLAN_ID,
      86400 TRACE_HISTORY_S,
      200 LINE_LENGTH,
      'H' TIME_UNIT                    /* MS, S, M, H, D */

In our example: Hash 426bdc1658f5f21d5bebf85ec4ab9c2c


SAMPLES PERCENT HOST     PORT  THREAD_TYPE
======= ======= ======== ===== ==============================
   5033   54.07 hostname 30040 SqlExecutor
   4275   45.92 hostname 30040 JobWorker
SAMPLES PERCENT THREAD_STATE     LOCK_NAME                                                            
======= ======= ================ ======================================================================
   8209   88.19 Running          ?
   1093   11.74 Job Exec Waiting JobBarrier Wait for Jobs
      3    0.03 Job Exec Waiting ?
      2    0.02 Mutex Wait       unnamed Mutex
      1    0.01 Mutex Wait       ?
SAMPLES PERCENT THREAD_TYPE THREAD_METHOD                                    
======= ======= =========== ==================================================
   5033   54.07 SqlExecutor ExecutePrepared
   4043   43.43 JobWorker   RleScanVecOutJob<range>
    230    2.47 JobWorker   <unknown method>
      2    0.02 JobWorker   BarrierEmptyJob
SAMPLES PERCENT THREAD_TYPE THREAD_METHOD           THREAD_DETAIL                                                                  
======= ======= =========== ======================= ================================================================================
   5033   54.07 SqlExecutor ExecutePrepared         SELECT COUNT(*) FROM "VBEP" WHERE "MANDT" = ? AND "VBELN" = ? AND "EDATU" = ?
   4043   43.43 JobWorker   RleScanVecOutJob<range> ?
    230    2.47 JobWorker   <unknown method>        <unknown job>
      2    0.02 JobWorker   BarrierEmptyJob         ParallelDispatcher
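
The aggregations above come from the data collector script; a simplified sketch of a similar breakdown directly on the thread samples (assuming your revision exposes STATEMENT_HASH in M_SERVICE_THREAD_SAMPLES) would be:

SELECT THREAD_TYPE, THREAD_METHOD, COUNT(*) AS SAMPLES
  FROM M_SERVICE_THREAD_SAMPLES
 WHERE STATEMENT_HASH = '426bdc1658f5f21d5bebf85ec4ab9c2c'
 GROUP BY THREAD_TYPE, THREAD_METHOD
 ORDER BY SAMPLES DESC;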

To interpret the thread methods you can use SAP note 2114710.

ExecutePrepared simply stands for the execution of an already prepared statement, which is normal and no indicator that an intervention is needed.

RleScanVecOutJob<range> is a completely different story: this thread method is a clear indicator that an index is needed.

Another indicator can be checks M0890 and M1125 of the mini checks (HANA_Configuration_MiniChecks*):

|M0890|Unusual frequent thread methods (last hour)    |        |IndirectScanBvOutJob<BV> (5.33 threads)                      |none          |X| 2114710|
|M1125|Columns with many scanned records              |        |MSEG.BUDAT_MKPF (41067249/s)                                 |none          |X| 2000002|
|M1125|Columns with many scanned records              |        |MSEG.MANDT (16265794/s)                                      |none          |X| 2000002|
|M1125|Columns with many scanned records              |        |MSEG.MATNR (375450570/s)                                     |none          |X| 2000002|
|M1125|Columns with many scanned records              |        |MSEG.MJAHR (354290653/s)                                     |none          |X| 2000002|
|M1125|Columns with many scanned records              |        |MSEG.WERKS (28137626/s)                                      |none          |X| 2000002|


Indication of the necessity of an index

The following thread methods can indicate a missing index:

  • IndirectScanBvOutJob*
  • JobParallelMgetSearch
  • JobParallelPagedMgetSearch
  • PrefixedScanVecOutJob
  • PrefixedScanVecOutJob<range>
  • RlePredScanJob<ScanVectorBinSearchPredicate>(out=vector)
  • RlePredScanJob<ScanVectorPredicate>(out=vector)
  • RleScanBvOutJob<BV>
  • RleScanBvOutJob<range>
  • RleScanVecOutJob<BV>
  • RleScanVecOutJob<range>
  • RleScanBvOutJob
  • scanWithoutIndex
  • ClusterIndexScanBvOutJob<ScanRangePredicate>
  • ClusterScanBvOutJob<BV>
  • ClusterScanBvOutJob<range>
  • ClusterScanVecOutJob<range>
  • SparseBvScanBvOutJob
  • SparseBvScanVecOutJob
  • SparsePredScanBvOutJob<ScanRangesPredicate>
  • SparsePredScanVecOutJob<ScanRangesPredicate>
  • SparsePredScanVecOutJob<ScanVectorBinSearchPredicate>
  • SparsePredScanVecOutJob<ScanVectorPredicate>
  • SparseRangeScanBvOutJob
  • SparseRangeScanVecOutJob
  • sparseSearch
  • sse_icc_lib::mgetSearchi_AVX2impl
  • sse_icc_lib::mgetSearchi_AVX


Once you have identified the thread method, you have to find the corresponding statement hash:


          ( SELECT                                                      /* Modification section */
              '1000/10/18 07:58:00' BEGIN_TIME,                  /* YYYY/MM/DD HH24:MI:SS timestamp, C, C-S<seconds>, C-M<minutes>, C-H<hours>, C-D<days>, C-W<weeks>, E-S<seconds>, E-M<minutes>, E-H<hours>, E-D<days>, E-W<weeks>, MIN */
              '9999/10/18 08:05:00' END_TIME,                    /* YYYY/MM/DD HH24:MI:SS timestamp, C, C-S<seconds>, C-M<minutes>, C-H<hours>, C-D<days>, C-W<weeks>, B+S<seconds>, B+M<minutes>, B+H<hours>, B+D<days>, B+W<weeks>, MAX */
              'SERVER' TIMEZONE,                              /* SERVER, UTC */
              '%' HOST,
              '%' PORT,
              -1  THREAD_ID,
              '%' THREAD_TYPE,                /* e.g. 'SqlExecutor', 'JobWorker' or 'MergedogMonitor' */
              '%' THREAD_STATE,               /* e.g. 'Running', 'Network Read' or 'Semaphore Wait' */
              'RleScanVecOutJob%' THREAD_METHOD,
              '%' THREAD_DETAIL,
              '%' STATEMENT_HASH,
              '%' ROOT_STATEMENT_HASH,
              '%' STATEMENT_ID,
              '%' STATEMENT_EXECUTION_ID,
              '%' DB_USER,
              '%' APP_NAME,
              '%' APP_USER,
              '%' APP_SOURCE,
              '%' LOCK_TYPE,
              '%' LOCK_NAME,
              -1  NUMA_NODE,
              '%' CLIENT_IP,
              -1  CLIENT_PID,
              -1  CONN_ID,
              -1  MIN_SAMPLES_TOTAL,
              60 TIME_SLICE_S,
              'CURRENT' DATA_SOURCE,
              -1 RESULT_ROWS

=> in this example the thread method is RleScanVecOutJob (you may have to adjust DATA_SOURCE to 'HISTORY')


Once you have found the hash, you can translate it into readable SQL text via:


  ( SELECT                         /* Modification section */
      '426bdc1658f5f21d5bebf85ec4ab9c2c' STATEMENT_HASH,
      ' ' REPLACE_BINDS,


SELECT COUNT(*) FROM "VBEP" WHERE "MANDT" = ? AND "VBELN" = ? AND "EDATU" = ?


From the ColStats output you can see which column is frequently scanned and has no index (=> INDEX_TYPE = NONE):

For more details you have to look into the explain plan and the PlanViz (not part of this blog series).

First have a look at the WHERE clause. The ideal index would be on MANDT, VBELN and EDATU, but you should choose as few columns as possible. For this you can calculate the selectivity as described in this blog. Alternatively, you can test more variations with the performance comparison (see below).
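
A minimal sketch of such a selectivity calculation for the example column (the closer the result gets to 1, the more selective the column and the more useful an index on it):

SELECT COUNT(DISTINCT "EDATU")                        AS DISTINCT_VALUES,
       COUNT(*)                                       AS TOTAL_ROWS,
       TO_DECIMAL(COUNT(DISTINCT "EDATU")) / COUNT(*) AS SELECTIVITY
  FROM "VBEP";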

In most systems with only one client you can create a single-column index on EDATU (due to its selectivity). In our case we have more than 20 clients, which makes it worthwhile to add the client column MANDT to the index.

This results in a multi-column index on MANDT and EDATU.
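
A sketch of the corresponding DDL (the index name VBEP~Z01 is only a placeholder; in an ABAP system you would rather define the index in the DDIC so it can be transported):

CREATE INDEX "VBEP~Z01" ON "VBEP" ("MANDT", "EDATU");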

Be aware that HANA can’t evaluate ranges in multi-column indexes; you can manually create an additional single-column index on the parallelization column.
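
A sketch of such an additional single-column index (again with a placeholder name):

CREATE INDEX "VBEP~Z02" ON "VBEP" ("EDATU");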



6. Performance comparison

A before/after comparison is possible with the SQL script HANA_SQL_StatementHash_DiffReport*.

The difference between cursor and execution time is that the cursor time also includes the client time.
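
Besides the DiffReport, a rough before/after check is possible via the plan cache (sketch; the times in this view are recorded in microseconds):

SELECT STATEMENT_HASH, EXECUTION_COUNT,
       TOTAL_EXECUTION_TIME / EXECUTION_COUNT  AS AVG_EXEC_TIME_US,
       TOTAL_CURSOR_DURATION / EXECUTION_COUNT AS AVG_CURSOR_TIME_US
  FROM M_SQL_PLAN_CACHE
 WHERE STATEMENT_HASH = '426bdc1658f5f21d5bebf85ec4ab9c2c';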


6.1 Another example incl. thread method changes


SAMPLES PERCENT HOST                          PORT  THREAD_TYPE
======= ======= ============================= ===== ==============================
   2144   68.30 hostname                      30040 JobWorker
    995   31.69 hostname                      30040 SqlExecutor
SAMPLES PERCENT THREAD_STATE                        LOCK_NAME                                                            
======= ======= =================================== ======================================================================
   2618   83.40 Running                             ?
    518   16.50 Job Exec Waiting                    JobBarrier Wait for Jobs
      3    0.09 Job Exec Waiting                    ?
SAMPLES PERCENT THREAD_TYPE          THREAD_METHOD                                    
======= ======= ==================== ==================================================
   1988   63.33 JobWorker            SparseIndexScanVecOutJob
    995   31.69 SqlExecutor          ExecQidItab
     86    2.73 JobWorker            IndirectIndexScanVecOutJob
     65    2.07 JobWorker            RleScanVecOutJob
      5    0.15 JobWorker            
SAMPLES PERCENT THREAD_TYPE          THREAD_METHOD                                      THREAD_DETAIL                                                                  
======= ======= ==================== ================================================== ================================================================================
   1988   63.33 JobWorker            SparseIndexScanVecOutJob                           ?
    995   31.69 SqlExecutor          ExecQidItab                                        ?
     86    2.73 JobWorker            IndirectIndexScanVecOutJob                         ?
     65    2.07 JobWorker            RleScanVecOutJob                                   ?
      5    0.15 JobWorker
63% SparseIndexScanVecOutJob
32% ExecQidItab
3% IndirectIndexScanVecOutJob


Nearly 2/3 of the time was spent on thread methods that indicate the need for an index.


Executions            15100
Records                 474     0,03
Preparations              2     0
Elapsed time           0,34 h  82,8  ms   2637,73 ms
Execution time         0,34 h  82,77 ms   2636,92 ms
Preparation time          0 h   0,02 ms      0,81 ms
Lock wait time            0 h   0    ms      0     ms
SAMPLES PERCENT HOST                          PORT  THREAD_TYPE
======= ======= ============================= ===== ==============================
     13  100.00 hostname                      30040 SqlExecutor
SAMPLES PERCENT THREAD_STATE                        LOCK_NAME                                                             
======= ======= =================================== ======================================================================
     13  100.00 Running                             ?
SAMPLES PERCENT THREAD_TYPE          THREAD_METHOD                                    
======= ======= ==================== ==================================================
     13  100.00 SqlExecutor          ExecQidItab
SAMPLES PERCENT THREAD_TYPE          THREAD_METHOD                                      THREAD_DETAIL                                                                  
======= ======= ==================== ================================================== ================================================================================
     13  100.00 SqlExecutor          ExecQidItab                                        ?

=> 100% ExecQidItab

=> there is still room for improvement, but not in the execution plan => optimize the compression
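
Such a compression optimization can be triggered manually (sketch for the example table; 'FORCE' re-evaluates the compression even if the automatic decision would skip it):

UPDATE "VBEP" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE');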


7. Tools


7.1 SHDB_INDEX_ANALYZE

The report SHDB_INDEX_ANALYZE from SAP note 1794297 should be executed as a post-conversion task.

“SHDB_INDEX_ANALYZE analyzes the data distribution of the largest tables in the system and suggests indexes on those fields which are highly selective and part of secondary indexes defined for traditional data bases. This report analyzes SAP and customer tables and indexes. This report has to run on the system already ported to HANA and should contain the full production data.”


7.2 SHDB_INDEX_CREATE

“SHDB_INDEX_CREATE creates secondary indexes based on a suggestion list imported into the select option INDFLD. The naming convention of this list per entry is:
  • three characters to identify the SAP or customer namespace (SAP or CUS)
  • table name
  • field name
If available this report should run in the corresponding development system. The created indexes will then be transported through the landscape with standard mechanisms.”


7.3 SHDB_INDEX_CHECK

SHDB_INDEX_CHECK checks the created indexes and ensures that all required indexes are created on the HANA database. Additionally, the corresponding transport request entries are created.


7.4 Index adviser

Use the index adviser to find out for which tables and columns indexing would be most valuable. The script is part of a SAP HANA system installation and runs from the command line. It is located in the $DIR_INSTANCE/exe/python_support directory.

=> AFAIK: this index adviser is not available anymore in HANA 2.0, even though the documentation says otherwise

7.5 SQL Collection

The best tools to analyze performance and determine the column combinations for an index are the statements in note 1969700. Get familiar with them and combine them with the know-how of the most popular HANA FAQ notes. There is no silver bullet as a general recommendation.



If you want to convert inverted hash indexes of existing tables which have been created before HANA 2.0 SPS 03, you can use the report SHDB_CONV_INV_HASH_PK. The report converts the primary key of those tables for which the flag PK_IS_INVHASH is set in the DDIC table DD02L.
Please be aware that the table is locked during the conversion and the table content cannot be modified. The conversion time depends on the data volume of the table.
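
To get an overview of the affected tables beforehand, the flag can be queried directly (sketch, assuming the usual DDIC convention of 'X' for a set flag):

SELECT TABNAME
  FROM DD02L
 WHERE PK_IS_INVHASH = 'X';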


8. Recommendations / Tips

8.1 Check current metadata

Check if there is already an index, because HANA applies some internal rules:

  1. Implicit single-column indexes when creating a primary key
  2. Implicit single-column indexes on compressed columns

If a column is compressed with a compression type supporting a BLOCK index and the size of the single-column index isn’t larger than 10 % of the column size, a single-column index is created automatically. You can check this with the SQL script HANA_Table_ColumnStore_Columns*:


HANA_Table_ColumnStore_Columns: MSEG Table Columns

as well as with HANA_Indexes_Columns*:


HANA_Indexes_Columns: BSEG

In this example, BELNR of table BSEG is not part of the primary key and has no manually created index, but it has its own inverted value (IV) index.
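
A sketch to list such implicitly indexed columns for a table (assuming M_CS_ALL_COLUMNS exposes INDEX_TYPE on your revision):

SELECT COLUMN_NAME, COMPRESSION_TYPE, INDEX_TYPE
  FROM M_CS_ALL_COLUMNS
 WHERE TABLE_NAME = 'BSEG'
   AND INDEX_TYPE <> 'NONE';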

8.2 Use as few indexes as possible

Please create indexes only if you have a good reason for them. This means the performance boost for a SELECT has to outweigh the cost of data changes by a factor of 4-10. This ratio can change over time, but it should be considered in your initial tests.

8.3 Multi-column CS indexes

Avoid multi-column indexes aka concat attributes! The larger the table, the more disadvantages arise with them. If the table is small you won’t notice a performance difference for operations other than SELECT, but there will be a break-even point at which those attributes kill the performance.

8.4 Performance disadvantages

It is possible that dropping a multi-column index (including the PK) speeds up your partitioning process. To verify this, measure one run with indexes and one without, including the recreation of the indexes.

The amount of data changes over time, and at a given point this can influence the choice of compression, which also has an effect on the indexes. This means you should reorganize (= optimize compression of) your tables from time to time.

8.5 Speed up index creation

  • indexserver.ini -> [joins] -> single_thread_execution_for_partitioned_tables = 'false'
  • indexserver.ini -> [global] -> create_index_optimize_bulksize
  • indexserver.ini -> [ddl] -> max_number_of_data_jobs
  • indexserver.ini -> [ddl] -> merging_unique_check_threshold

You can set the parameter indexserver.ini -> [global] -> create_index_optimize_bulksize (default: 40000000 for SAP HANA <= 2.0 SPS 06; 0, i.e. an individual calculation based on memory limits, for SAP HANA >= 2.0 SPS 07) to a higher value in order to shift resource consumption from CPU to memory. A value of 0 or higher than the table rows deactivates bulk processing completely and provides optimal performance at the cost of high memory consumption. Be aware that this setting only applies to column loadable indexes, so if page loadable / NSE (SAP Note 2799997) is active for the index, it can’t be used. (Source: SAP note 2000000)

For partitioned tables the parameter max_number_of_data_jobs is interesting. It can be used to balance performance, CPU and memory. Per default this parameter is only limited by the general concurrency settings (SAP Note 2222250), resulting in potentially high CPU and memory consumption and good performance. By reducing it you can limit the number of concurrently processed partitions and, as a consequence, also the peak memory utilization. For example, max_number_of_data_jobs = 2 makes sure that only two partitions are processed at the same time, so the CREATE INDEX memory overhead is lower compared to the parallel processing of more than two partitions. At the same time the overall CREATE INDEX runtime can increase.

In case of memory issues (uniqueness check) during the creation of an index, please use the parameter merging_unique_check_threshold with a value of 2000000000.
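
The DDL parameters can be set like this (sketch; the values are just the examples from above, not general recommendations):

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('ddl', 'max_number_of_data_jobs') = '2',
      ('ddl', 'merging_unique_check_threshold') = '2000000000'
  WITH RECONFIGURE;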


In the end you achieve the best performance if you avoid unnecessary workload. No workload is the best workload 😉

You have to find the right balance when deciding whether to create an index. Please do not create an index for every statement which runs longer than expected, only for the most frequent ones with really bad performance. Check out the thread methods and their meaning. It is also possible that you ran into a known issue like the one described in note 2756967. Another possibility is bad compression of a column due to fragmentation or wrong partitioning.


Thanks to Kuto Baran, who inspired me to write this blog and provided input from his German session ‘Ein populärer Irrtum über HANA Indizes’ (“A popular misconception about HANA indexes”). Special greetings to Martin Frauendorfer for his amazing SQL script collection; without it, it would be impossible to get usable details out of the system. Keep up the outstanding work!


Stay healthy,
-Jens (follow me on Twitter for more geeky news @JensGleichmann)


V1.1 Added section 8

Comments
      Lars Breddemann

      Just as we’re used to from Jens, this and the previous blog post are well structured and researched and point to the relevant resources. Well done, all around.

      What I am missing in the discussion of the topic is the broader context.

      The scripts that Martin has so carefully prepared are all designed to work in the narrow context of SAP systems running on HANA, and even closer, in the context of performance troubleshooting for those systems.

      No doubt that indexes can have a tremendous impact on performance and will continue to be a necessary tool for whoever operates a database.

      What’s missing in the articles, though, is a thorough understanding of what adding indexes costs and how one can see if the new indexes indeed delivered the expected benefits.

      With “costs” I mean things like increased update/insert/delete transaction runtimes, increased table and log data space requirements, increased backup/recovery times, and added development and test efforts. Even a simple back-of-the-envelope calculation of how much effort will go into distributing the new index across the system landscape is missing.

      Then there is the question of whether or not the new index is actually used by the queries it should support. Due to HANA’s lacking EXPLAIN PLAN, all one can do is either use PlanViz and search for the index usage in it (very tedious for non-trivial plans) or check the mentioned thread comparison again and note that some of the suspicious thread methods are now absent or consume relatively less time. That’s not very good at all.

      And more important than all this: this whole activity of finding tuning “candidates”, guessing at what indexes might help, implementing those and testing whether those have been successful in fixing the issue is not connected back to an end-user/business activity that either costs money or generates income.
      As long as that kind of connection cannot be made, creating indexes, checking hints or playing around with DB parameters will just be the pastime of the system maintainers and not an actual productive activity that adds value.

      In my mind one of the big benefits of HANA is that it allows you to operate well without having to do too much for quite some time. And when it becomes necessary to add indexes etc., one should consider it as an expenditure/technical debt that has to pay back in business terms.



      Jens Gleichmann (Blog Post Author)

      Hi Lars,

      thanks for taking the time to comment in detail. You’re describing a valid point which I addressed only too briefly in this blog => testing and costs. Maybe that cries out for a part III?

      The other part was the finding of candidates. For sure normally you go the other way around:

      1. an end user or other system owner reports a performance bottleneck
      2. you have the time frame and the impact
      3. you can search in detail for the root cause
      4. find, solve and review => this takes the most time

      But if you have analyzed your system for candidates beforehand (which does not mean you create indexes for them right away), you can easily match and compare this short time frame (new issue) with your list. So you are well prepared and not under high pressure during this busy operation time.

      For sure there must be a connection to a need for the index (accelerate a business process / resolve an issue). You shouldn’t create an index only because of bad KPIs. There must be a business need behind it.

      “In my mind one of the big benefits of HANA is that it allows to operate well without having to do too much for quite some time.”

      This only applies if you have a system in which no major changes are made to the application coding. If you have many developers who cannot keep their feet still and keep pushing new code into a complex system, you will be optimizing statements and indexes again and again.



      Prasad Rao

      Thanks for sharing your post.

      Thanks and Regards