New WebI feature in BI4.2 – Parallel Queries
I work on the SAP Web Intelligence (Webi) Support team. I want to highlight one of the new features that will be introduced for WebI in SAP BusinessObjects Business Intelligence Platform 4.2 (BI4.2) called Parallel Queries. This allows for parallel query execution to be performed when refreshing WebI documents with multiple data providers.
Prior to BI4.2: Each data provider (query) in a Webi report was run sequentially (when the first query finished executing, the second would run, and so on).
Starting in BI4.2: Parallel query execution is enabled by default for WebI documents that have multiple data providers.
- There is no parallel activity at the refresh preparation (i.e. query script generation, resolution of contexts, and prompts)
- Parallel activities occur on the database at the execution and data fetch part of the refresh process.
- Parallel queries will not be run for dependent data providers.
Parallel queries are supported for the following data provider types:
- Data providers that are based on unv and unx universes.
- Data providers based on SAP HANA Direct Access.
- Data providers that are based on Free-hand SQL and Text Files.
Supported Web Intelligence clients:
- WebI REST SDK
- WebI Rich Client
- WebI HTML and Applet interfaces.
The following screen shot shows what happens before BI4.2 and in BI4.2:
Parallel Query Settings at the WebI level:
- The maximum number of concurrent refresh jobs is set in the WebIContainer_ClientDescriptor.xml file for the Webi Rich Client.
- The maximum number of concurrent refresh jobs is set in the WebIContainer_ServerDescriptor.xml file for WebI running on the server (through BI Launchpad)
- For a default installation location on Windows, the location of the above XML files is as follows:
- C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0\win32_x86\config
- The property that is set is called WebiParamMaxConcurrentRefreshJobs.
- By default, the Maximum Concurrent Refresh Jobs is set to 64. This can be set to a value that you find works well for your system. If the value is set to 0, then the parallel data refresh is disabled all together.
The following shows the WebiParamMaxConcurrentRefreshJobs parameter in the Descriptor XML file:
The following describes the behavior at the database connection level relating to parallel queries:
- There can be no more than four concurrent queries running on the same relational connection.
- There can be no more than one concurrent query on the same OLAP connection or text file.
- As of BI4.2 SP01, parallel query setting cannot be changed for each database connection. The setting currently applies to every connection.
A summary of the limitations in SAP BI4.2 are as follows:
- It is not possible to run parallel queries on data providers based on “Direct Access to BEx queries” (BICS).
- It is not possible to run parallel queries on data providers based on “Analysis Views”.
- It is not possible to run parallel queries on data providers based on Excel files.
- It is not possible to change parallel query setting for each database connection. I believe that this will be looked at being incorporated into a support pack later than BI4.2 SP01.
Example of the log file analysis of a parallel query example:
- In this example, a Webi report was refreshed in the Webi Rich Client.
- The WebiParamMaxConcurrentRefreshJobs parameter was set to 64 in the WebIContainer_ClientDescriptor.xml file.
- The webi report is refreshed with 4 data providers that all point to the same relational connection.
- Since there are 4 queries that are all pointing to the same connection which is less than or equal to the limit of 4 parallel queries per connection and since the WebiParamMaxConcurrentRefreshJobs parameter is set to a number of 64 and no other queries are running at the same time as my test, the queries should all be run in parallel.
- Webi Rich Client logging was enabled with “sap_trace_level = trace_debug;” when refreshing the report with the 4 queries running in parallel.
- The .glf log file was saved.
Log File Analysis:
To analyze the log files, I used the FlexiLogReader. This log reader can be used for analysis of BI .glf log files. The FlexiLogReader can be retrieved from the following link:
1. Open the FlexiLogReader by launching the FlexiLogReader64.exe file.
2. Select the Webi Rich Client .glf log file that was created when logging was turned on when refreshing the report with 4 queries running in parallel.
3. Specify the default configuration (PLUGIN: BI 4.x GLF Log Parsing and End to End Analysis (v1.2.0).
4. Specify to load all files that are in the directory of the file that you just selected. In this case, we only have one .glf log file, but if there were more, by selecting this option, all of the files in the current directory would be chosen.
5. When the log files are loaded, notice that one of the ways that the log files are displayed is by “ServerName/PID/TID”. This means that the logs can be broken out by “Server Name = WebRichClient”, Process ID (In this analysis there is only one process ID), and Thread ID (there are many threads). The advantage of this for analyzing refreshes of parallel queries is that there will be one thread that “manages” the creation of all of the refreshes, and a further thread will be spawned for each of the individual refreshes.
6. To find the lines of code that are related to the parallel refresh process, find all of the lines containing the tag [//R]. To do this in the FlexiLogReader, select “Search” – “Search Window” to bring up the search window.
The search term should be added as a tag so that we can filter the logs to only see the entries that include the lines that have the [//R] entry.
Now ensure that you only check the Search Terms [//R] tag and don’t check any of the other ones. After that, select “Show Only Log Entries Linked To Selected Tags”.
Now, show an extra column that refers to the “Thread ID”.
First off, find out what thread ID is associated with the first row that was returned in the [//R] search.
From the above, we can see that the thread ID that is associated with that first row is 9028.
Now, add the “Thread ID” column to the columns that are shown in the trace summary. To do that, select the first row, select “View”, and select “Choose Columns To Show”.
In the “Define Visible Columns” screen, check the “9028” column which stands for the thread ID column and then select “OK”.
The result is that we now have “Column 3” showing which is the “Timestamp”, “Column 12” showing which is the “Thread ID”, and “Column 36” showing which is the “Text”.
7. By looking at the logs that are filtered by [//R], we can see that there are the following 5 different threads:
9028 – This is the thread ID of the “main” thread that manages the refreshes of the 4 different data providers that are refreshing in parallel.
11212 – This is the thread for the refresh of the first data provider (DP0).
11208 – This is the thread for the refresh of the second data provider (DP1).
11776 – This is the thread for the refresh of the third data provider (DP2).
12028 – This is the thread for the refresh of the fourth data provider (DP3).
8. In the main thread (thread ID = 9028 in this case), the following shows the start and finish of each of the queries that are being run in parallel as separate threads.
9. In FlexiLogReader, we can take the thread ID of one of the threads for one of the query refreshes and filter based on that thread ID. So, for example, if we take the thread ID of the first query that is refreshed (thread ID = 11212) and filter based on that thread ID, we see the following (as shown in the below screen shot). We will leave the [//R] search term selected so that in the summary screen, we will see the logs for thread ID = 11212 as well as the logs for the [//R] search term. By filtering on the thread ID, you can see all of the activities that went on during the refresh process. You can see when the thread starts, which query it is referring to, and when the thread end (when the refresh is finished for that query).
You can analyze the other query refreshes in the other threads using the same process as shown above.
I hope that this blog has highlighted the Parallel Queries feature that will be available in BI4.2 including the data providers that it is supported on, how to configure it, how it should lead to quicker refresh times, and how the log files can be examined effectively using the FlexiLogReader. It looks like this will be a great feature!