When is a large document too large? Some rules of thumb for sizing Web Intelligence.
In a typical scenario, Web Intelligence is used by report authors who want to deliver interactive content to consumers. Sometimes authors are under pressure to create extremely large WebI documents that ideally will provide answers to any possible question a consumer might have. One WebI customer is delivering documents spanning nearly 200,000 pages. It works, but the delivery is pretty delicately controlled. In general, however, this tendency to build documents with huge data volumes is the same affliction that causes someone to buy a house with 2 extra bedrooms on the chance that someday, somehow, both sets of grandparents will come calling the same weekend. It might just happen. Whew! What peace of mind to have shelled out an extra $150k for two rooms you might simultaneously fill once a decade!
In the WebI world, the desire to build reports with large data volumes inevitably pushes the limits of what the WebI server (for dHTML or Java clients) or the WebI Rich Client can handle. WebI memory use, which of course increases as the document size increases, is limited by the process address space. As of XI 3.1, with a 32-bit process, the limit on WebI document size is around 2 gigabytes. This memory is consumed by the storage of the data provider (DP) results, the calculations, and the report engine itself.
In practice, from the perspective of data in the data provider(s) results, WebI can support a maximum of around 500 megabytes, which is roughly 25 million values (e.g. 5 columns × 5 million rows, 25 columns × 1 million rows, etc.). At that point, the document will consume all memory available to the process. For the Rich Client, the content can be consumed offline on each user's machine, so this memory is not shared. For online clients, the process must be shared by all concurrent clients on a given server, so divide the document size limit by the number of concurrent users (e.g. 10 concurrent users on a document of 2.5 million values could max out the server).
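To make the arithmetic concrete, here is a minimal back-of-envelope sketch in Python. It only restates the rules of thumb above; the function name and the assumption that the memory budget divides evenly across concurrent users are mine, not product behavior:

```python
# Rough WebI sizing check based on the rules of thumb in this post.
# ~25 million cell values (~500 MB of data provider results) is the
# ceiling for a single 32-bit WebI process; online users share it.

MAX_VALUES_PER_PROCESS = 25_000_000

def fits_on_server(columns, rows, concurrent_users=1):
    """Does a document of this size leave headroom on one server?

    Assumes (hypothetically) an even split of the budget across users.
    """
    values = columns * rows
    return values <= MAX_VALUES_PER_PROCESS // concurrent_users

# The example from the text: 10 concurrent users, each on a document of
# 2.5 million values (5 columns x 500,000 rows), could max out the server.
print(fits_on_server(5, 500_000, concurrent_users=10))   # True, right at the limit
print(fits_on_server(50, 500_000, concurrent_users=10))  # False: 25M values shared by 10 users
```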
Again, it’s important to note that these are rules of thumb, not absolutes. You might find the server performing adequately even with such gigantic documents. However, the size of a WebI document in terms of rows/cells is not the only variable in play. Synchronization of dimensions between multiple data sources and the number of variables also have an impact, as does the complexity of the layout. So, a 10-million-value document with multiple sources, lots of complex variables and calculations, and a lot of charts, tables, and conditional formatting might put as much pressure on the server as a table dump with 25 million values.
But before you start categorizing WebI as a client that is best for small documents, let’s step back and think about what a document with 25 million values means for report authoring and interactive consumption. First, it’s absurdly large. Just for reference, a 5-million-word MS Word document could easily run more than 10,000 pages. Second, it’s absurdly large. Take the example of a query that retrieves 500,000 rows and 50 columns – 25 million cell values. Among those columns, you might have Region, Country, City, Year, Quarter, Month, Product Level 1… Product Level n, Industry Code, Customer Name, Address… And then of course there are measures like Sales, Units Sold, various Counts, etc. Maybe these columns/rows are fed by another source or two – a customer support site, plus maybe even an Excel file with forecast data. This report is great! It is the sandbox for answering any question at any level. Just which of the dozen tabs should I click on, and how long should I scroll to get my answer? Third, it’s absurdly large. Maintaining a document this large – with all of its different analytical views and its web of interdependent variables and calculations – is always going to be painful. You’d better hope the author never leaves the organization, because untangling the intentions and interdependencies within such a large document will be next to impossible.
How, concretely, will users consume this volume of data? A handful of aggregates in a table, with options to drill down to 4 or 5 levels of detail on any class of dimension – Geography, Time, Product Levels, Customers, etc.? In this case, since the author is not adding value to the content – whether through formatting, calculations and variables, juxtaposition of content to tell a story, or synchronization of data between different sources – this is a clear case where Explorer would bring value. If the expectation is for consumers to explore the data “as is”, then give them Explorer and let them find their factoid answers at any level of detail.
Often in these scenarios, the author does try to add value by defining a multi-tab WebI document with dozens of different detailed views of the data organized around different themes. These documents take weeks to create and validate, and can take days to work through, but in theory they enable users to have any of their possible questions answered, no matter how detailed and unlikely those questions may be. For the vast majority of users, putting content as detailed as this into one document is like buying flood insurance when you live in the Sahara. Yes, it is possible that someone, one day, might want access to those details in one session. And over the years, a user might access quite a few details related to numerous dimensions. But is such insurance worth the price in performance for a document that size? Instead of building such a monstrosity as one document, consider alternatives for delivering the same content: a couple of linked documents, features such as query drill or prompts (optional prompts might help), queries that pull specific levels of aggregation instead of one giant chunk (see the sketch below), more ad-hoc query and document creation with compulsory prompts turned on, etc.
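As a purely illustrative sketch of the “pull specific levels of aggregation” idea – the table and column names here are invented, and the SQL is generic rather than anything WebI-specific – compare a full detail dump with a query that returns only the level actually being consumed:

```python
# Contrast a full detail dump with a query scoped to one aggregation level.
# Hypothetical schema; in a real deployment the universe/query would do this.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, year INTEGER, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('EMEA', 2009, 'Widget', 100.0),
        ('EMEA', 2009, 'Gadget', 250.0),
        ('APAC', 2009, 'Widget',  80.0);
""")

# The "one giant chunk" approach: every row and column, just in case.
detail_dump = conn.execute("SELECT * FROM sales").fetchall()

# The alternative: one small query per level of aggregation users actually view.
by_region = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales GROUP BY region, year"
).fetchall()

print(len(detail_dump), len(by_region))  # 3 rows vs 2 here; millions vs hundreds in practice
```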
Rest assured, however, that help is on the way for customers insisting on such humongous documents. In an upcoming release, we plan on making WebI 64-bit, effectively removing the process limit and enabling the addition of more physical memory on the server to improve the handling of larger reports and throughput. The Rich Client on an end user’s machine, which uses a “local server” implementation, will also become 64-bit in a future release. (Note that the Rich Client has essentially repatriated some WebI server components into the Rich Client installation; this is what enables WebI content to be taken offline from BOE.)
But in the meantime, authors should constantly weigh the trade-offs involved in building extremely large documents. Ultimately, it comes down to understanding what end users (consumers) really want and need to do with the content. Next time you get pressure to create a 12-tab, 50-dimension, one-size-fits-all report, push back a little. The consumers are not always right. Ask how many 10,000-page Word documents they use.
It is the job of the computer to go over millions of records and "intelligently" present me with the summary so that intelligent decisions can be made.
64-bit will bring numerous benefits, but the pressure on authors to create large documents is always there. The source data -- whether quickly retrieved and/or retrieved in volume -- is only part of what users need to answer their questions. Authors add value to that data by adding calculations, laying it out, juxtaposing analytics, chunking the data in different ways (e.g. with groups/sections, breaks, embedding groups within groups), synchronizing data from one source with data from others, adding conditional formatting, etc. Authors also know that one user's summary/details are not applicable to another, and that summary/details vary with the users' contexts (e.g. end of quarter versus mid-quarter). It is the recognition of all the potential answers hidden in a big data dump that pushes authors to create bloated reports.
The challenge of "intelligently" presenting relevant, automatically summarized data and its detail based on a query is significant. Try the "I'm feeling lucky" button in Google and see how its assumptions about the context of your keywords are wrong more often than they're right.
Again, great blog on a topic not talked about enough.
We look forward to the 64-bit Rich Client (we are not able to avoid very large documents ;-) ).
Thanks
I truly appreciate the fact that Michael has explained the real issue with creating large documents. While it may be technically possible to have really large result sets and provide "ALL" the possible answers in one document, it makes for a very poor user experience.
The approach of retrieving large result sets, with numerous dimensional levels, to cover the off chance of someone requiring that level of detail is flat-out poor design, and it causes the report's larger usage and analysis pattern to suffer.
It certainly is not Business Intelligence to have a person wade through millions of records to discern what they should focus on. A report should be constructed to provide enough business context to determine what should be investigated further.
This typically means bringing back a smaller, aggregated set of data, with perhaps a few levels of scope (this is dependent on how quickly the number of rows expands per level of dimensionality).
Once the user is able to discern what needs to be investigated further, the techniques that Michael mentioned can be employed: dynamically drilling down the hierarchy (new query), passing context to another report to get the detailed data, etc. Note: this approach narrows down the result set dramatically, as the detailed-level data should have enough context to restrict the rowset accordingly.
Dorien Gardner
Principal Deployment Specialist, OEM
SAP Business Objects
In one project, we even pushed data-feed-style reports with millions of records to be delivered via ETL, since there is no further interaction with those data feeds.
If WebI is used in the right context and for the right purpose, it can really lead to intelligent decisions, but as consultants I guess we have an important task in emphasizing the proper usage of WebI.
For me, if you can't print the report, it's not a good report.