When is a large document too large? Some rules of thumb for sizing Web Intelligence.
In a typical scenario, Web Intelligence is used by report authors who want to deliver interactive content to consumers. Sometimes authors are under pressure to create extremely large WebI documents that ideally will provide answers to any possible question a consumer might have. One WebI customer is delivering documents spanning nearly 200,000 pages. It works, but the delivery is pretty delicately controlled. In general, however, this tendency to build documents with huge data volumes is the same affliction that causes someone to buy a house with 2 extra bedrooms on the chance that someday, somehow, both sets of grandparents will come calling the same weekend. It might just happen. Whew! What peace of mind to have shelled out an extra $150k for two rooms you might simultaneously fill once a decade!
In the WebI world, the desire to build reports with large data volumes inevitably pushes the limits of what the WebI server (for dHTML or Java clients) or the WebI Rich Client can handle. WebI memory use, which of course increases as the document size increases, is limited by the address space of the process. As of XI 3.1, WebI runs as a 32-bit process, so the process limit for WebI document size is around 2 gigabytes. This memory is consumed by the storage of the data provider (DP) results, calculations and the report engine.
In practice, from the perspective of data in the data provider results, WebI can support up to around 500 megabytes, which is roughly 25 million values (e.g. 5 columns by 5 million rows, 25 columns by 1 million rows, etc.). At that point, the document will consume all memory available to the process. For the Rich Client, the content can be consumed offline on each user’s machine, so this memory is not shared. For online clients, the process is shared by all concurrent clients on a given server, so divide the document size limit by the number of concurrent users (e.g. 10 concurrent users on a doc of 2.5 million values could max out the server).
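To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It only encodes the rule-of-thumb figures quoted above (about 25 million values in about 500 megabytes per process, which works out to roughly 20 bytes per value). The function names are hypothetical and are not part of any WebI API, and real memory use will also depend on calculations, variables and layout.

```python
# Rule-of-thumb figures quoted in the text; these are NOT hard limits.
MAX_VALUES_PER_PROCESS = 25_000_000   # about 25 million data provider values
APPROX_DP_BYTES = 500_000_000         # about 500 MB (assumed decimal megabytes)

# Derived from the two figures above: roughly 20 bytes per value.
APPROX_BYTES_PER_VALUE = APPROX_DP_BYTES // MAX_VALUES_PER_PROCESS


def document_values(rows: int, columns: int) -> int:
    """Number of cell values returned by the data provider(s)."""
    return rows * columns


def max_concurrent_users(rows: int, columns: int,
                         budget: int = MAX_VALUES_PER_PROCESS) -> int:
    """How many users could open the document at once on one online
    server process before the rule-of-thumb budget is used up."""
    values = document_values(rows, columns)
    if values <= 0:
        raise ValueError("rows and columns must both be positive")
    return budget // values


def fits(rows: int, columns: int, concurrent_users: int,
         budget: int = MAX_VALUES_PER_PROCESS) -> bool:
    """True if the document, multiplied by the concurrent users sharing
    the process, stays within the rule-of-thumb budget."""
    return document_values(rows, columns) * concurrent_users <= budget


if __name__ == "__main__":
    # The example from the text: a 2.5 million value document.
    print(max_concurrent_users(rows=500_000, columns=5))   # 10
    print(fits(rows=500_000, columns=5, concurrent_users=11))  # False
```

For the Rich Client, where the memory is not shared, the same check applies with a single user per process.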
Again, it’s important to note that these are rules of thumb, not absolutes. You might find the server performing adequately even with such gigantic documents. However, the size of a WebI document in terms of rows/cells is not the only variable in play. Synchronization of dimensions between multiple data sources and the number of variables also have an impact, as does the complexity of the layout. So, a 10 million value document with multiple sources, lots of complex variables and calculations, and a lot of charts, tables and conditional formatting might put as much pressure on the server as a table dump with 25 million values.
But before you start categorizing WebI as a client that is best for small documents, let’s step back and think about what a document with 25 million values means for report authoring and interactive consumption. First, it’s absurdly large. Just for reference, a 5 million word MS Word document could easily be more than 10,000 pages. Second, it’s absurdly large. Take the example of a query that retrieves 500,000 rows and 50 columns, or 25 million cell values. Among those columns, you might have Region, Country, City, Year, Quarter, Month, Product Level 1… Product Level n, Industry Code, Customer Name, Address…. And then of course there are measures like Sales, Units Sold, various Counts, etc. Maybe these columns/rows are fed by another source or two: a customer support site, plus maybe even an Excel file with forecast data. This report is great! It contains the sandbox for answering any question at any level. Just which of the dozen tabs should I click on, and how long should I scroll to get my answer? Third, it’s absurdly large. Maintaining a document this large, with all of its different analytical views and its web of interdependent variables and calculations, is always going to be painful. You had better hope the author never leaves the organization, because untangling the intentions and interdependencies within such a large document will be next to impossible.
How will users, concretely, consume this volume of data? A handful of aggregations in a table, with options to drill to 4 or 5 levels of detail on any class of dimension (Geography, Time, Product Levels, Customers, etc.)? In this case, the author is not adding value to the content (through formatting, calculations and variables, juxtaposition of content to tell a story, synchronization of data between different sources, etc.), so this is a clear case where Explorer would bring value. If the expectation is for consumers to explore the data “as is”, then give them Explorer and let them find their factoid answers at any level of detail.
Often in these scenarios, the author does try to add value by defining a multi-tab WebI document with dozens of different detailed views of the data organized in different themes. These documents take weeks to create and validate and could take days to go through, but in theory they enable users to have any of their possible questions answered, no matter how detailed and unlikely they may be. For the vast majority of users, putting content as detailed as this into one document is like buying flood insurance if you live in the Sahara. Yes, it is possible that someone, one day, might want access to those details in one session. And over the years, a user might access quite a few details related to numerous dimensions. However, is it worth the price in performance to buy such insurance with a document that size? Instead of building such a monstrosity as one document, consider alternatives for delivering the same content: splitting it into a couple of linked documents, using features such as query drill or prompts (optional prompts might help), using queries that pull specific levels of aggregation instead of one giant chunk, allowing more ad-hoc query and document creation with compulsory prompts turned on, etc.
Rest assured, however, that help is on the way for customers insisting on such humongous documents. In an upcoming release, we plan on making WebI 64-bit, effectively removing the process limit and allowing more physical memory to be added to the server to improve the handling of larger reports and throughput. The Rich Client on an end user’s machine, which uses a “local server” implementation, will also become 64-bit in a future release. (Note that the Rich Client has essentially repatriated some WebI server components to be part of the Rich Client installation. This is what enables WebI content to be taken offline from BOE.)
But in the meantime, authors should constantly check the trade-offs involved in building extremely large documents. And ultimately, it comes down to understanding what end users (consumers) really want and need to do with the content. Next time you get pressure to create a 12-tab, 50-dimension, one-size-fits-all report, push back a little. The consumers are not always right. Ask how many 10,000-page Word documents they use.