
When is a large document too large? Some rules of thumb for sizing Web Intelligence.

In a typical scenario, Web Intelligence is used by report authors who want to deliver interactive content to consumers. Sometimes authors are under pressure to create extremely large WebI documents that ideally will provide answers to any possible question a consumer might have. One WebI customer is delivering documents spanning nearly 200,000 pages. It works, but the delivery is pretty delicately controlled. In general, however, this tendency to build documents with huge data volumes is the same affliction that causes someone to buy a house with two extra bedrooms on the chance that someday, somehow, both sets of grandparents will come calling the same weekend. It might just happen. Whew! What peace of mind to have shelled out an extra $150k for two rooms you might simultaneously fill once a decade!

In the WebI world, the desire to build reports with large data volumes inevitably pushes the limits of what the WebI server (for DHTML or Java clients) or the WebI Rich Client can handle. WebI memory use, which of course increases as document size increases, is limited by the process. As of XI 3.1, with a 32-bit process, the limit on WebI document size is around 2 gigabytes. This memory is consumed by the storage of the data provider (DP) results, calculations, and the report engine.

In practice, from the perspective of data in the data provider(s) results, WebI can support up to a maximum of around 500 megabytes, roughly 25 million cell values (e.g. 5 columns by 5 million rows, 25 columns by 1 million rows, etc.). At that point, the document consumes all memory available to the process. For the Rich Client, the content can be consumed offline on each user’s machine, so this memory is not shared. For online clients, the process must be shared by all concurrent clients on a given server, so divide the document size limit by the number of concurrent users (e.g. 10 concurrent users on a document of 2.5 million values could max out the server).
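To make the arithmetic behind these rules of thumb concrete, here is a minimal sizing sketch in Python. The roughly 20 bytes per cell value is an assumption back-calculated from the 500 MB / 25 million value figure above, not an official number, and the function names are purely illustrative; actual memory use also depends on data types, variables, and report layout.

# Back-of-the-envelope sizing sketch for the rules of thumb above.
# Assumption: ~20 bytes per stored cell value, back-calculated from
# "500 MB ~= 25 million values"; real cost varies with data types,
# variables, and report layout.

BYTES_PER_VALUE = 20                # assumed average, not an official figure
MAX_DP_BYTES = 500 * 1024 ** 2      # ~500 MB of data provider results per process

def estimated_dp_bytes(rows: int, cols: int) -> int:
    """Rough memory for the data provider results alone."""
    return rows * cols * BYTES_PER_VALUE

def fits_on_server(rows: int, cols: int, concurrent_users: int = 1) -> bool:
    """Online clients share one process, so the budget is divided among users."""
    return estimated_dp_bytes(rows, cols) * concurrent_users <= MAX_DP_BYTES

# 500,000 rows x 50 columns = 25 million values: at the single-user ceiling.
print(fits_on_server(500_000, 50))                       # True, with no headroom
# Ten concurrent users on a 2.5-million-value document also exhaust the budget.
print(fits_on_server(500_000, 5, concurrent_users=10))   # True, with no headroom

Tip either number up a little, whether columns, rows, or concurrent users, and by this estimate the process budget is gone before the calculations and report engine have taken their share.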

Again, it’s important to note that these are rules of thumb, not absolutes. You might find the server performing adequately even with such gigantic documents. However, the size of a WebI document in terms of rows/cells is not the only variable in play. Synchronization of dimensions between multiple data sources and the number of variables also have an impact, as does the complexity of the layout. So, a 10-million-value document with multiple sources, lots of complex variables and calculations, many charts and tables, and conditional formatting might put as much pressure on the server as a table dump with 25 million values.

But before you start categorizing WebI as a client that is best for small documents, let’s step back and think about what a document with 25 million values means for report authoring and interactive consumption. First, it’s absurdly large. Just for reference, a 5-million-word MS Word document could easily run to more than 10,000 pages. Second, it’s absurdly large. Take the example of a query that retrieves 500,000 rows and 50 columns – 25 million cell values. Among those columns, you might have Region, Country, City, Year, Quarter, Month, Product Level 1… Product Level n, Industry Code, Customer Name, Address… And then of course there are measures like Sales, Units Sold, various counts, etc. Maybe these columns/rows are fed by another source or two – a customer support site, plus maybe even an Excel file with forecast data. This report is great! It is the sandbox for answering any question at any level. Just which of the dozen tabs should I click on, and how long should I scroll, to get my answer? Third, it’s absurdly large. Maintaining a document this large – with all of its different analytical views and its web of interdependent variables and calculations – is always going to be painful. You had better hope the author never leaves the organization, because untangling the intentions and interdependencies within such a large document will be next to impossible.

How, concretely, will users consume this volume of data? A handful of aggregations in a table with options to drill to 4 or 5 levels of detail across any class of dimension – Geography, Time, Product Levels, Customers, etc.? In that case, since the author is not adding value to the content – through formatting, calculations and variables, juxtaposition of content to tell a story, synchronization of data between different sources, etc. – this is a clear case where Explorer would bring value. If the expectation is for consumers to explore the data “as is”, then give them Explorer to let them find their factoid answers at any level of detail.

Often in these scenarios, the author does try to add value by defining a multi-tab WebI document with dozens of different detailed views of the data organized around different themes. These documents take weeks to create and validate and could take days to go through, but in theory they enable users to have any of their possible questions answered, no matter how detailed and unlikely they may be. For the vast majority of users, putting content as detailed as this into one document is like buying flood insurance when you live in the Sahara. Yes, it is possible that someone, one day, might want access to those details in one session. And over the years, a user might access quite a few details related to numerous dimensions. But is it worth the price in performance to buy such insurance with a document that size? Instead of building such a monstrosity as one document, consider alternatives for delivering the same content: a couple of linked documents, features such as query drill or prompts (optional prompts might help), queries that pull specific levels of aggregation instead of one giant chunk, or more ad-hoc query and document creation with compulsory prompts turned on.

Rest assured, however, that help is on the way for customers insisting on such humongous documents. In an upcoming release, we plan on making WebI 64-bit, effectively removing the process limit and enabling the addition of more physical memory on the server to improve the handling of larger reports and overall throughput. The Rich Client on an end user’s machine, which uses a “local server” implementation, will also become 64-bit in a future release. (Note that the Rich Client has essentially repatriated some WebI server components into the Rich Client installation. This is what enables WebI content to be taken offline from BOE.)

But in the meantime, authors should constantly weigh the trade-offs involved in building extremely large documents. And ultimately, it comes down to understanding what end users (consumers) really want to and need to do with the content. Next time you get pressure to create a 12-tab, 50-dimension, one-size-fits-all report, push back a little. The consumers are not always right. Ask how many 10,000-page Word documents they use.

      9 Comments
      Edward Pelyavskyy
      I thought the intelligence is to use 64-bit processors to work with a large number of records faster, not to create large documents faster.

      It is the job of the computer to go over millions of records and "intelligently" present me with the summary so that intelligent decisions can be made.

      Former Member
      Edward, thanks for the comment. First, let's make sure we're clear about what happens when WebI retrieves data. It's not an OLAP tool, i.e. it's not "always online" (like BEx Web Analyzer, where a user's navigation action pulls the necessary data from the source to display the details the user wants). WebI's query step, which defines the data that authors or ad-hoc business users want in order to answer their questions, pulls the data from the source when the "run query" button is clicked. As a result, the data, plus the metadata that ties it together, populates WebI's microcube. From there, users don't need to be connected to the source data and can analyze, report, and otherwise interact with the content even while on a train. They can also combine the data from one source with other sources - a capability that the vast majority of BusinessObjects customers rely on (i.e. all the data for answering my questions is not always in BW).

      64-bit will bring numerous benefits, but the pressure on authors to create large documents is always there. The source data -- whether quickly retrieved and/or retrieved in volume -- is only part of what users need to answer their questions. Authors add value to that data by adding calculations, laying out and juxtaposing analytics, chunking the data in different ways (e.g. with groups/sections, breaks, embedding groups within groups), synchronizing data from one source with data from other sources, adding conditional formatting, etc. Authors also know that a summary/detail view for one user is not applicable to another, and that summaries and details vary by the users' contexts (e.g. end of quarter versus mid-quarter). It is the recognition of all the potential answers hidden in a big data dump that pushes authors to create bloated reports.

      The challenge of "intelligently" presenting relevant, automatically summarized data and its detail based on a query is significant. Use the "I'm Feeling Lucky" button in Google and see how its assumptions about the context of your keywords are wrong more often than they're right.

      Former Member
      Excellent blog Michael. This is a topic that, for some reason, is rarely talked about in a sensible manner. It seems so many companies (and BI organizations) get into the mentality of "the customer is always right" when it comes to BI. This tends to lead to the absurd report requirements you noted, and the BI team just does what is asked rather than questioning the validity. I think this sentence pretty much sums it up: "And ultimately, it comes down to understanding what end users (consumers) really want to and need to do with the content." I would emphasize the "need to" part of that comment. I would always recommend questioning and asking WHY before simply executing or saying YES. I also think we get caught up in the idea and concept of self-service BI... I completely agree this is the direction we should go. However, I think more emphasis needs to be put on educating knowledge workers on how to intelligently build reports, data visualization best practices, and other related topics that would make their lives much easier and more effective. Just my 2 cents.

      Again, great blog on a topic not talked about enough. 

      Vamsi Krishna
      Excellent post Michael, keep posting similar ones with more insight into the product.
      Former Member
      Amazing article on why to avoid really large documents.
      We look forward to the 64-bit Rich Client (we are not able to avoid very large documents ;-) ).

      Thanks

      Former Member
      We, too, are unable to avoid large volumes and thus large documents. The business we run generates a mind-boggling amount of data and transactions.
      Former Member
      Thanks Michael, this was an excellent article!

      I truly appreciate the fact that Michael has explained the real issue with creating large documents. While technically it may be possible to have really large result sets and provide "ALL" the possible answers in one document, it makes for a very poor user experience.

      The approach of retrieving large result sets, with numerous dimensional levels, to cover the off-chance of someone requiring that detail level is flat out poor design and causes the larger usage and analysis pattern of the report to suffer.

      It certainly is not Business Intelligence to have a person wade through millions of records to discern what they should focus on. A report should be constructed that provides enough business context to determine what should be investigated further.

      This typically means bringing back a smaller, aggregated set of data, with perhaps a few levels of scope (this is dependent on how quickly the number of rows expands per level of dimensionality).

      Once the user is able to discern what needs to be investigated further, the techniques that Michael mentioned can be employed (dynamically drilling down the hierarchy with a new query, passing context to another report to get the detailed data, etc.). Note: this approach effectively narrows down the result set dramatically, as the detailed-level data should have enough context to restrict the rowset accordingly.

      Dorien Gardner
      Principal Deployment Specialist, OEM
      SAP Business Objects

      Former Member
      Thanks Michael! We try to set the right expectations with users on the usage of Web Intelligence.

      In one of our projects, we have even pushed data-feed-like reports with millions of records to be delivered using ETL, as there is no further interaction on those data feeds.

      If WebI is used in the right context and for the right purpose, it can really lead to intelligent decisions, but as consultants I guess we have an important task to emphasize the proper usage of WebI.

      Former Member
      I'm a BO XI administrator, and we recommend that our designers limit the number of rows in the universes to 50,000. But there are always exceptions. In order to convince the end users to live with this limitation, I often ask them: "What are you going to do with your large reports? Do you want to print these reports?"
      For me, if you can't print the report, it's not a good report.