Design Studio Innovation Series – Topic 1: Making a Case for a Larger SDK Result Set
UPDATE 12 May 2015: As of Design Studio 1.5, the SDK result set cell limit will be configurable beyond 10K.
Introduction
Welcome to the inaugural edition of the Design Studio Innovation Blog Series.
There have recently been a number of discussions regarding result set size limits with Design Studio, as related to how much data can be supplied to a single application for a particular data source. I thought it would be worthwhile to investigate the result set limit for SDK components in this context, discuss the implications of this limit and cases where it would be beneficial to have a larger result set than currently supported.
Summary of Findings
Before going into the details I’ll get straight to the point of what I discovered. Although the result set limit for an SDK component is 10,000 data cells , I was able to demonstrate (through a not directly documented feature which I’ll describe later), that in my particular test, over 2.5 million cells could be handled by an SDK component as long as it is designed to do so. For testing purposes the UI5 Table control was used as the basis for an SDK component.
To put this subject into context, lets start with the data source limitations. There is currently an overall restriction on data returned to a Design Studio application based on the data source itself. So for BusinessObjects Universes, this is 50,000 data cells and for BW BEx Queries there’s a safety net of 500,000 data cells, as noted in the following discussions (I wasn’t able to confirm if there is a limit for HANA):
Result set is too large; data retrieval restricted by configuration Universe Connection
Components not showing large data
In comparison, as mentioned previously, the Design Studio SDK framework only allows a result set of 10,000 data cells to be passed to a custom component, even if the connected data source itself has a higher limit.
An Example Use Case for a Large Result Set
When the topic of large result sets comes up, the usual question is “Why do you need to display such a large amount of data on a dashboard?”. It’s not necessarily that there’s a desire to display all of the data at once in a single crosstab for instance but if you’re following the approach of a single data source for an application (a recommended practice) to feed multiple components in multiple tabs, as well as providing the flexibility to filter and swap dimensions and measures, then you can easily need a large result set. A large result set would also allow for local filtering, resulting in more interactive dashboards like we are used to from Xcelsius. Further examples are given in the above discussions.
Now, in the case of SDK components, I’d like to work through one use case example where the 10,000 data cell limit is simply too low. A common requirement is to display a scorecard table with microcharts. So I’ve mocked one up in Excel with actual airline on-time performance data. Here, we simply want to represent on-time performance KPI trends over a single month by day, for each origin state. This represents 53 rows by 12 columns, resulting in 636 “cells” as shown below. Perfectly reasonable for a dashboard I think.
However, the actual number of data cells required to generate this scorecard is 17,808 as shown in the Initial View below, calculated as 53 Origin State rows x 12 Measure (KPI) columns x 28 days per KPI:
So here, I think we have a legitimate example in support of why we would need more than 10,000 data cells for an SDK component. The number of “cells” displayed in the scorecard is certainly not very high, at 636. However, the underlying data required to generate it exceeds the 10,000 cell limit.
Putting It To The Test
So now, just to make a point, let’s see how much data an SDK component can handle if it could be fed a large result set. The limits on the data source as well as the SDK component are based on the number of data cells. If however, for test purposes we define an Initial View on a BW data source without specifying any measures (technically resulting in zero data cells), it turns out that not only is the SDK 10,000 cell limit ignored but so is the 500,000 cell limit on a BW data source. As a result I was able to load an SDK table component with over 2.5 million cells of data as shown below (again using real airline on-time performance data).
Here’s the Initial View of the data source:
Here’s the first page of the loaded SDK component:
And here’s the last page:
With my test data that’s 445,822 rows x 6 columns, representing 2,674,932 cells. Admittedly, it took a while to load from the back-end because I’m running my BW dev system on a medium spec VM and there’s also the client-side component load time but I think on a production system or with HANA in particular the performance would have been significantly better.
Once the data was loaded however, the table component responded very well. Navigating back and forth through the pages was instantaneous while sorting and filtering columns was quite fast too, especially considering the data volume.
A Comment About The No Data Cells Scenario
Conclusion
Design Studio Idea Place: Remove 10,000 cell result set limit for SDK Components
Blog Series Index: Design Studio Innovation Series – Welcome
Hi Mustafa,
just 2 quick comments:
- the fact that you were able to pull in more data than 10.000 cells without a measure is not a surprise because the measures are what is being counted as data cells. so no surprise there.
- the 500.000 data cells for BW that you mentioned is actually just a "default" value for the safety belt and you can change it to anything your would like it to be. its a simple setting - not a hard coded limit that can not be changed.
regards
Ingo Hilgefort, Visual BI
Hi Ingo,
Thanks for the feedback. My comments as follows:
1) Yes, this is what I was trying to point out, that measures count as data cells and excluding them was just an approach for testing beyond the 10,000 cell limit. In fact, even dimensions without measures also count as data cells if they are placed on both axes;
2) Based on Karol Kalisz my understanding is that Design Studio ignores the BW custom setting so the 500,000 data cell limit is always applied, even if you change the default in BW.
byRegards,
Mustafa.
Hey Mustafa,
Great discussion post!
Off-topic from the point of your post, but the 500k BW/BICS safety belt parameter is a maintainable parameter in RSADMIN yet the UNX 'safety belt' is unmaintainable. My frustration with the UNX limit is the fact that I can load an Excel Pivot Table with 10 times the volume and it makes short work of it confuses me why we cannot increase UNX limits.
But I don't think data source throttling is the point of your discussion post.
To I think think the actual point of your post, yes I 100% agree that it should be increased because:
1) Your data-dense microchart example.
2) Reduce server round trips by pulling back a reasonable amount of data for faster interactions.
3) The obvious fact that this is not a strain on the Design Studio runtime that you've proven, regardless of the fact that measures were not pulled in for sake of circumventing the cells check. JSON is JSON, and the JSON for 2.5 million tuples proves (to me) that SDK can handle the JavaScript load of more than 10k.
Now, 2.5 million tuples of JSON is probably too much to ask for and would probably cripple or kill weak workstation browser's JavaScript engine, but I'm sure there's a happy medium in the 100k range (give or take).
Hi Mike,
I think you've summarized my intent very well. As for the 2.5 million tuples, while the UI5 Table based SDK component handled this volume very well, it did indeed cripple the browser when run against the JSON Grabber component, reiterating the point that the component must be designed to handle the volume.
I agree that the 100k range (preferably more give than take 😉 ) would be a happy medium. Hopefully the suggestion is considered in due course.
Mustafa.
Hi Mustafa,
the microchart example is perhaps not a good example because you are only showing as you mentioned 53 x 12 "cells" but you are showing daily data for each cell.
Do I agree that 10.000 cells will not always be enough - yes.
Do I agree with something like 2 million cells for a dashboard - not really.
because then we are going to end up in the situation like Web Intelligence and clearly that does not work for BW and HANA for several reasons and the "data layer" in between being one of the major reasons for all the issues.
regards
Ingo
Hi, nice blog. The "workaround" for data cells is good, I've already try it. Problem is that, if I remember correctly, when building a custom component with databound set to true, is there a way (I guess yes but I never dug up so far) to recreated "lines" (there is none in this case) without tuples, based on other properties ?
Thanks Vincent. To clarify your question, can you provide an example of what you mean by "recreating lines without tuples based on other properties" ?
I was talking about creating plain JavaScript object without any hierarchy information from Data Source (it was not clear I agree !). At the moment for example, I've piece of code I've make based on tuples and number of members inside the Column Dimension that contains Measures, in order to create such objects. Without cells, tuples property of Data Source Metadata (I guess this is how it is supposed to be called) would be empty. This is why I was "asking" for other MetaData properties to achieve this.
Hi Vincent,
The Meta-Data Run-Time JSON never contains tuples information but yes, the Data Run-Time JSON in this case where there are no columns would display tuples as "tuples": [], so your data transformation logic would need to cater for this scenario. I don't think there are other Meta-Data properties to achieve this. If your code is based on looping through tuples and number of members inside the Column Dimension then I would expect that it should simply return no data for the columns when the tuples attribute is empty.
Regards,
Mustafa.
Yes, sorry about my confusion about MetaData and Data Runtime, whatever 🙂 .
Hi!
could you share the code of your ui5-Table test case?
Thank you sincerely!
Daniel
Hi Daniel,
The UI5 Table test is related to a commercial project so unfortunately I am not at liberty to share the code. However, if you have any specific questions I can try to answer them.
Regards,
Mustafa.
Hi Mustafa,
thank you for lobbying for larger result sets, I couldn't agree more. Considering that
are two common scenarios, 10000 cells really is a very low limit.
I vote yes 😉
Cheers,
Christoph
Hi Christoph,
Thanks for your support 🙂 . Hopefully this issue will get the needed visibility and attention to be considered for resolution in a future patch or SP of Design Studio.
As I've indicated, data sources already have a result set limit based on the platform, so it doesn't make any sense to me to reduce the SDK result set even further to a subset of what is returned already by the data source. I'd much prefer to see the SDK result set limit of 10,000 cells removed altogether such that whatever result set is supplied by the data source is passed on directly to the assigned SDK component. The SDK component can then decide how to handle the result set size accordingly. This approach would provide much more flexibility in SDK component use cases.
I imagine the 10,000 cell SDK result set limit is currently hard-coded in Design Studio, so would hope that in theory at least, technically it should be straight forward enough to remove the limit.
Regards,
Mustafa.
Hi Mustafa,
great example. I ran too in some instances where an SDK had to show a lot of line graphs and was hitting the max. I agree with your lobby for larger set.
Must say found your workaround rather funny and clever though 🙂
Jeroen
Hi Jeroen,
Long time no hear. Thanks for the feedback. My workaround was a coincidental find through trial and error which proved useful in making the point. If not confidential, it would be helpful if you could describe a little more the use cases you mention where you ran into this issue, such as the line graphs. This would provide more examples of real-world scenarios in support of the need for a larger result set.
I have raised this idea in the Design Studio Idea Place. It is currently in "Under Review" status, which is hopefully a good sign. Feel free to vote on it 😉 .
Regards,
Mustafa.
Hi Mustafa,
indeed, had quite a history with illness in the family. Luckily those episodes are over now.
Here is a screenshot of a component we have at a customer. It needs to show many KPI's in groups and a line graph with 2 lines per KPI. If you do this per day you can get to up to 80 * 365 * 2 cells that you need.
Best regards,
Jeroen
Hi Jeroen,
Glad to hear all is well now. Thanks for sharing the screenshot. It's certainly a very good example of why we need a larger result set size for SDK components.
Your example also raises another good point about optimisation when it comes to the number of data sources in a Design Studio application, where a single data source includes a large number of KPIs, instead of multiple data sources.
I have seen SCN posts where many data sources are being used ranging from around 10 all the way to even 50 plus in a single application, which does not result in the best performance. I do think that in many cases, especially when related to KPIs, it should be possible to model the back-end such that all the information can be provided in a single data source. In your example, the KPI Number goes as high as 99. It seems like a standardised KPI model has been designed whereby there is either only one cube with all the KPIs, or several cubes each with related KPIs but common dimensions that can be joined into a single view and hence single data source which would work very well from a performance perspective.
Can you elaborate a little about the data modelling approach that was used in your example to allow so many KPIs to be incorporated into a single data source?
Regards,
Mustafa.
off course 🙂
In our case we have a single cube that holds all the KPI's , the KPI's have a navigational item Group. Across the x axis we use calculated keyfigures for current mont, plan, and the dates for the graphs.
Actually with my most recent developments I start to create the graphs in such a way that you can model them independent of the datasource. I just create a 2d table and use d3.nest to group it based on the properties.
We have a lot of downdrill and if you change the datasource you will also influence the high level graphs. using this you can limit to one detailed datasource but group in the graph.
Hi Jeroen,
Thanks for the follow-up. This is making for some very interesting discussion 🙂 . So here are some more questions and comments if you don't mind:
1) I like the approach of having a single standardised cube for holding all the KPIs, which certainly helps with restricting the data source to just one while also allowing a meaningful and performant dashboard to be created:
(a) I am assuming that this is a summary cube that is fed from other cubes which hold the data for different functional areas, such as HR or Finance etc?
(b) Does the cube also include some common dimensions/attributes that can be used for slicing and dicing across all KPIs, such as cost centre, profit centre, account etc?
(c) Do you find that your clients already have such a standardised KPI model in place with one cube, or is this a best practice approach that you implement as part of your client engagements?
2) With the drilldown and grouping, I assume you're using d3.nest to performing client-side aggregations for the groupings. Is that right? Are you limiting your aggregations to summation only or do you have special handling for aggregating averages and percentages as well?
Regards,
Mustafa.
Hi Mustafa,
the datamodel was already in place when I joined so can't take credit for that.
It is indeed a summary cube fed by other cubes, all kinds of values are in there percentages, currency amounts, quantities from HR, Finance, but even from planning applications where users can add their own KPI's.
It does have common attributes for KPI, KPI group and the department (including hierarchy), additionally off course the data dimension. As a user you have a role and a department. Based on that you see a number of KPI's grouped by theme.
It is the clients approach. They developed it over time as they have hundreds of KPI's and each organizational unit has unique ones for their own purposes. It is masterdata driven meaning that scale, unit of measure and evaluation (higher is better) is stored in masterdata.
As the numbers are so diverse i'm not doing any aggregation. I group the values, but I don't add them up.
@ingo
We have some margin when we use 3 components. Additionally we have a max number of kpi's a user can choose to see. That keeps us safe at the moment.
best regards,
Jeroen
Hi Jeroen,
Thanks for sharing a detailed explanation of the KPI framework. This is a very good reference and an example of how Design Studio applications can be optimised by effective modelling of the data on the back-end. This could even make for a very interesting blog topic of it's own 😉 .
Regards,
Mustafa.
so are you - for now - just breaking it into several data sources to get around the 10K cell limit ?
Ingo
didn't go that far. The component has a property where you can set the number of columns and we just used multiple components with their own datasource.
ok - but when you select the number of columns you still could run into the problem relatively easy - or not ?
lets use the screenshot you shared as an example.
The "Title0000" has 4 columns plus the sparkline part. Lets say the sparkline part is 12 months values, so we looking at a total of 16 columns.
But you now would have the problem that the number of rows can grow at any time....
regards
Ingo
Ingo,
In Jeroen's example, the rows represent logical groupings of KPIs. I would not expect an organisation to be constantly changing these groupings so the number of rows should be reasonably stable or at least remain within a manageable range, allowing multiple components to be used for relatively fixed groupings without risk of dynamically exceeding the 10k limit.
Regards,
Mustafa.
that is well understood. all I am saying is that the component is still running the risk to run into the "trap" of 10.000 cells in case there are just more rows being added.
regards
Ingo
Jeroen,
Pretty interesting screenshot. I definitely think it's a legitimate way to see a lot of data at once.
This will sound weird coming from me, but I'm wondering if such a usage given the restrictions imposed on Design Studio SDK (10k cells, dealing with SDK quirks, etc) would be easier to do as a standalone application using BIWS/QaaWS and BIPRWS calls? Maybe that is equally cumbersome for different reasons I guess but it's one way to avoid resultset limits.
Don't get me wrong I think Design Studio is great, but your specific sparkline/scorecard visualization will certainly have a ravenous resultset appetite to pull off 🙂
Yeah, I can see why you would say that and I admit that it also crossed my mind, why not take the last step and make it fully javascript. If you can embed it in the function you can embed it in any html document.
But the component is a tool for a number of designers, and they use it as above, as a menu item (you can set it to text only), as a single table and so on. Also there script methods to switch data, change layout. etc.
I think the reusability, in multiple application in multiple form is the reason I stopped wondering about it. But it is a good alternative.
Jeroen
Interesting phrasing here as this developing for designers pattern I was alluding to in a separate comment thread here. Timely topic for us it would seem!
And yes that makes sense if your solution teeters between a standalone application or being parameterized, reusable component then yeah it could serve either use case (with current data volume restrictions at play, unfortunately!)
I'll have a look at that discussion too
Hi Mike,
Hopefully with discussions like this demonstrating the real world need for large result sets, such as the great example Jeroen has described, we'll see an increase in the SDK limit to allow for such scenarios in the near future 🙂 . I think that as long as SDK developers keep in mind the implications of larger result sets and explicitly design components taking these into account when relevant, a higher SDK result set limit could certainly provide a lot of flexibility for use cases.
Regards,
Mustafa.
I run into this issue from time to time. My hope is that additional data source methods will be introduced that allow for pagination of the data sets. For example, I can query 500 rows at a time and then click between sets to get to the next set of 500 rows.
Crude Examples:
DATASOURCE.getSetSize();
DATASOURCE.getSetRows(limit,offset,sortBy,isAccending);
Math.MOD(dividend,divisor) - modulo operation
This functionality was introduced into the BusinessObjects BI Web Services and I often utilized it within SAP BusinessObjects Xcelsius. I found it to be quite handy when visualizing large sets of tabular data in a Dashboard component.
Hi Jonathan,
Agreed, I too would really like to see the concept of data set pagination with the SDK.
Also, it would be very useful if we had pagination support for large member lists in the data source getMembers() method with wildcard search capability as well. An example script call might look like this:
DATASOURCE.getMembers(dimension, maxNumber, pageDirection, searchString)
where pageDirection could be "previous" or "next" and searchString a string with wildcard support to more specifically narrow down the returned list of members.
Regards,
Mustafa.
I think both these things would be a big enhancement.
In my last developments for SDK i used an function
Intuitive JavaScript array filtering function pt1
where you can add filters (even functions) to limit data.
But much nicer is to have the methods you guys described in standard script
Interesting blog article about array filtering. Thanks for sharing, Jeroen.