How to talk to your business data

Kay_Kadner · ‎11-22-2012

The Challenge

Data warehouses are an important source of information for decision making and controlling. Thanks to SAP HANA, even huge data warehouses can be analyzed fast enough to answer business questions within a fraction of a second. Combined with tools like SAP Business Objects Explorer, this lets casual users navigate complex reports or dashboards interactively, for example by filtering or drilling down.

However, despite tremendous progress in managing, analyzing and visualizing data, interactive data exploration is hampered by the fact that business analytics is often based on pre-canned queries, typically provided by a company’s IT department. This is because Business Intelligence (BI) self-service tools often require considerable technical know-how about the highly structured content, such as an understanding of the technical vocabulary or knowledge of the data warehouse schema. Others, like SAP Business Objects Web Intelligence or Explorer, require some effort to configure the actual query. In contrast, business users are familiar with unstructured query interfaces (like the Google search field) or, more recently, with question answering systems such as WolframAlpha or Apple’s voice-enabled assistant Siri. The challenge is thus to bring the paradigm of keyword search and natural language input to the data warehouse world and allow free navigation inside the data.

The Search 360 Solution

Users of Search 360 simply type or speak a query, and the system identifies the intention behind it. Sounds easy, but how can it be done? The Realtime Intelligence Economy program of SAP Global Research and Business Incubation has developed a prototype called Search 360 that does exactly that.

The Search 360 system is based on SAP NetWeaver Cloud technology. It is connected to databases via Business Objects universes, which provide a semantic view (i.e., metadata) of the underlying data. Among other information, the available analytical measures and dimensions can be extracted from them. When the user asks a question, the system identifies all measures and dimensions in the question as well as possible data instances. Instance data is used to add filters to the database queries and to contextualize the results. To match the user’s question against the database content, the system transforms the universe metadata into a graph representation, which in turn is queried to make sense of the question. Since user input might be incomplete or ambiguous, a single question often yields multiple possible results. Even so, thanks to SAP HANA, processing the actual database queries accounts for only a minor part of the overall runtime from user input to system answer. Finally, the query results are transformed into readable tables and, where applicable, tailored charts are generated and presented to the user.
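The matching step can be illustrated with a minimal sketch. It replaces the graph representation described above with flat dictionary lookups, so it is a simplification and not the prototype's actual method. The dictionaries, the `Interpretation` class and the `interpret` function are hypothetical names chosen for illustration; only the column name `ORT01` is taken from the example query in this post.

```python
from dataclasses import dataclass, field

# Hypothetical metadata standing in for what a universe would expose.
MEASURES = {"revenue": "REVENUE", "margin": "MARGIN"}
DIMENSIONS = {"year": "YEAR", "city": "ORT01", "store": "STORE"}
# Hypothetical instance values, each mapped to the dimension it belongs to.
INSTANCES = {"san francisco": "city", "new york": "city"}


@dataclass
class Interpretation:
    measures: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)  # dimension name -> value


def interpret(question: str) -> Interpretation:
    """Find known measures, dimensions and instance values in a question."""
    # Normalize case and whitespace; pad with spaces to match whole phrases only.
    text = " " + " ".join(question.lower().replace("?", " ").split()) + " "
    result = Interpretation()
    for value, dimension in INSTANCES.items():
        if f" {value} " in text:
            result.filters[dimension] = value
    for name in MEASURES:
        if f" {name} " in text:
            result.measures.append(name)
    for name in DIMENSIONS:
        if f" {name} " in text:
            result.dimensions.append(name)
    return result


parsed = interpret("Revenue per year in San Francisco")
print(parsed)
```

A real system would also need to handle synonyms, plural forms and ambiguous terms, which is where a graph over the metadata becomes useful.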

As an example, the user question “Revenue per year in San Francisco” contains the measure “revenue”, the dimension “year”, and a filter on the dimension city (“San Francisco”). Based on that, SQL queries like the following will be created:

SELECT SUM("REVENUE") AS "SALES REVENUE", "YEAR" AS "YEAR" FROM "KEY_PERFORMANCE_INDICATORS" WHERE ORT01 = 'San Francisco' OR ORT01 = 'SAN FRANCISCO' GROUP BY("YEAR") ORDER BY "YEAR" ASC LIMIT 1000
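How such a statement could be assembled from an identified measure, dimension and filter is sketched below. This is an assumption about the general approach, not the prototype's code: `build_query` is a hypothetical helper, the rows are made-up toy data, and SQLite stands in for SAP HANA so that the example runs anywhere. Unlike the statement above, the filter values are passed as bound parameters rather than inlined.

```python
import sqlite3


def build_query(measure, label, dimension, filter_column, values, limit=1000):
    """Return (sql, params) for one summed measure grouped by one dimension.

    Identifiers are assumed to come from trusted metadata, never from user
    input; only the filter values are passed as bound parameters.
    """
    conditions = " OR ".join(f'"{filter_column}" = ?' for _ in values)
    sql = (
        f'SELECT SUM("{measure}") AS "{label}", "{dimension}" AS "{dimension}" '
        f'FROM "KEY_PERFORMANCE_INDICATORS" '
        f"WHERE {conditions} "
        f'GROUP BY "{dimension}" ORDER BY "{dimension}" ASC LIMIT {int(limit)}'
    )
    return sql, list(values)


# Toy table with made-up rows, only to show that the statement executes.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "KEY_PERFORMANCE_INDICATORS" '
    '("REVENUE" REAL, "YEAR" INTEGER, "ORT01" TEXT)'
)
conn.executemany(
    'INSERT INTO "KEY_PERFORMANCE_INDICATORS" VALUES (?, ?, ?)',
    [
        (100.0, 2010, "San Francisco"),
        (50.0, 2010, "SAN FRANCISCO"),
        (70.0, 2011, "San Francisco"),
        (999.0, 2011, "Berlin"),
    ],
)

sql, params = build_query(
    "REVENUE", "SALES REVENUE", "YEAR", "ORT01",
    ["San Francisco", "SAN FRANCISCO"],
)
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Binding the filter values keeps text taken from the user's question out of the SQL string itself, which matters once the values come from free-form input.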

Figure 1 Exact (left) and alternative result (right) for the query “Revenue per year in San Francisco”

One answer to the question, based on sample data, is shown in Figure 1 (left). The system usually also returns and ranks further results that do not fit the entered query exactly but contain related information. In this example, it also returns the result shown in Figure 1 (right), which is the same query without the restriction to San Francisco, i.e., for the whole company. This lets the user quickly compare the revenue development of San Francisco with that of the company as a whole.
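One simple way to produce such alternative results is to relax the filters of the exact interpretation step by step. The sketch below assumes this strategy and ranks candidates by the number of filters they keep. The post does not describe the actual ranking, so both the heuristic and the name `candidate_queries` are illustrative assumptions.

```python
from itertools import combinations


def candidate_queries(measure, dimension, filters):
    """Return (score, query) pairs, best match first.

    `filters` maps a dimension name to a value. The score is the number of
    filters kept, which is an assumed ranking heuristic.
    """
    names = sorted(filters)
    candidates = []
    # Start with all filters (the exact result), then drop more and more.
    for keep in range(len(names), -1, -1):
        for subset in combinations(names, keep):
            kept = {name: filters[name] for name in subset}
            query = {"measure": measure, "dimension": dimension, "filters": kept}
            candidates.append((keep, query))
    return candidates


for score, query in candidate_queries("revenue", "year", {"city": "San Francisco"}):
    print(score, query)
```

For the example question this yields two candidates: the exact query with the city filter, and the company-wide query without it.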

Note that these results are not pre-computed but created on the fly as the user asks a question. This reduces the overhead of storing and maintaining aggregates and, most importantly, ensures that insights are based on real-time data.

The Search 360 system has a couple of other nice features. Since a user question might be incomplete, the system can guess missing information. For instance, the user might ask “How are the different cities doing?” The system proposes measures that make sense in the context of cities and runs the queries based on them. This leads to multiple results, because one result is created for every possible combination. In this example, a total of six answers and charts are returned, because six measures are available in the sample data, including sales revenue, margin, quantity sold, and discount. The user can then decide which answer is most helpful. Further features include natural language processing of questions such as “stores with revenue higher than 3000000” or “top 5 stores with respect to revenue”.
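Phrases like the two quoted above can be recognized with simple patterns. The following sketch uses regular expressions to extract a row limit and a threshold on a measure. It is an assumption about one possible approach and not the prototype's language processing; `parse_modifiers` is a hypothetical name, and support for “lower than” is added here only for symmetry.

```python
import re

# "top 5 ..." -> row limit
TOP_N = re.compile(r"\btop\s+(\d+)\b", re.IGNORECASE)
# "<measure> higher than <number>" -> threshold on an aggregated measure
COMPARISON = re.compile(
    r"\b(\w+)\s+(higher|lower)\s+than\s+(\d+(?:\.\d+)?)", re.IGNORECASE
)
OPERATORS = {"higher": ">", "lower": "<"}


def parse_modifiers(question):
    """Extract an optional row limit and an optional measure threshold."""
    modifiers = {}
    top = TOP_N.search(question)
    if top:
        modifiers["limit"] = int(top.group(1))
    comparison = COMPARISON.search(question)
    if comparison:
        measure, word, number = comparison.groups()
        modifiers["having"] = (measure.lower(), OPERATORS[word.lower()], float(number))
    return modifiers


print(parse_modifiers("stores with revenue higher than 3000000"))
print(parse_modifiers("top 5 stores with respect to revenue"))
```

The extracted limit would map to a `LIMIT` clause with a descending sort, and the threshold to a `HAVING` condition on the aggregated measure.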

The Outcome

Overall, two prototypical user interfaces have been developed, which are being evaluated with customers in a pre-productization phase. The web-based user interface consists of a search input field and offers a Google-like experience for searching BI data (Figure 2). It runs in a browser and is therefore also suitable for tablets such as the iPad. Since the Search 360 prototype is intended to give a 360-degree view, it does not search the corporate BI data alone. It also implements a federated search by integrating results from other sources like Google, Yahoo, Bing, Freebase, WolframAlpha, and Google Maps. Further sources, such as a corporate address book, can easily be integrated thanks to a flexible plugin architecture. Entering queries is convenient as well: the system suggests valid input and thus allows the user to autocomplete the multiple terms of a query.
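A plugin architecture for federated search can be as simple as a registry of provider functions that are all called with the same query. The sketch below shows this idea with two stub providers. All names (`register`, `federated_search`, the provider functions) are hypothetical, no real search service is contacted, and the prototype's actual plugin interface is not described in this post.

```python
from typing import Callable, Dict, List

Provider = Callable[[str], List[str]]
_PROVIDERS: Dict[str, Provider] = {}


def register(name: str):
    """Decorator that adds a search provider under the given source name."""
    def wrap(func: Provider) -> Provider:
        _PROVIDERS[name] = func
        return func
    return wrap


@register("bi")
def search_bi(query: str) -> List[str]:
    # Stub: a real provider would query the data warehouse here.
    return [f"BI result for '{query}'"]


@register("address_book")
def search_address_book(query: str) -> List[str]:
    # Stub: a real provider would look up the corporate address book here.
    return [f"Contact matching '{query}'"]


def federated_search(query: str) -> Dict[str, List[str]]:
    """Ask every registered provider and collect the results per source."""
    results = {}
    for name, provider in _PROVIDERS.items():
        try:
            results[name] = provider(query)
        except Exception:
            results[name] = []  # one failing source should not hide the others
    return results


print(federated_search("revenue"))
```

Adding a new source then only requires writing one more decorated function; the search loop itself stays unchanged.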

Figure 2 Screenshot of the desktop version

With the basic system up and running, we also thought about the mobile user, who wants to literally hear the answers to his BI questions. We therefore developed an iPhone app that can be used in a Siri-like manner. The user speaks his question, and the app reads the result aloud if it is not too complex. The screenshot in Figure 3 shows what the iPhone app looks like. The user asked his question after pressing the “Search 360” button at the bottom. The iPhone reads the answer (the text above the chart) to him and then awaits further instructions. Basic navigation between multiple results is also available in this proof of concept.

Figure 3 Screenshot of Search 360 mobile

In addition, we are currently working on an implementation directly in SAP HANA, using HANA-specific programming models and the built-in search capabilities of SAP HANA. This implementation is even faster because it runs right next to the data.

Conclusion

The prototypes show that exploratory access to databases by casual users is possible. Users don’t need detailed knowledge of the underlying structure. They can use natural language as they would with unstructured information systems, and the results are visualized based on the user’s intent. Both desktop and mobile users benefit from this technology.

If you would like more information about the prototype or would like to try it out yourself, don’t hesitate to contact Kay Kadner. Access to a demo system can be made available.