BI and Search
Internet search engines changed people’s expectations concerning information access, which in turn impacted people’s expectations for BI access. I see three business drivers for the convergence of BI and search.
The first is making previously run reports in BI repositories easier to find. The ability of search engines to deliver useful results in a few seconds based only on a keyword has led business users to question why BI tools require them to know in advance which report holds the information they want. Keyword searches against titles, row and column headers and other report text can lower training costs and increase user adoption as organizations broaden BI access to more people throughout the enterprise. The list of returned results can be ranked and sorted by relevance to help the user quickly find the right information. Searching BI repositories can also help IT organizations in their data governance initiatives by providing an inventory of reports that contain sensitive information that should be available only to authorized personnel.
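To make the idea of keyword search with relevance ranking concrete, here is a minimal sketch in Python. The `Report` class, the sample reports and the weighting (a title match counts twice as much as a header match) are all assumptions made up for illustration; a real search engine would use a proper index and a more careful scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical stand-in for a report stored in a BI repository."""
    title: str
    headers: list[str]  # row and column headers


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def search_reports(query: str, reports: list[Report]) -> list[tuple[int, Report]]:
    """Return (score, report) pairs for matching reports, best match first."""
    terms = tokenize(query)
    hits = []
    for report in reports:
        title_terms = tokenize(report.title)
        header_terms = set()
        for header in report.headers:
            header_terms |= tokenize(header)
        # Assumed weighting: a title match is worth 2, a header match 1.
        score = 2 * len(terms & title_terms) + len(terms & header_terms)
        if score > 0:
            hits.append((score, report))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


# Illustrative repository contents, not real data.
REPORTS = [
    Report("Q1 Sales Forecast", ["Region", "Product", "Forecast", "Variance %"]),
    Report("Headcount by Department", ["Department", "Headcount"]),
    Report("Regional Sales Summary", ["Region", "Sales", "Quarter"]),
]

for score, report in search_reports("sales forecast", REPORTS):
    print(score, report.title)
```

The user types only the keywords and never needs to know which report holds the answer; the ranking does that work.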
A second, related business driver is to be able to quickly get the answer to a specific question. For example, a search string of sales, forecast, variance%, colas, east, Q1 would return a value such as “-10%,” instead of the link to the sales report for the first quarter. Some basic capabilities in this area exist, but the user must know how to structure the search phrase properly. While this type of natural-language query technology is still maturing, vendors are continuing to conduct research into it. Making it easier for end users to create their own ad-hoc queries can increase business productivity and reduce IT report development workloads.
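The search string in the example can be read as a set of filters plus one requested measure. The following Python sketch shows that reading under heavy assumptions: the `VOCAB` mapping, the `FACTS` rows and their values are invented for illustration (only the "-10%" result comes from the example above), and real products would resolve terms against the BI semantic layer rather than a hard-coded dictionary.

```python
# Hypothetical vocabulary: search term -> (kind, value in the data).
VOCAB = {
    "sales": ("subject", "sales"),
    "forecast": ("scenario", "forecast"),
    "variance%": ("measure", "variance_pct"),
    "colas": ("product", "colas"),
    "east": ("region", "east"),
    "west": ("region", "west"),
    "q1": ("period", "Q1"),
}

# Illustrative fact rows, not real data.
FACTS = [
    {"subject": "sales", "scenario": "forecast", "product": "colas",
     "region": "east", "period": "Q1", "variance_pct": -10.0},
    {"subject": "sales", "scenario": "forecast", "product": "colas",
     "region": "west", "period": "Q1", "variance_pct": 4.0},
]


def answer(search_string: str) -> str:
    """Turn a comma-separated search string into a single value."""
    filters = {}
    measure = None
    for raw in search_string.split(","):
        term = raw.strip().lower()
        if term not in VOCAB:
            raise ValueError(f"unrecognized search term: {raw.strip()!r}")
        kind, value = VOCAB[term]
        if kind == "measure":
            measure = value
        else:
            filters[kind] = value
    if measure is None:
        raise ValueError("search string does not name a measure")
    matches = [row for row in FACTS
               if all(row.get(k) == v for k, v in filters.items())]
    # An under-specified search matches several rows and cannot be answered.
    if len(matches) != 1:
        raise LookupError(f"expected exactly one row, found {len(matches)}")
    return f"{matches[0][measure]:g}%"


print(answer("sales, forecast, variance%, colas, east, Q1"))
```

Leaving out "east" makes the lookup ambiguous and the sketch raises an error, which is one small illustration of why the user currently has to know how to phrase the search.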
The third driver for the convergence of BI and search is to provide business contexts for decision-making. Users need to understand the larger context that is causing trends and variances in their BI numbers. For example, responses to declining sales of a product likely will be different if, on one hand, a competitor has introduced a new product with enhanced functionality or, on the other, the competitor has started heavily discounting the price of a current product. In order to determine the proper context, organizations need to link competitors’ news releases and internal competitive information from sales debriefing interviews – that is, unstructured content – with the structured BI data. This functionality can help companies respond faster to changing market conditions and create action plans that more effectively reduce threats and help capitalize on opportunities.
But this also needs a serious effort from the BI side. Currently, the metadata quality in a lot of BI systems is so poor that you cannot write a decent query in a programming language, let alone have a natural-language query answered.
There is also the question of whether searches should go against existing reports and match titles and column headers, or go to the underlying data model, interpret an ad-hoc query and fetch a result. The first option already exists in some fashion but has very limited utility. The second option is more sophisticated and not easy at all, and inefficient metadata design makes it harder for this to materialize.
I agree that the bigger business benefit is to search against the underlying data model, not report titles and headers. And, as you indicate, the semantic layer is a critical enabler for this.
From my perspective, most BI metadata is limited in how it specifies the relationships between data elements, including abstractions and containers. More work is needed on building taxonomies with an inheritance hierarchy and ontologies with formal business rules that relate data elements to each other. With those in place, the search engine can understand the relationships between syntactic form and semantic class information and dynamically map search phrases across data sources with different schemas.
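As a rough illustration of a taxonomy with an inheritance hierarchy, the Python sketch below walks a search term up to a class that each data source knows how to store. The taxonomy entries, the source names ("erp", "crm") and the column names are all hypothetical; a real semantic layer would also carry the business rules of an ontology, which this sketch leaves out.

```python
# Hypothetical taxonomy: each term points to its parent class.
TAXONOMY = {
    "colas": "soft drinks",
    "soft drinks": "beverages",
    "beverages": "product",
}

# Hypothetical mapping from a semantic class to the column that holds it
# in each data source; the two sources use different schemas.
COLUMN_MAP = {
    "product": {"erp": "ITEM_CATEGORY", "crm": "prod_line"},
    "region": {"erp": "SALES_REGION", "crm": "territory"},
}


def ancestors(term: str) -> list[str]:
    """Return the chain of parent classes for a term, nearest first."""
    chain = []
    seen = {term}
    while term in TAXONOMY:
        term = TAXONOMY[term]
        if term in seen:
            raise ValueError("taxonomy contains a cycle")
        seen.add(term)
        chain.append(term)
    return chain


def resolve(term: str, source: str):
    """Map a search term to (column, value) for one source, or None."""
    term = term.lower()
    for candidate in [term, *ancestors(term)]:
        columns = COLUMN_MAP.get(candidate)
        if columns and source in columns:
            return columns[source], term
    return None


print(resolve("colas", "erp"))
print(resolve("colas", "crm"))
```

The same search word ends up as a filter on a different column in each source, which is the dynamic mapping across schemas that the metadata has to support.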
Of course, the increased specificity and precision required to enable ad-hoc natural-language query also add significantly to the complexity of the system and the cost to build and maintain the semantic layer.
There are other issues that also have to be dealt with, such as how the search engine handles words that can have multiple meanings (polysemy). For example, ‘State’ could be a state in a country, a customer’s last name, or the state of a current business process. The search engine has to be able to infer the correct meaning based on the context of other words in the search string if it is going to return the correct answer.
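One very simple way to infer meaning from context is to count how many cue words for each sense appear elsewhere in the search string. The Python sketch below does that for ‘State’; the sense names and cue-word lists are assumptions invented for illustration, and a production engine would rely on far richer linguistic and statistical models.

```python
# Hypothetical cue words for each sense of an ambiguous term.
SENSE_CUES = {
    "state": {
        "geographic state": {"country", "region", "city", "tax"},
        "customer last name": {"customer", "mr", "ms", "account", "contact"},
        "process status": {"order", "workflow", "process", "approval", "pending"},
    },
}


def disambiguate(word: str, query: str):
    """Pick the sense of `word` whose cue words best match the query.

    Returns None when the word is not known to be ambiguous or when the
    query gives no context. Ties go to the sense listed first.
    """
    senses = SENSE_CUES.get(word.lower())
    if senses is None:
        return None
    context = {t.lower() for t in query.split()} - {word.lower()}
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("State", "sales by state and region"))
print(disambiguate("State", "current state of order approval"))
```

With no other words in the search string, the sketch returns `None` rather than guessing, which matches the point that the engine needs surrounding context to choose correctly.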
It will be interesting to see how things develop.
Regards,
Dan