Last week, the Obama administration launched the much-anticipated data.gov website. The site allows citizens to download dozens of raw data files from various federal agencies. Over the coming weeks and months, tens of thousands of new data sets will be added to the site to create a comprehensive platform for data discovery, participation and engagement. The idea is to encourage programmers and others to make new applications, mashups and visualizations based on this data. However, in order to interrogate the data and uncover meaningful trends, business intelligence and analytical tools are necessary.
Data analysis: The public knows best
The obvious question is why these datasets are being made available without context or visualisation. Making millions of lines of raw data available on a web site is not necessarily transparency. Effective transparency requires data to be provided in ways that can be understood and acted upon. The government, however, is relying on the public to contextualise the data and narrate their own stories based on this.
Clay Johnson from the Sunlight Foundation makes some good points about why visualisations of the data should be left to the public.
- He believes that providing a centralized repository of government data in machine readable formats is a hard enough problem for government to solve. Therefore, government should concentrate its efforts on making the raw data available in open and accessible formats (e.g. XML, KML and CSV), rather than trying to incorporate charts and graphical illustrations into the site. People and other third parties (e.g. newspapers or advocacy groups) will probably do a better and more interesting job of data analysis. He believes “external entities will always give the data more exposure and treatment than government can”.
- Adding visualizations to the data may actually reduce transparency. Government should prioritize data completeness and accuracy over data representations and user experiences.
The provision of raw open data is one of the top social media memes changing government. The old tradition of government churning data into tables and reports is being overtaken by a more dynamic process. This involves data being released openly to developers who can then unleash it to tell all kinds of stories. The growth of free online analytical tools and cloud computing services now enables the public to easily produce in-depth analysis of large datasets. It empowers them to tell their own narratives – whether biased or otherwise – based on raw government data.
Using BusinessObjects Explorer onDemand for data analysis
With the development of data.gov I decided to try out some of these tools to see how BusinessObjects Explorer onDemand compared. Unfortunately, many of the CSV files on data.gov are larger than the 5MB BusinessObjects Explorer file limit, so I used a dataset from the District of Columbia’s data catalog instead. This data catalog was the inspiration for data.gov, as the US Federal CIO, Vivek Kundra, was previously the CTO for the District of Columbia.
The dataset I used related to Purchase card transactions for calendar year 2009. This is a relatively common set of data that many organisations analyse on a frequent basis. I uploaded the data file (.csv) and used Explorer to create the analysis shown below.
(Analysis of Purchase Card data from District of Columbia in BusinessObjects Explorer)
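For readers who want to check this kind of breakdown outside a BI tool, the grouping step can be sketched in a few lines of Python. This is only a minimal illustration, not how Explorer works internally: the column names (`AGENCY`, `AMOUNT`) and the sample rows are made up, and the real DC file will use different headers.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def total_by_column(csv_file, group_col, amount_col):
    """Sum amount_col for each distinct value of group_col."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for row in csv.DictReader(csv_file):
        # Strip currency symbols and thousands separators before parsing.
        raw = row[amount_col].replace("$", "").replace(",", "").strip()
        totals[row[group_col]] += Decimal(raw)
    return dict(totals)


# Made-up sample rows; the column names are hypothetical.
SAMPLE = """AGENCY,VENDOR,AMOUNT
Agency A,Vendor 1,"$1,200.50"
Agency B,Vendor 2,$300.00
Agency A,Vendor 3,$99.50
"""

totals = total_by_column(io.StringIO(SAMPLE), "AGENCY", "AMOUNT")

# List the biggest spenders first.
for agency, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agency}: {total}")
```

To run it against a downloaded file, pass `open("transactions.csv", newline="")` (a placeholder filename) in place of the `io.StringIO` sample.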
Extra functionality for BusinessObjects Explorer
The onDemand application of BusinessObjects Explorer gives a good indication of the product functionality. While I appreciate BusinessObjects Explorer is not touted as a consumer-based Web 2.0 application, it could be improved to include extra functionality exemplified in other applications such as Swivel and Google Spreadsheet. Given that more and more raw data is going to be released by governments around the world, it would be useful for SAP to develop a more consumer-centric onDemand BI application. This could compete with services such as Swivel and Google.
I uploaded the same data file to Google Spreadsheet and Swivel in order to compare some of their features. One of the most useful applications offered by Google Spreadsheets for data analysis is the Motion Chart gadget. It allows you to see trends over time, and is a powerful means of quickly analysing large volumes of data (see the motion chart for this data at Google).
From a comparison of Swivel and Google features, I have listed some improvements to BusinessObjects Explorer below, which would make it a more useful consumer BI tool:
- Allow datasets greater than 5MB to be analysed. Many of the data.gov files contain 10-20MB of data and consequently cannot be analysed using BusinessObjects Explorer.
- Allow for Explorer analysis of data to be easily shared. Swivel and Google allow links of data analysis to be emailed or shared on blogs. Allowing Explorer dashboards to be embedded, or shared would greatly increase the visibility of the tool.
- Enable mapping functionality in Explorer. Tools such as Xcelsius allow for mapping webservices to be integrated. Also, Google and Swivel enable maps to be easily created from datasets. This would be a useful feature in Explorer, as many government datasets include geographical information.
- Allow third party add-ons to be created and included within Explorer, e.g. similar to Google gadgets. This could usher in new features such as motion charts that create powerful visualizations of large data volumes.
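Until the 5MB limit in the first point is lifted, one possible workaround is to split a large CSV file into smaller pieces that each repeat the header row. The sketch below is my own illustration under that assumption; `split_csv` is a hypothetical helper, not part of any SAP or data.gov tooling, and each piece would still have to be uploaded and analysed separately.

```python
import csv
import io


def split_csv(source, max_bytes):
    """Split CSV text from a file-like object into chunks of at most
    max_bytes (UTF-8 encoded), repeating the header row in every chunk.
    A single row larger than the limit still gets a chunk of its own."""

    def encode(row):
        # Re-serialise the row so quoting and embedded newlines stay valid.
        buf = io.StringIO()
        csv.writer(buf).writerow(row)
        return buf.getvalue()

    rows = csv.reader(source)
    header_text = encode(next(rows))
    header_size = len(header_text.encode("utf-8"))

    chunks = []
    current, size = [header_text], header_size
    for row in rows:
        row_text = encode(row)
        row_size = len(row_text.encode("utf-8"))
        # Start a new chunk when this row would push us past the limit.
        if size + row_size > max_bytes and len(current) > 1:
            chunks.append("".join(current))
            current, size = [header_text], header_size
        current.append(row_text)
        size += row_size
    if len(current) > 1:  # flush the last chunk if it holds any data rows
        chunks.append("".join(current))
    return chunks


# Tiny demonstration with a 20-byte limit instead of 5MB.
demo = split_csv(io.StringIO("id,name\n1,a\n2,b\n3,c\n"), max_bytes=20)
for number, chunk in enumerate(demo, start=1):
    print(f"part {number}: {len(chunk.encode('utf-8'))} bytes")
```

For a real file, something like `split_csv(open("big.csv", newline=""), 5 * 1000 * 1000)` would be the starting point; whether Explorer counts a megabyte as 1,000,000 or 1,048,576 bytes is something I have not verified, so the smaller figure is the safer choice.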
BI OnDemand strategy
One of the questions is whether Explorer onDemand is seeking to be a consumer Web 2.0 analysis tool, or whether it is seen merely as a demo product for the corporate version. If the objective is the latter, then the recommendations above are probably not realistic. However, a more interesting business model might be to develop these features, and offer Explorer to the public as a feature-rich Web 2.0 Business Intelligence application. This would increase its profile, and could entice businesses to investigate the corporate version if they are satisfied with the consumer product. The freemium business model works in this way, and perhaps SAP should utilise this as part of their overall BI onDemand strategy.
The demand for extensive cloud-based analytical solutions is increasing, as more and more governments and organisations crowdsource data analytics. If SAP sees itself as a major player in this market, it needs to create a strong consumer product that demonstrates the BusinessObjects capabilities. Explorer onDemand already does this, but its feature set and usability need to improve to compete with other comparable Web 2.0 products.