The open data movement has gained significant traction over the past few years and is fast becoming a powerful force for transparency and innovation in government.
This movement has been endorsed by Tim Berners-Lee, a pioneer of open data and transparency (see http://www.w3.org/DesignIssues/LinkedData.html). His primary focus is on making local, regional and national data (particularly publicly acquired data) available in a format that citizens can manipulate directly. A wide variety of open-source and proprietary data analysis tools are available for cross-tabulating, visualising and mapping such datasets.
The primary contention of this movement is that public (and other) data, whether collected directly (e.g. in a census) or indirectly (e.g. crime or accident statistics), should be freely available in electronic form and accessible via the web. The release of such data should conform to the principles of Open Government Data, which include:
- Data must be Complete: All public data – that is not subject to valid privacy, security or privilege limitations – is made available
- Data must be Primary: Data is collected at source, with the highest level of granularity, not in aggregate/modified form
- Data must be Timely: Data is made available as quickly as necessary to preserve its value
- Data must be Accessible: Data is available to the widest range of users for the widest range of purposes
- Data must be Machine processable: Data is structured to allow automated processing
- Data must be Non-discriminatory: Data is available to anyone, with no requirement to register for access
- Data must be Non-proprietary: Data is available in a format over which no entity has exclusive control
- Data must be License-free: Data is not subject to copyright, patent, trademark or trade secret regulation
International organisations such as the World Bank, and non-governmental organisations such as Oxfam, are also making data available in an effort to increase transparency and openness in development aid. Their aim is to use their data to ‘foster public ownership, partnership and participation in development from a wide range of stakeholders’.
For governments, releasing data to citizens provides a platform on which citizens can build new applications and so contribute to the public good (e.g. see the recent Apps for Climate Change initiative, sponsored in part by SAP). One of the most important features of Open Data is that it is machine-readable, which makes it possible to visualise data in powerful new ways. Visualisation matters because it gives a more intuitive understanding of the data: trends and exceptions become easy to spot.
In several countries, governments and local authorities are starting to release their data for citizens to reuse and remix. How easily this data can be manipulated and exploited differs significantly depending on how it is released and which open data principles it adheres to.
Different organisations are employing different usability strategies for presenting data. These range from in-house tools that visualise the data for users, to more open models where the data is simply made available for download in raw format. The different strategies can be summarised through examples such as:
“In-house” – Visualizations
Example – OECD’s eXplorer:
This application includes a sophisticated visualisation interface that supports animations and multilayer maps, including Google Maps integration. It allows users to create great-looking maps, but the data itself is locked inside, as are citizen-created visualisations (“stories” in eXplorer speak). Just about the only thing you can do is export them as XML files to share, but then others need to go to the eXplorer website to see them.
In general, the system is complicated, inflexible and unfriendly to beginners. Making the data available only within this viewer makes it easy to compare data and view trends, but at the expense of the non-proprietary and machine-processable principles.
Example – Italy’s State Accounting Service:
This site from the Italian finance ministry provides downloadable databases, with specific instructions on how citizens can create tables summarising the information they contain. Unfortunately, the instructions are cumbersome and assume that all citizens use a specific function (pivot tables) of a specific software product, Microsoft Excel. Except for expert Excel users, this system is “all or nothing”: either you are looking at enormous, unmanageable disaggregated tables, or you invest several hours following the tutorial and trying a few ways to crunch the numbers. It is fine for researchers, but it does not give citizens an easy way to download and play with the data.
This approach, while adhering to the principle of machine readability, does not allow for easy manipulation and requires a proprietary application to make sense of the data.
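A pivot table is, at heart, a cross-tabulation: group rows by two columns and sum a third. The minimal sketch below does that with Python's standard library alone, to show that summarising a downloaded CSV does not strictly require Excel. The column names (region, year, amount) and the sample rows are hypothetical and made up for illustration; they do not reflect the ministry's actual schema or figures.

```python
import csv
import io
from collections import defaultdict

def pivot_sum(rows, row_key, col_key, value_key):
    """Cross-tabulate rows: sum value_key for every (row_key, col_key) pair."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert the nested defaultdicts to plain dicts for readable output
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical extract: column names and values are assumptions, not real data
SAMPLE_CSV = """region,year,amount
North,2009,100.0
North,2010,150.0
South,2009,200.0
North,2009,50.0
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
summary = pivot_sum(rows, "region", "year", "amount")
print(summary)
```

Swapping the `row_key` and `col_key` arguments transposes the table, which is the same choice a pivot-table user makes when dragging fields between rows and columns.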
Example – World Bank Data Catalog
This site offers the best of both worlds: it provides ready-made data visualisation tools (including a DataVisualizer), and it also allows datasets to be downloaded easily in many formats.
The site's datasets conform to all the open data principles, and the site provides extra tools (e.g. an API) for developers who want to use its data. The rationale, as explained on the site, is that: “Broader access to these data allow policymakers and advocacy groups to make better-informed decisions and measure improvements more accurately. They are also valuable tools to support research by journalists, academia and others, broadening understanding of global issues.”
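As an illustration of what such an API enables, the sketch below builds a query against the World Bank's Indicators API and turns a JSON response into a year-to-value mapping. It assumes the current v2 REST form (`https://api.worldbank.org/v2/country/{country}/indicator/{indicator}`), which may differ from what the Data Catalog offered at the time of writing, and assumes the JSON response is a two-element array of paging metadata followed by a list of records with `date` and `value` fields; check both against the API documentation. `SP.POP.TOTL` (total population) is used only as an example indicator code.

```python
import json
from urllib.parse import urlencode

# Assumption: the v2 base URL of the World Bank Indicators API
BASE = "https://api.worldbank.org/v2"

def indicator_url(country, indicator, start_year, end_year):
    """Build a JSON query URL for one indicator, one country and a range of years."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}", "per_page": 100},
        safe=":",  # keep the year range readable as 2000:2009
    )
    return f"{BASE}/country/{country}/indicator/{indicator}?{query}"

def parse_series(payload):
    """Map year -> value from a response body, skipping years with no value.

    Assumes the body is [paging_metadata, records], where records may be null.
    """
    _meta, records = json.loads(payload)
    return {int(r["date"]): r["value"] for r in (records or []) if r["value"] is not None}

print(indicator_url("GBR", "SP.POP.TOTL", 2000, 2009))
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_series`; the network call is left out here so the sketch runs offline.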
In-Built Data Visualizers or Open Data
The case for releasing data in open formats was strengthened when the UK Government published detailed spending data for all departments. The data was released in CSV format, with no in-house visualisation tools.
Many have suggested this is exactly what government should do: provide data in open formats that adhere to open data principles, while leaving others to do the visualisations. Following the release of this data, the Guardian created an interactive guide detailing some “157 spreadsheets containing every transaction by each one of 24 core departments detailing every item of spending over £25,000”.
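Because the data is plain CSV, a few lines of code are enough to start asking questions of it. The minimal sketch below totals spending per department and ranks departments from largest to smallest. The column names (`Department`, `Amount`), the optional `min_amount` filter and the sample rows are assumptions made for illustration; the real files' headers varied and should be checked before use, and the sample figures are not real spending data.

```python
import csv
import io
from collections import Counter

def parse_amount(text):
    """Convert a value such as '£40,000' or '30000.50' to a float."""
    return float(text.strip().lstrip("£").replace(",", ""))

def spend_by_department(csv_text, dept_col="Department", amount_col="Amount", min_amount=0.0):
    """Total spending per department, largest first; rows below min_amount are skipped."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = parse_amount(row[amount_col])
        if amount >= min_amount:
            totals[row[dept_col]] += amount
    return totals.most_common()

# Hypothetical rows; the quoted field shows how a comma inside an amount is handled
SAMPLE = """Department,Supplier,Amount
Dept A,Supplier X,30000
Dept B,Supplier Y,"£40,000"
Dept B,Supplier Z,50000
"""

for dept, total in spend_by_department(SAMPLE):
    print(f"{dept}: £{total:,.2f}")
```

The ranked totals are exactly the kind of intermediate table a journalist might then feed into a charting tool.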
Governments are not, however, abdicating their responsibility for creating data visualisations. Barnet Council, in conjunction with the website wheredoesmymoneygo.org, recently created its own infographic to show how it spends taxpayers' money. This was done to help inform a debate on how the council could manage its budget more effectively.
While some organisations will undoubtedly create their own data visualisations, the trend appears to be towards large releases of raw data. As such, there is a real need for citizens and journalists to learn how to use tools such as Protovis, Gapminder, Swivel and Many Eyes.
On the release of UK Government Spending data, Prime Minister David Cameron said:
You’re going to have so much information about what we do, how much of your money we spend doing it and what the outcome is. So use it. Exploit it. Hold us to account.
Visualisation tools based on open data are the instruments to “exploit it”. The challenge for citizens and journalists alike is to learn how to use such tools to build narratives and stories from this vast amount of raw data.