The trend towards open government data is growing, with recent developments like Data.Gov and Vancouver’s Open3 motion, but these simply do not go far enough. In addition to publishing downloadable data and open interfaces, government needs to learn from successful commercial websites and bring its “Deep Web” data to the surface.
The internet search experience is constantly evolving. In the early days it was normal to search for a single keyword, be redirected to an authoritative website, and then explore that site to find what you were really looking for. As the search engines became smarter and publishers learned to expose their database records as individual web pages, people learned to search for more specific information. For instance, searching for the name of a book will take you to an Amazon or Wikipedia entry for that book. Searching for the name of a movie currently in release will get you local show times and the name of the theatre where it’s playing.
Unfortunately, government has largely failed to recognise this change, and an entrenched tendency to develop stateful applications and portals is making the problem worse. As an example, try searching for US patent number 6368227. You will likely find a few results from ad-driven private websites that re-publish the government data, and maybe a broken link to the official patent search. Why bother publishing your information online if you are going to do so in a way that holds it apart from the web?
The good news is that fixing this problem is not hard. The search engines already assign a lot of authority to government sites, so you’re already a step ahead of commercial sites facing the same problem. Just follow a few simple suggestions that the rest of the web has already figured out for us:
- Publish each well-formatted record to a consistent location on the web that will not change. This allows both people and search engines to come back to these records whenever they want.
- Ensure that search engine spiders have a way of following basic hyperlinks to find this information. This can be either a simple paged set of results, or a more complex hierarchical system if the data allows.
- Generate SiteMaps that link to all of your records as cheap insurance to make sure that the search engines can find all of your content. Be careful to pay attention to the maximum size, and break your data up into multiple sitemaps if necessary.
- Make use of clear and logical metadata such as Title, Description and Canonical tags to ensure that both search engines and your prospective searchers can make sense of the results. Nothing worse than publishing a record with an HTML title like “32432-43A”. Nobody is going to click on that!
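To make the suggestions above a little more concrete, here is a rough Python sketch of how they might fit together. It is only an illustration under my own assumptions: the host `data.example.gov`, the `/records/<id>` URL pattern, the record fields (`id`, `name`, `collection`, `summary`) and the function names are all hypothetical, and a real system would pull records from its own database. The 50,000 URL cap per sitemap file comes from the sitemaps.org protocol, which also limits the uncompressed file size, so very large datasets may need smaller chunks than shown here.

```python
from html import escape
from urllib.parse import quote
from xml.sax.saxutils import escape as xml_escape

BASE_URL = "https://data.example.gov"  # hypothetical host, for illustration only
MAX_URLS_PER_SITEMAP = 50_000          # per-file URL limit in the sitemaps.org protocol


def record_url(record_id):
    """Permanent address for one record; it depends only on the record's ID."""
    return f"{BASE_URL}/records/{quote(str(record_id), safe='')}"


def page_head(record):
    """Build a <head> with a readable title, a description and a canonical link."""
    url = record_url(record["id"])
    return (
        "<head>\n"
        f"  <title>{escape(record['name'])} | {escape(record['collection'])}</title>\n"
        f'  <meta name="description" content="{escape(record["summary"])}">\n'
        f'  <link rel="canonical" href="{escape(url)}">\n'
        "</head>"
    )


def build_sitemaps(record_ids, max_urls=MAX_URLS_PER_SITEMAP):
    """Return a list of sitemap XML strings, each holding at most max_urls entries."""
    ids = list(record_ids)
    sitemaps = []
    for start in range(0, len(ids), max_urls):
        entries = "".join(
            f"  <url><loc>{xml_escape(record_url(i))}</loc></url>\n"
            for i in ids[start:start + max_urls]
        )
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}</urlset>\n"
        )
    return sitemaps


if __name__ == "__main__":
    # Hypothetical record: the title shown to searchers is the name, not the raw ID.
    example = {
        "id": "32432-43A",
        "name": "Example Record Name",
        "collection": "Example Collection",
        "summary": "A one-sentence summary a searcher can understand.",
    }
    print(page_head(example))
    print(len(build_sitemaps(range(120_000))), "sitemap files")
```

The point of the design is that the URL is derived from nothing but the record's identifier, so it stays put when the application around it changes, while the human-readable name goes into the title where searchers will actually see it. If the data is split across several sitemap files, the same protocol defines a sitemap index file that can list them all.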
Do we still need to build applications? Absolutely! Sometimes free text search across the entire web does not offer enough granularity. Do we still need to make data and services available to third parties? Definitely! There are lots of smart people out there who can use our data to help make the world a better place. However, these are secondary to the single most effective way we have of giving citizens access to the data we maintain on their behalf. Our highest level of service can be delivered by being of the web, not just on the web.
Oh, and since this is a geospatial blog: Just because your data is in a GIS, don’t think you can avoid doing something about this. Spatial search is still nowhere near as powerful as general web search, but it’s getting better all the time. Government geodata needs to be published as web pages too.
For some concrete examples of the benefits of becoming part of the web, check out a slide show that I recently published as “Moving Beyond the Desk”. Make sure to turn on the speaker notes. If you don’t feel like watching the slides, just try searching Google for Mark Bate Statue or 2323 Rosstown Rd and see if you can find the City of Nanaimo’s data in the results. For technical information on the systems behind these results, see my previous posts on the public art project and the MapGuide GeoREST extension the City is using to publish property information.
-J
P.S. This post was prompted by James’ mention of the Moving Beyond the Desk slide show. Thanks James!