ActiveBlog

Talking Vantrash.ca: A Case Study of how Open Data Needs Open Source
by Eric Promislow

Eric Promislow, April 12, 2010

"Open Data" has been the flavor of the year. While ten years ago most people were content to access their favorite government-issued pamphlets and documents as PDF files on the web, a few people knew that there were untold applications locked within the data buried in those documents. In the past year or so, many jurisdictions, most notably at the city level, have been releasing sets of raw data. Now that we're starting to get access to new sources of data, in the form of RSS feeds, geographical data, and dumps from other kinds of databases, the next question is: what do we do with it, and how? This article takes a look at one particular instance, vantrash.ca, and how it's using public data to save people's marriages.

You see, I don't know how it is in other places, but dealing with residential garbage in Vancouver, B.C., is a bit of a pain. Maybe this was covered by one of the visiting broadcasters during the recent Olympics, but I must have missed it, and I'll assume you're the type who would rather read blogs than watch network TV, so here's a quick explanation of Vancouver's trash pickup system.

You have a weekly garbage pickup day, until a statutory holiday comes along, and then your pickup day skips ahead by one workday. For example, during the first three stat-free months of the year, my pickup day is Monday. That's easy to deal with: I put out the recycling and garbage sometime Sunday, and we have a clean recycling box to start refilling Monday evening. But then along comes Easter, and afterwards everyone's garbage pickup day advances one day, unless your pickup day was Friday, and then it will be Monday. But wait, Easter Monday is also a stat, so that means you advance two days. The last week of the year can be even crazier, with a three-day jump due to Christmas, Boxing Day, and New Year's.

Combine that with a 6:30 AM pickup time, and you're going to find some perfectly good mornings ruined by the sound of the garbage trucks zooming past your house because nothing was put out for pickup. You can always throw on some clothes, zoom out of the house, toss the recycling and the garbage can in the minivan, and go racing after the trucks, but that will just raise the ire of your neighbors, who will think you've used up your garbage quota and are trying to horn in on theirs. Plus the kids will complain about the smell in the van.
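To make the skip-ahead rule concrete, here is a minimal sketch in Python. It assumes the rule works exactly as described above: a pickup every fifth workday, where weekends and stat holidays don't count as workdays. The function names and the holiday set are hypothetical, and this is not how the city or vantrash.ca computes the schedule (vantrash.ca scraped the published dates rather than calculating them).

```python
from datetime import date, timedelta


def is_workday(day, holidays):
    """A workday is Monday to Friday and not a statutory holiday."""
    return day.weekday() < 5 and day not in holidays


def next_pickup(last_pickup, holidays, cycle=5):
    """Return the date that falls `cycle` workdays after `last_pickup`.

    With no holidays in between, that is the same weekday one week later.
    Every holiday that lands on a weekday pushes the result one workday on.
    """
    day = last_pickup
    remaining = cycle
    while remaining > 0:
        day += timedelta(days=1)
        if is_workday(day, holidays):
            remaining -= 1
    return day


# Assumed stat days around Easter 2010, as the article describes them:
# Good Friday (April 2) and Easter Monday (April 5).
EASTER_2010 = {date(2010, 4, 2), date(2010, 4, 5)}

# A Monday pickup on March 29 moves to Wednesday, April 7.
print(next_pickup(date(2010, 3, 29), EASTER_2010))
```

Under those assumptions, a Monday pickup jumps two days over Easter, and a Friday pickup followed by a single stat lands on the Monday after next, matching the behavior described above.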

Show me the Data

Shift the scene from a dark, wet winter morning looking at the tail lights of a garbage truck roaring down a back alley, to the warm, dry hackers' hangout at the Vancouver City Archives. A bunch of local hackers have arranged to meet with a group of City workers who have released some new data sets and are looking for feedback. The mission now is to find an interesting dataset, apply a bit of code, and whip up an application in three hours.

I went there one evening, and if you have a look at Vancouver's Open Data Catalogue, you'll see how many different data sets there are, ranging from a list of city alleyways and bikeways down to city-run webcams and zoning boundaries. Most of the datasets have a geographical aspect to them, which is not surprising considering the source. Most of the geographical datasets ship in three formats: KML, SHP, and DWG. You can use KML files right off the bat with an online map viewing utility like Google Maps or Bing. For example, from the catalogue I see that the URL for the list of webcam data is http://data.vancouver.ca/download/kml/webcams.kml, but that file isn't so interesting to look at in raw form. If you open it in Google Maps like so: http://maps.google.com/maps?q=http://gisweb2.vancouver.ca/google/kml/webcams.kml, you'll see a much more interesting view, with the webcams plotted on a map.

So right off the bat we can build trivial web applications with data like this. There's a lot of open data in this application, but not much open source. Let's bring in some code. The classic example is vantrash.ca by Luke Closs and Kevin Jones, two local Vancouver programmers who are always looking for an interesting project, typically one using open-source technologies. They had read David Eaves's post "How Open Data even makes Garbage collection sexier, easier and cheaper", got the data, scraped a couple of other city sites to get the pickup dates for each zone throughout the year, and whipped together an alpha in Perl. Users go to the site, click on the map where they live, and enter their email address. The site figures out which zone they live in, and they then get a weekly email reminder the day before that week's pickup date.

Perl to the Rescue

The interesting part of this article is why they selected Perl. True, they both had been writing large amounts of Perl code in the last few years, but there are other reasons. And if you've read articles like this before, you can guess that CPAN is part of the answer. Notice the two other formats listed next to "KML" in the data catalogue: "DWG" and "SHP". A couple of minutes at Wikipedia told me that DWG files are an AutoCAD format, and SHP files, also called shapefiles, follow an open specification for GIS systems. From that, knowing nothing else, I knew which format I'd rather work with, so I went to CPAN and downloaded Geo::ShapeFile. I then filled in the gaps in the documentation by firing up an interactive Perl shell, typing in commands, and getting back large numbers of polygons. This is basically the process Luke and Kevin went through, taking vantrash.ca from concept through alpha to release after a few evenings and weekends of coding.

Calculating a user's zone based on their location is also a typical problem from computational geometry, easily solved by downloading the Math::Polygon module, feeding it the data, and using the contains(POINT) method to determine which zone a point is in. There are no doubt other ways to solve the problem, and solving it isn't a strength peculiar to Perl. But this is the kind of problem Perl and CPAN make easy to solve.
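For readers who haven't met the point-in-polygon problem before, here is a minimal sketch of one standard way to solve it, the ray-casting test, written in Python. It is not the code behind Math::Polygon or vantrash.ca; the zone names and coordinates below are made up for illustration, and real zone boundaries would come from the city's shapefile.

```python
def contains(polygon, point):
    """Ray-casting test: is `point` (x, y) inside `polygon`?

    `polygon` is a list of (x, y) vertices in order. A horizontal ray is
    cast from the point to the right; an odd number of edge crossings
    means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_zone(zones, point):
    """Return the name of the first zone containing `point`, or None."""
    for name, polygon in zones.items():
        if contains(polygon, point):
            return name
    return None


# Hypothetical zones: two adjacent squares, not real Vancouver data.
ZONES = {
    "north": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "south": [(0, -2), (2, -2), (2, 0), (0, 0)],
}

print(find_zone(ZONES, (1, 1)))
print(find_zone(ZONES, (1, -1)))
```

Points that fall exactly on a shared boundary are an edge case this sketch doesn't try to settle, which is one more reason to reach for a tested library when one is available.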

Show me the Code

My main takeaway from the hackers' meetup (it's not a hackathon if it's only three hours long) is that while open government data is better than the alternative, it doesn't go far if we don't have good ideas for what to do with it, and programmers to build the apps. I can see that apps will end up targeting proprietary platforms like the iPhone. But to build them quickly, developers will need code to deal with formats like SHP. The first hit for "shapefile library iphone" landed on a dubious "wareseeker" site; the second took me to a site where I had to register for an API key to use their free library. While this might be a legitimate step for a production app for a mobile device, I appreciate how quickly I can start working with new file formats in a language like Perl.

In this past week, I've heard that both the British Columbia and Edmonton (Alberta) governments are hosting "app-building" contests, seeing what people can do with their datasets. I'll have a closer look at these in a later post, but I'm expecting that we'll be seeing more solutions built with languages like Perl, Python, Ruby, and JavaScript than with the lower-level languages like Java and C that were most likely used to compile the data.

Is your government jumping on this bandwagon too? Are the released sets full of interesting potential, or run-of-the-mill? And what exactly constitutes an interesting open-data app anyway? Setting aside the kind of data behind vantrash.ca, most of the data catalogue looks like material for a good app for the construction and development industries. Most of the people I met at the meetup were interested primarily in freeing up the data that the public owns anyway, but were also at a loss as to how to turn datasets like lists of sewer mains into an interesting app (or they weren't telling me). Are there any vantrashes (apps with wide appeal) in your area?

Category: ActiveBlog, open source
About the Author:

Eric Promislow is a senior developer who's worked on Komodo since the very beginning. He has an M.Sc. in Computing Science from Queen's University and a B.Sc. in Biophysics from the University of Ontario. Before joining ActiveState, he helped create the OmniMark text-processing language.