Show me the Data
Shift the scene from a dark, wet winter morning looking at the tail lights of a garbage truck roaring down a back alley, to the warm, dry hackers’ hangout at the Vancouver City Archives. A group of local hackers has arranged to meet with City workers who have released some new datasets and are looking for feedback. The mission now is to find an interesting dataset, apply a bit of code, and whip up an application in three hours.

I went there one evening. If you have a look at Vancouver’s Open Data Catalogue, you’ll see how many different datasets there are, ranging from lists of city alleyways and bikeways to city-run webcams and zoning boundaries. Most of the datasets have a geographical aspect, which is not surprising considering the source, and most of those ship in three formats: KML, SHP, and DWG. You can use KML files right away with an online map viewer like Google Maps or Bing. For example, the catalogue gives the URL for the webcam data as http://data.vancouver.ca/download/kml/webcams.kml, but that file isn’t very interesting to look at in raw form. If you open it in Google Maps like so: http://maps.google.com/maps?q=http://gisweb2.vancouver.ca/google/kml/webcams.kml, you’ll see a much more interesting view.
So right off the bat we can build trivial web applications with data like this. There’s a lot of open data in this application, but not much open source. Let’s bring in some code. The classic example is vantrash.ca by Luke Closs and Kevin Jones, two local Vancouver programmers who are always looking for an interesting project, typically one built on open-source technologies. They had read David Eaves’s post “How Open Data even makes Garbage collection sexier, easier and cheaper”, got the data, scraped a couple of other city sites to get the pickup times for each zone throughout the year, and whipped together an alpha in Perl. Users go to the site, click on their location, and enter their email address. The site figures out which zone they live in, and they then get a weekly email reminder the day before that week’s pickup date.
Perl to the Rescue
The interesting part of this article is why they selected Perl. True, they had both been writing large amounts of Perl code over the last few years, but there are other reasons. And if you’ve read articles like this before, you can guess that CPAN is part of the answer. Notice the two other formats listed next to “KML” in the data catalogue: “DWG” and “SHP”. A couple of minutes at Wikipedia told me that DWG is an AutoCAD file format, and that SHP files, also called shapefiles, follow an open specification for GIS systems. From that, knowing nothing else, I knew which format I’d rather work with, so I went to CPAN and downloaded Geo::ShapeFile. I then filled in the gaps in the documentation by firing up an interactive Perl shell, typing in commands, and getting back large numbers of polygons.

This is basically the process Luke and Kevin went through, taking vantrash.ca from concept through alpha to release after a few evenings and weekends of coding. Calculating a user’s zone from their location is also a typical problem in computational geometry, easily solved by downloading the Math::Polygon module, feeding it the data, and using the contains(POINT) method to determine which zone a point is in. There are no doubt other ways to solve the problem, and solving it isn’t a strength peculiar to Perl. But this is the kind of problem Perl and CPAN make easy to solve.
Show me the Code
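As a minimal sketch of the zone lookup described above (not the vantrash.ca source), the following script builds a couple of polygons with Math::Polygon and asks each one whether it contains a point. The zone names, the coordinates, and the zone_for_point helper are all hypothetical, made up for illustration. Real zone outlines would come from the city’s SHP file rather than being typed in by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::Polygon;

# Hypothetical pickup zones, each a closed ring of [x, y] vertices
# (the first and last points are the same). These are made-up
# coordinates, not data from the Vancouver catalogue.
my %zones = (
    'Zone A' => Math::Polygon->new(
        points => [ [0, 0], [4, 0], [4, 4], [0, 4], [0, 0] ],
    ),
    'Zone B' => Math::Polygon->new(
        points => [ [4, 0], [8, 0], [8, 4], [4, 4], [4, 0] ],
    ),
);

# Return the name of the first zone whose polygon contains the
# point, or undef if no zone does. (Hypothetical helper.)
sub zone_for_point {
    my ($point) = @_;
    for my $name (sort keys %zones) {
        return $name if $zones{$name}->contains($point);
    }
    return;
}

# Try a point inside each zone and one outside both.
for my $point ([1, 1], [6, 2], [20, 20]) {
    my $zone = zone_for_point($point);
    printf "(%s, %s) is in %s\n",
        @$point, defined $zone ? $zone : 'no known zone';
}
```

Looping over every polygon is fine for a few dozen zones. With many more, you would probably want to check each zone’s bounding box first and run the full containment test only on the candidates that pass.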
Title image courtesy of Jilbert Ebrahimi on Unsplash.