This is a map of Warwickshire County Council’s election results for the 4th of June 2009, built as part of a mini-project to show off the possibilities of our opendata before our Hack Warwickshire competition. (You really should enter, you know. You might win an iPad.)
Anyway, for comparison, here’s the result for 2005 – it’s easy to see that Labour lost share over the four years. The maps are also a strong indication of how Labour generally do well in cities and the Conservatives do better in rural areas – compare with the BBC map of the 2010 UK General Election results.
I really enjoyed putting this together, despite the stress of not knowing anything about this stuff at the start, and it wasn’t that difficult in the end.
The outlines of the electoral divisions are available from the Ordnance Survey as part of the recently opened-up Boundary Line product, and come as an ESRI Shapefile, a popular file format for geographic vector information.
The full dataset is 46MB in size, and contains polygons for all of the county councils in England, so I’ll need to chuck a load of that data away. Here’s the full dataset mapped out in open-source GIS tool Quantum GIS – Warwickshire is the thing in yellow in the middle.
I used the command line utility ogr2ogr from the Geospatial Data Abstraction Library (GDAL), and converted the .SHP file from eastings/northings into latitude/longitude:
…and then converted the resulting file into a .KML:
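As a rough sketch of those two conversions, here is how the ogr2ogr calls might be assembled from Python. The file names are placeholders, and the flags are an assumption about what a run like this needs (`-t_srs` for the target projection, `-f` for the output format), not a record of the commands actually used. The source projection is assumed to be British National Grid (EPSG:27700), which is what Ordnance Survey eastings/northings normally are.

```python
import subprocess


def reproject_cmd(src_shp, dst_shp):
    """Build an ogr2ogr call that reprojects a shapefile to WGS84 lat/long."""
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",
        "-s_srs", "EPSG:27700",  # assumed source: British National Grid
        "-t_srs", "EPSG:4326",   # target: WGS84 latitude/longitude
        dst_shp, src_shp,        # ogr2ogr takes the destination first
    ]


def to_kml_cmd(src_shp, dst_kml):
    """Build an ogr2ogr call that converts a shapefile to KML."""
    return ["ogr2ogr", "-f", "KML", dst_kml, src_shp]


def run(cmd):
    """Run one of the commands above; needs GDAL installed and on the PATH."""
    subprocess.run(cmd, check=True)
```

Calling `run(reproject_cmd("divisions.shp", "divisions_wgs84.shp"))` followed by `run(to_kml_cmd("divisions_wgs84.shp", "divisions.kml"))` would perform both steps, with those hypothetical file names swapped for real ones.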
Each area, called a placemark in the KML, came out of the sausage machine looking something like this, only 108MB of it.
At this point I came to the all-too obvious realisation that the polygons really really were just made out of a shitload of co-ordinates. I knew this would be the case, but I kind of expected it to be something a bit cleverer. I’m not sure what else I could’ve been expecting, really.
So all good so far, but I had no idea which electoral division was which. Alongside the .SHP file was a .DBF file (so dBase, right?) – this can be opened up in Excel. It took me a while to work it out, but the FID number in the KML corresponded to the row number (which is C in the table below) of the area details.
Once I’d chopped out all the non-WCC electoral divisions from the KML file, I was left with about 3MB of data. I took this and ran it through a very simple PHP routine (leaning heavily on the SimpleXML library) to load the data into a MySQL database as field type geometry. The original data was accurate to at least 16 decimal places, which is lovely and all, but overkill on Google Maps, so I took it down to six decimal places, which I think is to within 10cm. Which should be enough.
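To sanity-check the six-decimal-places claim, here is a small Python sketch (not the PHP routine itself) that rounds a KML-style coordinate string and estimates how much ground the last decimal place covers. The 111,320 metres per degree figure is a rough average, and the function names are made up for illustration.

```python
import math


def round_coordinates(coord_text, places=6):
    """Round each 'lon,lat[,alt]' tuple in a KML coordinates string.

    Altitude, if present, is dropped in this sketch.
    """
    rounded = []
    for tuple_text in coord_text.split():
        parts = tuple_text.split(",")
        lon, lat = float(parts[0]), float(parts[1])
        rounded.append(f"{lon:.{places}f},{lat:.{places}f}")
    return " ".join(rounded)


def metres_per_step(places, latitude_deg):
    """Approximate ground size of one unit in the last decimal place."""
    metres_per_degree = 111_320  # rough average for a degree of latitude
    step = 10 ** -places
    north_south = step * metres_per_degree
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west
```

At Warwickshire’s latitude (roughly 52.3 degrees north), `metres_per_step(6, 52.3)` comes out at about 11cm north-south and about 7cm east-west, so “to within 10cm” is in the right area.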
From there I imported the names of the areas, and with a bit of fiddling, matched those up against the names of the electoral divisions I’d scraped from the original Notes election systems.
I’d had a warning previously that web browsers were a bit rubbish at dealing with the large amount of data it can take to map areas, and I was a bit concerned when I found that the full map of Warwickshire electoral divisions as supplied by the Ordnance Survey contains over 80000 points. It sounded like a lot.
Some rooting around in Stack Overflow brought up a mention of the Douglas-Peucker algorithm, which is a method for simplifying lines. Developer Anthony Cartmell has written an implementation of the algorithm in PHP, and has a demo of it in action too. I used Anthony’s class to simplify the electoral division polygons. Here’s a screenshot showing an example of the line generalisation using Anthony’s class on the Aston Cantlow electoral division – I’ve chosen badly with the colours on the map: the pink line has 813 points, and the yellow line has 74 points.
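For anyone curious how the simplification works, here is a minimal Python sketch of the textbook Douglas-Peucker algorithm. It is not Anthony’s PHP class, and the tolerance here is in the same units as the coordinates, which may not correspond to the settings used in that class. The idea is to keep the two end points, find the point furthest from the straight line between them, and recurse on either side of it only if it is further away than the tolerance.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # start and end coincide (e.g. a closed ring): use point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Return a simplified copy of points using Douglas-Peucker."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist <= tolerance:
        # Everything in between is close enough to the straight line.
        return [start, end]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point
```

For example, `simplify([(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)], 0.1)` drops only the slight wobble at `(1, 0.05)`. The recursive form is fine for a sketch, though a very long line could in principle need an iterative version to avoid Python’s recursion limit.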
I also wrote a routine to export the whole dataset as a simplified KML file – this is at setting 3000, which results in a 176 KB KML file, and 6647 points (just over 8% the size of the full version):
Here’s a link to the KML in Google Maps, with 6647 points – notice on the left that the areas aren’t labelled yet.
After all that, I did an export of the dataset at 99% of the points in the full version, resulting in a 1.8 MB KML (view on Google Map) – which still works quickly in Mac OS X Safari, and was actually OK in IE7 too. I was interested that IE could cope with this many points – it made me wonder if something was being done server-side by Google to smooth out the points at a particular zoom level.
I also wrote a variation on this to output back to the database rather than out as a KML file – this meant I could quickly experiment with different generalisations of the data to check performance.
Once I’d finally finished fiddling and settled on a generalisation I was happy with (around 8000 points for the entire set), I built up a single GPolygon using the Google Maps API on the results page map, and coloured it in with the winning party colour:
For the main map it was just a matter of building up all the GPolygons for all the areas, and adding listeners to show pop-up bubbles when an area is clicked.
…and we’re done. Performance is presumably dependent on JavaScript performance – it’s definitely fastest in the WebKit-based browsers, as you’d expect, with Firefox 3 being OK, and IE7/8 being very slow. Chrome was actually usable with the full 80,000 points.
Surprisingly, Opera 10 throws a psychedelic fit, refusing to remove previous shapes as you zoom in, which is pretty but a bit rubbish.
All I have for mobile testing is my much loved 1st gen iPod Touch, which crawls but does work, including pinch zoom, interestingly. I’d like to see it on an iPhone 3GS.
Just to compare the experience, here’s the map running within Chrome 4 on a Windows 7 x64 VM on my Macbook.
And here’s the same map in Internet Explorer 8.
I had high hopes for the IE9 preview, which is said to boast much better JavaScript performance and SVG, but in my testing it didn’t seem much quicker than IE8.
There are alternatives to building the polygons client-side, but with the experiments I’m doing right now, I’m trying to keep things as simple as possible, running outside of the corporate network using my cheap shared host and free web services to get a quick leg-up and show what can be done without spending a fortune. The shapes could be built server-side, which given a big enough server – however big that is – would be more usable across a wider range of browsers. This would need infrastructure putting in place, and I’m too cheap to get myself a VPS or dedicated server at the moment. Building the polygons client-side also gives you a level of interactivity (…alright, they’re clickable) which has further possibilities.
So for now, this method works for simple shapes, but for presenting more complicated outlines to a general audience, I’m hoping the future will catch up with us and Microsoft release a blindingly quick version of IE9 which somehow automatically replaces all previous versions in a flash.
Yeah, that’s gonna happen.
But if not, I’ll get round to looking at GeoServer. I noted that KML files displayed quickly when viewed directly in Google Maps in IE7 – this could also be something to explore.
(I should say that this page uses Ordnance Survey data © Crown copyright and database right 2010.)
Initially it was just meant to be an excuse to fiddle with the Google Maps API, but I started having a play with the online automatic tagging service OpenCalais, which ended up being the most satisfying thing about it. I’ve left all the tags and types produced by OpenCalais in so you can compare the tags against the content.
OpenCalais is actually pretty good, despite my previous churlish Twitter whinging. It seems a bit petty to pick holes given that it’s a free service provided kindly by Thomson Reuters, but well, I’m going to anyway…
Most of my problems with it are to do with the categorisation of the tags – for instance it seems to be pretty good at pulling out names, with some exceptions – for example, Lea Marston and Leek Wootton are places rather than people, and Warwickshire is tagged as a City rather than a Province or State, although it correctly works out that North Warwickshire fits into the latter category.
It could do better with working out synonyms – for instance anti-virus software and antivirus software are the same thing, and I remember seeing a couple of places where the plural and the singular are included as tags.
For some reason I was impressed that it knows that Come Dine With Me is a TV show, and that the communications team write so much about programming languages. The latter is one case where you would possibly post-process the tags found, in this case by chucking them away.
OpenCalais doesn’t seem so hot on working out a more general keyword behind a story – the tagging on the story Civil partnerships and marriage increase didn’t pull out the words marriage or wedding as tags.
(Update: See comment from Tom Tague of OpenCalais for clarification on the way that OC works).
For me the best thing to come out of the tagging was the “possibly related stories” sidebar in the news story page, which I added late on. When you open up a news story, it searches the database for the top 5 stories with the most tags matching that story, and mostly this works pretty well – possibly because of the robotic tagging consistency of OpenCalais.
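The matching behind that sidebar can be sketched in a few lines of Python. This is an in-memory illustration rather than the actual database query, and the `tags_by_story` structure (story ID mapped to a set of tags) is made up for the example.

```python
from collections import Counter


def related_stories(story_id, tags_by_story, limit=5):
    """Return IDs of the stories sharing the most tags with story_id."""
    own_tags = tags_by_story[story_id]
    overlap = Counter()
    for other_id, tags in tags_by_story.items():
        if other_id == story_id:
            continue
        shared = len(own_tags & tags)  # size of the set intersection
        if shared:
            overlap[other_id] = shared
    # most_common sorts by shared-tag count, highest first
    return [other_id for other_id, _ in overlap.most_common(limit)]
```

Stories with no tags in common are left out altogether, so a story with unusual tags may get fewer than five suggestions.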
On the technical front, the site is based on the usual PHP/MySQL combination, and I used the open-source CodeIgniter framework, with the Simple DOM Parser for scraping the news stories, and Dan Grossman’s Open Calais Tags library to send the main body text off to OpenCalais for processing. The elapsed time to process each piece of content was generally about 3-4 seconds, sometimes slightly shorter, sometimes longer (up to around 10 seconds). I had to run the routine several times to get results for all four thousand indexed stories – there was a memory leak somewhere along the way.
As for the thing that I initially started out to do, that’s pretty dull really – I used the Google Maps API to geocode a list of towns and villages in Warwickshire, as well as a few other places, and then ran the main text of the stories through a simple regular expression search to tag them up with places.
There’s lots of improvements that could be made, but in the end it’s just a throwaway experiment. I’d like to improve the places tagging routine, which could be as simple as adding a few more places. The main thing would be to look into some way of fitting the tags around a pre-defined ontology. There’s no current method to suggest a list of categories and tags for OpenCalais to process content with, so it would have to be after the results had been received.
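A place-tagging pass along those lines might look like this in Python. It is a sketch of the general technique, not the original routine: the place list is illustrative, and the choices to match whole words only and ignore case are assumptions.

```python
import re


def find_places(text, places):
    """Return the known place names mentioned in text, sorted by name."""
    canonical = {name.lower(): name for name in places}
    # Longest names first, so a longer name wins over a shorter one
    # that starts the same way.
    ordered = sorted(places, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # \b keeps "Warwick" from matching inside "Warwickshire".
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    found = {canonical[m.group(0).lower()] for m in pattern.finditer(text)}
    return sorted(found)
```

Because matches don’t overlap, a story that only mentions “North Warwickshire” would not also be tagged “Warwickshire” if both were in the list, which may or may not be what you want.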