After having read lots about Chris Taggart’s Open Election Data project, I thought it’d be fun – yes, really – to try and have a go at the Warwickshire County Council election data, which resides within a series of Lotus Notes databases on the WCC site. Here’s what I ended up with: Warwickshire Election Data.

Due to a lack of Notes developer resources, I wasn’t going to have the data exported out of the database itself – doing so would require setting up new views – so I went for the even more direct route and scraped the data for the council elections of 2001, 2005 and 2009 into a MySQL database. Actually, who am I kidding, I like scraping data, despite the mental torture. It has the feeling of something that you shouldn’t be able to do, and I like the idea of being able to turn flat data into something more linked up.

2005 and 2009 weren’t so bad, because all the information was in a view, and using the ?readviewentries switch in the Notes view URLs I had some sort of XML representation which I could process. Here’s an example of the view, from 2005:

WCC Election result view from Lotus Notes database, 2005

The name, party and votes received for a particular candidate were all in one XML element, but in a predictable format, so it wasn’t such a problem to parse that out. In the end it was just over an afternoon’s work to grab all of 2005 and 2009.
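For anyone wanting to try something similar, here’s a rough sketch of that parsing step. It’s in Python rather than the PHP I actually used, and the "Name (Party) votes" layout inside the text element is an assumption made up for illustration, as are the sample names and the `parse_entries` function. The `viewentries` / `viewentry` / `entrydata` / `text` nesting follows the general shape of ?readviewentries output.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of ?readviewentries output.
# The "Name (Party) votes" layout inside <text> is an assumption.
SAMPLE_XML = """<viewentries toplevelentries="2">
  <viewentry position="1">
    <entrydata columnnumber="0" name="Result">
      <text>Jane Example (Green Party) 412</text>
    </entrydata>
  </viewentry>
  <viewentry position="2">
    <entrydata columnnumber="0" name="Result">
      <text>John Sample (Independent) 1,305</text>
    </entrydata>
  </viewentry>
</viewentries>"""

# Name, then party in brackets, then a vote count that may contain commas.
RESULT_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*\((?P<party>[^)]+)\)\s*(?P<votes>[\d,]+)$"
)


def parse_entries(xml_text):
    """Return one dict per view entry whose text matches the assumed layout."""
    root = ET.fromstring(xml_text)
    results = []
    for text_node in root.iter("text"):
        match = RESULT_PATTERN.match((text_node.text or "").strip())
        if match is None:
            continue  # skip anything that doesn't fit the expected format
        results.append({
            "name": match.group("name"),
            "party": match.group("party"),
            "votes": int(match.group("votes").replace(",", "")),
        })
    return results


entries = parse_entries(SAMPLE_XML)
```

Entries that don’t match the pattern are skipped silently here; in a real scrape it would probably be worth logging them, since a "predictable format" tends to have a few exceptions.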

2001 was trickier, because the information I needed was in a plain, old-school HTML page, as a table with no classes or IDs to identify the information within. It ended up taking me a couple of days (yes, that’s how slow I am) to write something that would reliably scan through the list of ward results, get the HTML for each ward result page, find the particular (unmarked) table within that markup and scan through it to create the people, results, divisions, and candidates. Here’s an example of the result for Arley:
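One way to pick out a table with no classes or IDs is to identify it by its contents instead, for instance by looking for the header row. The sketch below does that with Python’s standard `html.parser` module (not the Simple DOM Parser library I used in PHP). The sample markup, the header words and the names `TableCollector` and `find_results_table` are all invented for the example, and it assumes the tables aren’t nested inside each other.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every table as a list of rows, each row a list of cell texts.

    Assumes tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def find_results_table(html, required=("candidate", "votes")):
    """Return the data rows of the first table whose header has the required words."""
    parser = TableCollector()
    parser.feed(html)
    for rows in parser.tables:
        if not rows:
            continue
        header = [cell.lower() for cell in rows[0]]
        if all(word in header for word in required):
            return rows[1:]
    return None


# Hypothetical page: a navigation table followed by the unmarked results table.
SAMPLE_HTML = """
<html><body>
<table><tr><td>Home</td><td>Elections</td></tr></table>
<table>
  <tr><td><b>Candidate</b></td><td><b>Party</b></td><td><b>Votes</b></td></tr>
  <tr><td>Jane Example</td><td>Green Party</td><td>412</td></tr>
  <tr><td>John Sample</td><td>Independent</td><td>1305</td></tr>
</table>
</body></html>
"""

rows = find_results_table(SAMPLE_HTML)
```

Matching on header text is more forgiving than relying on "the third table on the page", but it still breaks if a page words its headings differently, so it’s worth checking for a `None` result on every page.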

Arley electoral division result - WCC 2001

I was hoping to use the power (makes face) of relational data, so before adding a person to the database, I checked to see if they were already included – if they were, I returned the reference for that person and used that. This way, we can see where people might have stood for election – here’s a good example, Janet Alison Alty, who stood for election three times for the Green Party in different wards across the elections.
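The check-before-insert step looks roughly like this. It’s a minimal sketch using Python’s built-in `sqlite3` module with an in-memory database, rather than the MySQL setup I actually used, and the `people` table, its columns and the `get_or_create_person` helper are hypothetical names chosen for the example.

```python
import sqlite3


def get_or_create_person(conn, name):
    """Return the id for this name, inserting a new row only if none exists."""
    row = conn.execute(
        "SELECT id FROM people WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per person, matched on name alone.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
)

first = get_or_create_person(conn, "Jane Example")
again = get_or_create_person(conn, "Jane Example")  # reuses the existing row
other = get_or_create_person(conn, "John Sample")
```

Matching on the name alone is the obvious weak spot: two different candidates with the same name would be merged into one person, and one candidate whose name is spelled differently between elections would be split into two.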

The graphs are straight out of the Google documentation for their interactive charts – here’s an example of a 3-D pie chart from the Bedworth North 2005 election result page:

Example of a pie chart, this one is from the Bedworth North 2005 election result page

I really wanted to use a jQuery-based charting library (for example Flot, jqPlot) but on a quick glance they weren’t quite sharp enough looking for me, pretentious aesthete that I am. I need to come back to this in the future. I could quite happily graph the arse off this data, it’s just a question of spending the time working out the SQL queries to bring back something useful.
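As a flavour of the sort of query I mean, here’s a sketch that totals votes per party for one election year and works out each party’s share, which is the kind of thing a pie chart wants. It uses Python’s `sqlite3` with an in-memory database; the `results` table, its columns, the `party_share` function and the figures are all invented for illustration and aren’t the real schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat results table with made-up figures.
conn.executescript("""
CREATE TABLE results (year INTEGER, division TEXT, party TEXT, votes INTEGER);
INSERT INTO results VALUES
  (2005, 'Division A', 'Party X', 900),
  (2005, 'Division A', 'Party Y', 600),
  (2005, 'Division B', 'Party X', 300),
  (2005, 'Division B', 'Party Y', 700),
  (2009, 'Division A', 'Party X', 500);
""")


def party_share(conn, year):
    """Return (party, total votes, percentage share) tuples, largest first."""
    rows = conn.execute(
        """
        SELECT party, SUM(votes) AS total
        FROM results
        WHERE year = ?
        GROUP BY party
        ORDER BY total DESC, party
        """,
        (year,),
    ).fetchall()
    grand_total = sum(total for _, total in rows)
    return [
        (party, total, round(100 * total / grand_total, 1))
        for party, total in rows
    ]


shares_2005 = party_share(conn, 2005)
```

The list of tuples maps fairly directly onto the rows a charting library expects, so the same function could feed whichever library ends up drawing the graph.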

Something I’ve never tried before but heard mentioned in dispatches is scraping Google – I used a quick scrape of the first result for a search on the Warwickshire Web to get a list of pages describing the electoral divisions, which include the number of registered electors, wards and parishes, last election date and other stuff. As far as I can tell the result pages on the Warwickshire Web aren’t linked to these pages, so this is an extra relationship in this site. The link appears as “WCC page for (division name)” on each division page – here’s an example for the Admirals division.

I don’t know how impressed Google would ever be with this tactic, but as the volume was relatively low (less than a hundred) I didn’t think they would be too upset. When I was trying to get the query right, I messed it up slightly and ended up with a result from a well-known far right forum discussing a particular election – reading the posts was a strange view into another frightening world.

This isn’t anywhere near done (…to death): there’s lots more that could still be done with the data, and even more now that Ordnance Survey have released their boundary information, which covers electoral wards and divisions.

The whole thing took just over a week, in between chomping on biscuits and wiping a baby’s bum, using the usual PHP/MySQL, CodeIgniter, and the Simple DOM Parser library for scraping purposes, which I used before for my attempt to scrape/map the WCC news stories.

And the last thing – almost forgot – the election area is marked up as linked data as per the Open Election Data project, which feels good. I’m not sure if I should submit it, to be honest, although it’s tempting. I’ve checked it out using the W3C Semantic Web parser, and as far as my bleary eyes can tell, it looks OK. We’re missing some data that’s present on the example page from Lichfield DC – electorate, ballot papers issued, and number of spoiled ballots – but that might not be a problem.

Hopefully in the next week or two we’ll be making the raw election data available as a CSV file or something, from the Warwickshire Open Data site which will be launching soon, once we’ve added a few more datasets – there’s some interesting data from schools to come in the first batch. There’s an inevitable blog about all that at warwickshireopendata.wordpress.com.