<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Made of String &#187; googlemaps</title>
	<atom:link href="http://madeofstring.co.uk/tag/googlemaps/feed/" rel="self" type="application/rss+xml" />
	<link>http://madeofstring.co.uk</link>
	<description>Still not a very good programmer despite all that tea</description>
	<lastBuildDate>Sun, 29 Jan 2012 21:29:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Mapping our election data in Google Maps</title>
		<link>http://madeofstring.co.uk/article/mapping-our-election-data-in-google-maps/</link>
		<comments>http://madeofstring.co.uk/article/mapping-our-election-data-in-google-maps/#comments</comments>
		<pubDate>Fri, 21 May 2010 21:52:53 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Article]]></category>
		<category><![CDATA[election]]></category>
		<category><![CDATA[googlemaps]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[opendata]]></category>

		<guid isPermaLink="false">http://madeofstring.co.uk/?p=158</guid>
		<description><![CDATA[
This is a map of Warwickshire County Council&#8217;s election results for the 4th of June 2009, built as part of a mini-project to show off the possibilities of our opendata before our Hack Warwickshire competition. (You really should enter, you know, you might win an iPad.) 
Anyway, for comparison, here&#8217;s the result for 2005 &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://madeofstring.co.uk/electiondata/elections/1"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/warksElectionMap.png" alt="Warwickshire County Council Election Map - " title="warksElectionMap.png" border="0" width="498" height="436" /></a></p>
<p>This is a map of Warwickshire County Council&#8217;s election results for the 4th of June 2009, built as part of a mini-project to show off the possibilities of our opendata before our <a href="http://warwickshireopendata.wordpress.com/hack-warwickshire/">Hack Warwickshire</a> competition. (You really should enter, you know, you might win an iPad.)</p>
<p>Anyway, for comparison, here&#8217;s the <a href="http://madeofstring.co.uk/electiondata/elections/2">result for 2005</a> &#8211; it&#8217;s easy to see that Labour lost share over the four years. It&#8217;s a strong indication of how Labour generally do well in cities and the Conservatives do better in rural areas &#8211; compare with <a href="http://news.bbc.co.uk/1/shared/election2010/results/region/48.stm">BBC map of 2010 UK General Election results</a>.</p>
<p>I really enjoyed putting this together, despite the stress of not knowing anything about this stuff at the start, and it wasn&#8217;t that difficult in the end.</p>
<p>The outlines of the electoral divisions are available from the Ordnance Survey as part of the recently opened-up <a href="http://www.ordnancesurvey.co.uk/oswebsite/products/boundaryline/techinfo.html">Boundary Line product</a>, and come as an <a href="http://en.wikipedia.org/wiki/Shapefile">ESRI Shapefile</a>, a popular file format for geographic vector information. </p>
<p>The full dataset is 46MB in size, and contains polygons for all of the county councils in England, so I&#8217;ll need to chuck a load of that data away. Here&#8217;s the full dataset mapped out in open-source GIS tool <a href="http://www.qgis.org/">Quantum GIS</a> &#8211; Warwickshire is the thing in yellow in the middle.</p>
<p><a href="http://madeofstring.co.uk/wp-content/uploads/2010/05/qgisVisualisation.png"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/qgisVisualisation-500x520.png" alt="" title="qgisVisualisation.png" width="500" height="520" class="aligncenter size-medium wp-image-156" /></a></p>
<p>I used the command-line utility ogr2ogr from the Geospatial Data Abstraction Library (<a href="http://www.gdal.org/index.html">GDAL</a>), and converted the .SHP file from eastings/northings into latitude/longitude:</p>
<div class="codecolorer-container bash blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">ogr2ogr -s_srs EPSG:<span style="color: #000000;">27700</span> -t_srs EPSG:<span style="color: #000000;">4326</span> destination.shp source.shp</div></div>
<p>&#8230;and then converted the resulting file into a .KML:</p>
<div class="codecolorer-container bash blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">ogr2ogr <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;KML&quot;</span> destination.kml source.shp</div></div>
<p>Each area, called a placemark in the KML, came out of the sausage machine looking something like this, only 108MB of it.</p>
<div class="codecolorer-container xml blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Placemark<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Style<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;LineStyle<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;color<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>ff0000ff<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/color<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/LineStyle<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> &nbsp;<br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;PolyStyle<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;fill<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/fill<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/PolyStyle<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Style<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;ExtendedData<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SchemaData</span> <span style="color: #000066;">schemaUrl</span>=<span style="color: #ff0000;">&quot;#electoral_division_latlng&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SimpleData</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;FID&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>1558<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SimpleData<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SchemaData<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/ExtendedData<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Polygon<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;outerBoundaryIs<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;LinearRing<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;coordinates<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>-1.667473197956685,52.164132540593961 (and lots more....)<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/coordinates<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/LinearRing<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/outerBoundaryIs<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Polygon<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Placemark<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></div>
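<p>To give a feel for what reading one of these placemarks involves, here is a minimal Python sketch. It is an illustration only, not the PHP/SimpleXML routine used for the site. The function name <code>parse_placemark</code> and the tiny stand-in placemark are made up for the example, and it assumes the elements carry no XML namespace (real ogr2ogr output normally declares the KML namespace, which would need handling).</p>

```python
import xml.etree.ElementTree as ET


def parse_placemark(placemark):
    """Return (fid, [(lng, lat), ...]) for one un-namespaced Placemark element."""
    fid = None
    for simple in placemark.iter("SimpleData"):
        if simple.get("name") == "FID":
            fid = int(simple.text)
    coords_text = placemark.find(".//coordinates").text
    points = []
    for pair in coords_text.split():
        # KML stores longitude first, then latitude (altitude is optional).
        parts = pair.split(",")
        points.append((float(parts[0]), float(parts[1])))
    return fid, points


# Build a tiny stand-in placemark (hypothetical data) instead of reading the real file.
sample = ET.Element("Placemark")
data = ET.SubElement(ET.SubElement(sample, "ExtendedData"), "SchemaData")
ET.SubElement(data, "SimpleData", name="FID").text = "1558"
boundary = ET.SubElement(ET.SubElement(sample, "Polygon"), "outerBoundaryIs")
ring = ET.SubElement(boundary, "LinearRing")
ET.SubElement(ring, "coordinates").text = "-1.667473,52.164132 -1.667001,52.164500"

fid, points = parse_placemark(sample)
print(fid, len(points))
```

<p>The thing to remember is the coordinate order: KML puts longitude before latitude, which is the opposite of how most people say it out loud.</p>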
<p>At this point I came to the all-too obvious realisation that the polygons really really were just made out of a shitload of co-ordinates. I knew this would be the case, but I kind of expected it to be something a bit cleverer. I&#8217;m not sure what else I could&#8217;ve been expecting, really.</p>
<p>So all good so far, but I had no idea which electoral division was which. Alongside the .SHP file was a .DBF file (so dBase, right?) &#8211; this can be opened up in Excel. It took me a while to work it out, but the FID number in the KML corresponded to the row number (column C in the table below) of the area details. </p>
<p><a href="http://madeofstring.co.uk/wp-content/uploads/2010/05/wccDbfExcel.png"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/wccDbfExcel.png" alt="DBF file displayed in Excel" title="wccDbfExcel" width="500" height="247" class="aligncenter size-full wp-image-170" /></a></p>
<p>Once I&#8217;d chopped out all the non-WCC electoral divisions from the KML file, I was left with about 3MB of data. I took this and ran it through a very simple PHP routine (leaning heavily on the SimpleXML library) to load the data into a MySQL database as field type geometry. The original data was accurate to at least 16 decimal places, which is lovely and all, but overkill on Google Maps, so I took it down to six decimal places, which I think is to within 10cm. Which should be enough.</p>
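<p>The 10cm guess can be sanity-checked with a bit of arithmetic. The Python sketch below assumes a round figure of about 111,320 metres per degree of latitude (an approximation, not a surveyed value) and shrinks the east-west figure by the cosine of the latitude. The function names are made up for the example. At Warwickshire&#8217;s latitude the sixth decimal place works out at roughly 11cm north-south and about 7cm east-west, and rounding moves a point by at most half of that.</p>

```python
import math

METRES_PER_DEGREE_LAT = 111_320  # rough average; an assumption, not a surveyed value


def step_size_metres(decimals, latitude_deg):
    """Ground distance covered by the last kept decimal place: (north-south, east-west)."""
    step_deg = 10 ** -decimals
    north_south = step_deg * METRES_PER_DEGREE_LAT
    # Lines of longitude converge towards the poles, so east-west steps are shorter.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def round_point(lng, lat, decimals=6):
    """Trim a coordinate pair to the given number of decimal places."""
    return round(lng, decimals), round(lat, decimals)


ns, ew = step_size_metres(6, 52.3)
print(round(ns, 3), round(ew, 3))
print(round_point(-1.667473197956685, 52.164132540593961))
```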
<p>From there I imported the names of the areas, and with a bit of fiddling, matched those up against the names of the electoral divisions I&#8217;d scraped from the original Notes election systems.</p>
<p>I&#8217;d had a warning previously that web browsers were a bit rubbish at dealing with the large amount of data it can take to map areas, and I was a bit concerned when I found that the full map of Warwickshire electoral divisions as supplied by the Ordnance Survey contains over 80000 points. It sounded like a lot. </p>
<p>Some rooting around in Stack Overflow brought up a mention of the Douglas-Peucker algorithm, which is a method for simplifying lines. Developer Anthony Cartmell has written an <a href="http://www.fonant.com/demos/douglas_peucker/algorithm">implementation of the algorithm in PHP</a>, and has a <a href="http://www.fonant.com/demos/douglas_peucker/map">demo of it in action too</a>. I used Anthony&#8217;s class to simplify the electoral division polygons. Here&#8217;s a screenshot showing an example of the line generalisation using Anthony&#8217;s class on the Aston Cantlow electoral division. I&#8217;ve chosen badly with the colours on the map: the pink line has 813 points, and the yellow line has 74 points.</p>
<p><a href="http://madeofstring.co.uk/wp-content/uploads/2010/05/astonCantlowEDComparison.jpg"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/astonCantlowEDComparison-500x518.jpg" alt="Comparison between original and simplified lines, using the Aston Cantlow electoral division as an example" title="astonCantlowEDComparison" width="500" height="518" class="aligncenter size-medium wp-image-176" /></a></p>
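<p>For anyone curious how Douglas-Peucker works, here is a minimal Python sketch of the general idea. It is not Anthony&#8217;s PHP class, and the <code>tolerance</code> parameter here is simply a distance in the same units as the points, which may not match the setting his class uses. It also treats longitude/latitude as flat x/y values, which is a simplification. The idea: draw a straight line between the two ends, find the point furthest from it, and if that point is within the tolerance throw away everything in between; otherwise keep it and repeat on each half.</p>

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end (treated as flat x/y)."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Closed rings start and end on the same point, so fall back to plain distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Drop points that lie within tolerance of the simplified line."""
    if len(points) <= 2:
        return list(points)
    # Find the point furthest from the straight line between the two ends.
    furthest_index, furthest_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > furthest_distance:
            furthest_index, furthest_distance = i, d
    if furthest_distance <= tolerance:
        return [points[0], points[-1]]
    # Keep that point and simplify each half separately.
    left = douglas_peucker(points[: furthest_index + 1], tolerance)
    right = douglas_peucker(points[furthest_index:], tolerance)
    return left[:-1] + right


# Made-up test line, purely for illustration.
line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9), (8, 9), (9, 9)]
simplified = douglas_peucker(line, 1.0)
print(len(line), len(simplified))
```

<p>A bigger tolerance throws away more points, which is the trade-off behind the pink and yellow lines in the screenshot above.</p>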
<p>I also wrote a routine to export the whole dataset as a simplified KML file &#8211; this is at setting 3000, which results in a 176 KB KML file, and 6647 points (just over 8% of the size of the full version):</p>
<p><a href="http://madeofstring.co.uk/wp-content/uploads/2010/05/boundaryLineAt3000.png"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/boundaryLineAt3000-500x633.png" alt="Warwickshire Electoral Division boundary line, generalised" title="boundaryLineAt3000" width="500" height="633" class="aligncenter size-medium wp-image-177" /></a></p>
<p>Here&#8217;s a <a href="http://maps.google.com/maps?f=q&#038;source=s_q&#038;hl=en&#038;geocode=&#038;q=http:%2F%2Fmadeofstring.co.uk%2Felectiondata%2Fkml%2Fall-3000.kml&#038;sll=52.405838,-1.512661&#038;sspn=0.221596,0.520477&#038;ie=UTF8&#038;ll=52.32359,-1.41449&#038;spn=0.888029,2.081909&#038;z=9">link to the KML in Google Maps</a>, with 6647 points &#8211; notice on the left that the areas aren&#8217;t labelled yet.</p>
<p>After all that, I did an export of the dataset at 99% of the points in the full version, resulting in a 1.8 MB KML (<a href="http://maps.google.com/maps?f=q&#038;source=s_q&#038;hl=en&#038;geocode=&#038;q=http:%2F%2Fmadeofstring.co.uk%2Felectiondata%2Fkml%2Fall-10000000.kml&#038;sll=52.289373,-1.557655&#038;sspn=0.083476,0.153122&#038;ie=UTF8&#038;ll=52.321071,-1.536713&#038;spn=0.667323,1.224976&#038;z=10">view on Google Maps</a>) &#8211; which still works quickly in Mac OS X Safari, and was actually OK in IE7 too. I was interested that IE could cope with this many points &#8211; it made me wonder whether Google was doing something server-side to smooth out the points at a particular zoom level.</p>
<p>I also wrote a variation on this to output back to the database rather than out as a KML file &#8211; this meant I could quickly experiment with different generalisations of the data to check performance.</p>
<p>Once I&#8217;d finally finished fiddling and settled on a generalisation I was happy with (around 8000 points for the entire set), I built up a single GPolygon using the Google Maps API on the results page map, and coloured it in with the winning party colour:</p>
<p><a href="http://madeofstring.co.uk/electiondata/elections/1/division/170"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/alcesterElectoralDivision.png" alt="Alcester Electoral Division" title="alcesterElectoralDivision" width="409" height="204" class="aligncenter size-full wp-image-186" /></a></p>
<p>For the main map it was just a matter of building up all the GPolygons for all the areas, and listeners to add pop-up bubbles when clicking on an area.</p>
<p><a href="http://madeofstring.co.uk/wp-content/uploads/2010/05/electionPopUp.png"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/05/electionPopUp.png" alt="Showing the election pop-up bubble" title="electionPopUp" width="498" height="281" class="aligncenter size-full wp-image-181" /></a></p>
<p>&#8230;and <a href="http://madeofstring.co.uk/electiondata/elections/1">we&#8217;re done</a>. Performance is presumably dependent on Javascript performance &#8211; it&#8217;s definitely fastest in the WebKit-based browsers, as you&#8217;d expect, with Firefox 3 being ok, and IE7/8 being very slow. Chrome was actually usable with the full 80,000 points. </p>
<p>Surprisingly, Opera 10 throws a psychedelic fit, refusing to remove previous shapes as you zoom in, which is pretty but a bit rubbish.  </p>
<p>All I have for mobile testing is my much loved 1st gen iPod Touch, which crawls but does work, including pinch zoom, interestingly. I&#8217;d like to see it on an iPhone 3GS.</p>
<p>Just to compare the experience, here&#8217;s the map running within Chrome 4 on a Windows 7 x64 VM on my Macbook.</p>
<p><object width="480" height="385" class="aligncenter"><param name="movie" value="http://www.youtube.com/v/COuMuOiMXu8&#038;hl=en_GB&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/COuMuOiMXu8&#038;hl=en_GB&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>And here&#8217;s the same map in Internet Explorer 8.</p>
<p><object width="480" height="385" class="aligncenter"><param name="movie" value="http://www.youtube.com/v/KxC-DozGEHQ&#038;hl=en_GB&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/KxC-DozGEHQ&#038;hl=en_GB&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>I had high hopes for the <a href="http://ie.microsoft.com/testdrive/">IE9 preview</a>, which is said to boast much better JavaScript performance and SVG, but in my testing it didn&#8217;t seem much quicker than IE8. </p>
<p>There are alternatives to building the polygons client-side, but with the experiments I&#8217;m doing right now, I&#8217;m trying to keep things as simple as possible, running outside of the corporate network using my cheap shared host and free web services to get a quick leg-up and show what can be done without spending a fortune. The shapes could be built server-side, which, given a big enough server &#8211; however big that is &#8211; would be more usable across a wider range of browsers. This would need infrastructure putting in place, and I&#8217;m too cheap to get myself a VPS or dedicated server at the moment. Building the polygons client-side also gives you a level of interactivity (&#8230;alright, they&#8217;re clickable) which has further possibilities.</p>
<p>So for now, this method works for simple shapes, but for presenting more complicated outlines to a general audience, I&#8217;m hoping the future will catch up with us and Microsoft release a blindingly quick version of IE9 which somehow automatically replaces all previous versions in a flash. </p>
<p>Yeah, that&#8217;s gonna happen. </p>
<p>But if not, I&#8217;ll get round to looking at GeoServer. I noted that KML files displayed quickly when viewed directly in Google Maps in IE7 &#8211; this could also be something to explore.</p>
<p>(I should say that this page uses Ordnance Survey data © Crown copyright and database right 2010.)</p>
]]></content:encoded>
			<wfw:commentRss>http://madeofstring.co.uk/article/mapping-our-election-data-in-google-maps/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Automatically tagging news stories using OpenCalais</title>
		<link>http://madeofstring.co.uk/article/automatically-tagging-news-opencalais/</link>
		<comments>http://madeofstring.co.uk/article/automatically-tagging-news-opencalais/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 22:07:38 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Article]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[googlemaps]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[opencalais]]></category>
		<category><![CDATA[tagging]]></category>

		<guid isPermaLink="false">http://madeofstring.co.uk/?p=43</guid>
		<description><![CDATA[I’ve been fiddling with this little tagging experiment, which I’m pretentiously calling the Warwickshire News Mine, for a couple of weeks now. Essentially the plan was to scrape a bunch of news stories from the Warwickshire County Council website, and see if they could be tagged up automatically.

Initially it was just meant to be an [...]]]></description>
			<content:encoded><![CDATA[<p>I’ve been fiddling with this little tagging experiment, which I’m pretentiously calling the <a href="http://madeofstring.co.uk/newsmap">Warwickshire News Mine</a>, for a couple of weeks now. Essentially the plan was to scrape a bunch of news stories from the Warwickshire County Council website, and see if they could be tagged up automatically.</p>
<p><a href="http://madeofstring.co.uk/newsmap"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/02/newsMineScreenshot-499x332.png" alt="Screenshot of Warwickshire News Mine" title="newsMineScreenshot" width="499" height="332" class="alignnone size-medium wp-image-44" /></a></p>
<p>Initially it was just meant to be an excuse to fiddle with the Google Maps API, but I started having a play with the online automatic tagging service <a href="http://www.opencalais.com">OpenCalais</a>, which ended up being the most satisfying thing about it. I&#8217;ve left all the <a href="http://madeofstring.co.uk/newsmap/tagging">tags and types produced by OpenCalais</a> in so you can compare the tags against the content.</p>
<p>OpenCalais is actually pretty good, despite my previous churlish Twitter whinging. It seems a bit petty to pick holes given that it&#8217;s a free service provided kindly by Thomson Reuters, but well, I&#8217;m going to anyway&#8230;</p>
<p>Most of my problems with it are to do with the categorisation of the tags &#8211; for instance it seems to be pretty good at pulling out <a href="http://madeofstring.co.uk/newsmap/entities/person">names</a>, with some exceptions &#8211; for example, <a href="/newsmap/entities/person/lea-marston">Lea Marston</a> and <a href="/newsmap/entities/person/leek-wootton">Leek Wootton</a> are places rather than people, and <a href="http://madeofstring.co.uk/newsmap/entities/city/warwickshire">Warwickshire</a> is tagged as a City rather than a <a href="http://madeofstring.co.uk/newsmap/entities/province-or-state">Province or State</a>, although it correctly works out that <a href="http://madeofstring.co.uk/newsmap/entities/province-or-state/north-warwickshire">North Warwickshire</a> fits into the latter category.</p>
<p>It could do better with working out synonyms &#8211; for instance <a href="http://madeofstring.co.uk/newsmap/entities/industry-term/anti-virus-software">anti-virus software</a> and <a href="http://madeofstring.co.uk/newsmap/entities/industry-term/antivirus-software">antivirus software</a> are the same thing, and I remember seeing a couple of places where the plural and the singular are included as tags.</p>
<p>For some reason I was impressed that it knows that <a href="http://madeofstring.co.uk/newsmap/entities/tv-show/come-dine-with-me">Come Dine With Me</a> is a TV show, and that the communications team write so much about <a href="http://madeofstring.co.uk/newsmap/entities/programming-language">programming languages</a>. The latter is one case where you would possibly post-process the tags found, in this case by chucking them away.</p>
<p>OpenCalais doesn&#8217;t seem so hot on working out a more general keyword behind a story &#8211; the tagging on the story <a href="http://madeofstring.co.uk/newsmap/2010/02/09/civil-partnerships-and-marriage-increase">Civil partnerships and marriage increase</a> didn&#8217;t pull out the words <strong>marriage</strong> or <strong>wedding</strong> as tags.  </p>
<p>(<strong>Update:</strong> See <a href="http://madeofstring.co.uk/article/automatically-tagging-news-opencalais/#comment-3">comment</a> from Tom Tague of OpenCalais for clarification on the way that OC works). </p>
<p><a href="http://madeofstring.co.uk/newsmap/newsstory/2006/03/16/pram-problem-for-alice-as-curtain-prepares-to-rise"><img src="http://madeofstring.co.uk/wp-content/uploads/2010/02/newsMineStory-500x197.png" alt="Screenshot of a story from the News Mine" title="newsMineStory" width="500" height="197" class="alignnone size-medium wp-image-53" /></a></p>
<p>For me the best thing to come out of the tagging was the &#8220;possibly related stories&#8221; sidebar in the news story page, which I added late on. When you open up a news story, it searches the database for the top 5 stories with the most tags matching that story, and mostly this works pretty well &#8211; possibly because of the robotic tagging consistency of OpenCalais.</p>
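<p>The tag-overlap idea is simple enough to sketch in a few lines of Python. This is an illustration only, not the PHP/MySQL query behind the site: the function name <code>related_stories</code>, the story ids and the tags are all made up, and everything is held in memory rather than in a database.</p>

```python
from collections import Counter


def related_stories(story_id, tags_by_story, limit=5):
    """Rank other stories by how many tags they share with story_id."""
    wanted = tags_by_story[story_id]
    shared = Counter()
    for other_id, tags in tags_by_story.items():
        if other_id == story_id:
            continue
        overlap = len(wanted.intersection(tags))
        if overlap:
            shared[other_id] = overlap
    # most_common sorts by the number of shared tags, highest first.
    return [other_id for other_id, _ in shared.most_common(limit)]


# Made-up stories and tags, purely for illustration.
tags_by_story = {
    1: {"Warwickshire", "library", "Alcester"},
    2: {"Warwickshire", "library", "Rugby"},
    3: {"Warwickshire", "roads"},
    4: {"recycling"},
}
print(related_stories(1, tags_by_story))
```

<p>Consistent tagging matters here: if one story were tagged &#8220;library&#8221; and another &#8220;libraries&#8221;, the overlap would be missed.</p>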
<p>On the technical front, the site is based on the usual PHP/MySQL combination, and I used the open-source <a href="http://www.codeigniter.com/">CodeIgniter</a> framework, with the <a href="http://simplehtmldom.sourceforge.net/">Simple DOM Parser</a> for scraping the news stories, and Dan Grossman&#8217;s <a href="http://www.dangrossman.info/open-calais-tags/">Open Calais Tags</a> library to send the main body text off to OpenCalais for processing. The elapsed time to process each piece of content was generally about 3-4 seconds, sometimes slightly shorter, sometimes longer (up to around 10 seconds). I had to run the routine several times to get results for all four thousand indexed stories &#8211; there was a memory leak somewhere along the way.</p>
<p>As for the thing that I initially started out to do, that&#8217;s pretty dull really &#8211; I used the Google Maps API to geocode a list of towns and villages in Warwickshire, as well as a few other places, and then ran the main text of the stories through a simple regular expression search to tag them up with <a href="http://madeofstring.co.uk/newsmap/places">places</a>.</p>
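<p>A regular expression search for place names might look something like the Python sketch below. It is a guess at the general approach rather than the routine used on the site; the function names, the short place list and the sample sentence are all made up for the example.</p>

```python
import re


def build_place_pattern(places):
    """One case-insensitive pattern matching any place name as a whole phrase."""
    # Longest names first so a longer name wins over a shorter one it contains.
    ordered = sorted(places, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def tag_places(text, places):
    """Return the known place names found in text, in their canonical spelling."""
    canonical = {name.lower(): name for name in places}
    pattern = build_place_pattern(places)
    found = {canonical[match.group(0).lower()] for match in pattern.finditer(text)}
    return sorted(found)


# Made-up place list and story text, purely for illustration.
places = ["Alcester", "Leek Wootton", "Lea Marston", "Rugby"]
story = "Residents of Leek Wootton and alcester met council officers in Rugby."
print(tag_places(story, places))
```

<p>A plain text match like this cannot tell Rugby the town from rugby the sport, which is one of the ways a simple approach falls down.</p>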
<p>There are lots of improvements that could be made, but in the end it&#8217;s just a throwaway experiment. I&#8217;d like to improve the places tagging routine, which could be as simple as adding a few more places. The main thing would be to look into some way of fitting the tags around a pre-defined ontology. There&#8217;s no current method to suggest a list of categories and tags for OpenCalais to process content with, so it would have to be done after the results had been received. </p>
]]></content:encoded>
			<wfw:commentRss>http://madeofstring.co.uk/article/automatically-tagging-news-opencalais/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

