Saturday, October 27, 2012

Maps mash up: Underground vs post office

The other day, when I had to post an important letter, I was wondering if there is any post office close to an underground station on my commute, and I could not come up with one. So - me being me after all - I decided to do some in-depth analysis on this. And learn Python along the way.

1.  The Idea
Get all the underground stations and post offices as geo coordinates, and find those closest to each other.


2. Getting the underground data
Easy, the geo data of all the public transport stations (or ony the underground stations) can be found easily, since the are part of the open government data, shared by the municipality of Vienna.

3. Going postal
Getting the post office data is rather challenging, because they are seemingly not considered to be public enough.

Still, this can be done; so first get a list of all the post offices in Vienna from post.at. Best with a little python script (my first!!) to parse it.
Get all those with a ZIP code starting with '1' into a CSV file including their full street address.

Then - thanks for the hint, martin - use the Yahoo! PlaceFinder API to convert those postal addresses to geo coordinates. Get an Appid for Yahoo!, if you don't yet have one.

Important trick here: Don't place everything into the q=... query string, but separate it into postal, city, street, etc. Like this
u = yahooURL+"?appid="+yahooAppidy
u += "&postal="+po.zip
u += "&city="+po.city
u += "&street="+quote_plus(unicodedata.normalize('NFKD', po.addr).encode('ascii','ignore'))
u += "&country=Austria"
From Y! we get some fine XML back and use XPath to access the geo coordinates

doc = ElementTree(file=urlopen(u))
lat = doc.findall('.//Result/latitude')[0].text
lon = doc.findall('.//Result/longitude')[0].text
So now we have the geo coordinates of all the underground stations and all the post offices.

4. Visualize & Verify the data
Let's again use Python to create a simple KML file to load the date acquired so far into Google Maps (or Google Earth). There's a Simple KML python library to do just that. Here's are the maps (post offices, underground stations)

5. Do the Geo Math
This is the tricky part. I decided to put all the data into a (relational) database, and since I have DB2 installed on my system, it was of course DB2. I did not use any geo/spatial extension, but just put the latitude and longitude into proper types. Then I created a user defined function (UDF) to do the geo math. Actually, for this purpose this could have been simpler, since one can disregard all spherical aspects and assume the surface (of the Earth in Vienna) to be flat.
With my newly created haversine function the query then looks like this:

select haversine(p.lat,p.lon, u.lat,u.lon) as distance, p.plz,p.street,p.lat,p.lon,u.station,u.lat,u.lon
from ubahn.post p, ubahn.stationen u
where haversine(p.lat,p.lon, u.lat,u.lon) <1
order by 1 asc
fetch first 20 rows only

I'm only interested in post offices that are maximum 1km (distance<1) from an underground, and I only want the closest 20 of those (first 20 rows only)

6. Create a map
Export those data into a CSV file, and run a pyhton script that creates the KML file for this, with the post office being a point, the distance being a line, and the underground again a point. Map can be found here.

Turns out, there are more pairs than I actually thought.

2 comments:

MiSter said...

Very interesting but at least one pair is missing
U1/2 Praterstern
Filiale, 1022 Wien, Praterstraße 68 Stiege 1

and maybe
U1/2 Praterstern
Filiale, 1020 Wien, Weintraubengasse 22
Don't know if distance<1 fits here

Roman said...

both <1km, but not within the first 20 :)

full table is here https://dl.dropbox.com/u/15408640/distanz.csv