OpenStreetMap

Counting the use of a specific tag in your osm2pgsql databases

Posted by skorasaurus on 15 September 2013 in English (English)

Earlier this week, I was talking with a friend who just moved from Massachusetts to Cleveland, they were a bit surprised about the use of middle school and junior high were both used to describe schools consisting of Grades 6-8.

I was curious about this myself and wondered which was used more often in Ohio and this general outline that I use for counting the use of middle school and junior high in Ohio can be applied if you want to see which tag is used more often in a specific state, country, or other place.

Since I was just looking for a state wide comparision, I couldn't use the USA implementation of taginfo which uses the entire USA.

So, for my state of ohio, I figured this answer out by: 1. downloading an extract of my state from geofabrik.

  1. Create the postgis database for it (I have ubuntu 12.04, postgis 2.0, postgresql 9.1, assume you already have this installed; commands may be slightly different - I should spell this step out more clearly in a different post since I never found good documentation for this when I first starting learning this back in '11.): createdb nameofyourdatabase

psql -d nameofyourdatabase -c "CREATE EXTENSION postgis;" 3. Creating an osm2pgsql databse with it. osm2pgsql -s -d nameofyourdb ~/path/to/data.osm.pbf

  1. Using pgadmin, with the gui, I connected to my database and I clicked on magnifying glass with SQL within it and entered the following SQL. input the following SQL:

SELECT name from planet_osm_polygon WHERE lower(name) ~ 'junior high' UNION ALL select name from planet_osm_point WHERE lower(name) ~ 'junior high' ORDER BY name

(thanks to Paul Norman to assist me with the proper SQL query syntax).

If you are interested in generated statistics of tags or creating maps using postgresql in OSM data, most of your interaction with postgresql will be creating queries and using SELECT statements.

SELECT name from planet_osm_polygon WHERE lower(name) ~ 'junior high'
this means is that I selected all closed ways that has the name 'junior high' in it (case insensitive).

Now here's what the finer points of what syntax means:
* name - This is the column 'name'. Now, osm2pgsql creates columns based on the first half, also called the 'key', of an osm tag. Osm tags are written out as key=value

Other columns generated by your osm2pgsql include highway, amenity, leisure, and many more. Because it's highly unlikely that there's a store named junior high and the sake of simplicity, I didn't need to specify that a tag must have amenity=school and have junior high in its name.

  • planet_osm_polygon - This is the name of a table in osm2pgsql that contains all closed ways. Here's the names of the other tables in osm2pgsql.

  • WHERE - specifies the condition in which I want to select the name table. If I wanted to query a simple tag that has a standard key and value, like amenity=parking ; I could simplify do WHERE amenity IN ('parking') But since 'junior high' occurs in the middle of a text phrase, the tilde (known as ~) will search for the pattern 'junior high' within the tag value.

So, this will return results for: name=Mooney Junior High School; name=Junior High School; name=wilkens junior high ; regardless of case sensitivity. . (I admit I don't fully understand this aspect of the syntax, so someone clarify if I'm incorrect !)

  • UNION ALL - this allows you to do multiple queries within one and include the results of both queries at once .

select name from planet_osm_point WHERE lower(name) ~ 'junior high' ORDER BY name

Because OSM objects can be tagged as either nodes or as ways, I need to also search for any nodes that have junior high in its name ! The name of the nodes' table is planet_osm_point and the structure of the syntax is nearly the same.

  • ORDER BY - this is simple, it merely sorts the results by a column. In this instance, I want to sort them in alphabetical order, so I did ORDER BY name.

Now, we can execute our query by clicking "Execute query" (its icon looks like a play button on a DVD/VCR),

Now in pgadmin's lower-right hand corner, will be the number of results that are returned and the names of all of the schools with junior high in it...

So, we see: 219 of Junior High in Ohio ! There's a few duplicate ones, which is interesting. Some may be the same name in 2 different places, some may be duplicate nodes of the same school.

And we repeat the process again for middle school, and... 389 !

Middle school is used more often in Ohio than in Junior High... :)

Comment from Minh Nguyen on 15 September 2013 at 21:49

Note that TIGER’s coverage of Ohio schools is horribly out of date, even after you filter out all the “(historical)” POIs. For example, cities like Bellefontaine and London have replaced all of their schools recently, while others like Hamilton and Springfield have consolidated quite a bit. Also, I’ve mapped scores of school buildings with courtyards as multipolygon relations. Not sure if planet_osm_polygon would catch those cases.

Still, I’ve always had the impression that the junior/senior high school distinction is a bit older than the middle/high school distinction. In Southwest Ohio and Northern Kentucky, school districts typically build new schools with the following naming schemes: Elementary+High, or Elementary+Middle+High, or Primary+Intermediate+Middle+High, or Elementary+Intermediate+Middle+High.

Hide this comment

Leave a comment

Parsed with Markdown

  • Headings

    # Heading
    ## Subheading

  • Unordered list

    * First item
    * Second item

  • Ordered list

    1. First item
    2. Second item

  • Link

    [Text](URL)
  • Image

    ![Alt text](URL)

Login to leave a comment