Railway Crossings challenge for MapRoulette

Posted by MikeN on 30 October 2016 in English.

One class of quality improvements needed in the US is railway crossings. The original TIGER import mostly either connected railways to highways at a shared node, or had them cross a duplicate, unconnected road. There was no bridge or crossing information to help.

It can be useful to have railway crossing information available for navigation. In 2015, MapRoulette set up a challenge to review all railway crossings. The first version of the Railway Crossings challenge used the crossing points published by the US Federal Railroad Administration. Many of these crossings had already been corrected by map editors during normal QA. The challenge began with 120K crossings to review, and some 70K remained by the time MapRoulette V2 came out. The partially completed challenge was not migrated because the MapRoulette V2 features were still being tested and improved. Although this is not the ideal type of task for MapRoulette, I enjoyed being able to knock out 5 in a row without much effort (unless in KY, PA or WV!). It is also ideal for an armchair challenge: only a few are difficult to make out from the air.

Because I would typically correct nearby crossings when fixing a task (and others may have too), I wondered whether identifying the remaining crossings with a topological analysis would produce fewer false positives and fewer already-fixed tasks to review. So I set up a PostGIS instance and tried to write queries that would identify problem crossings. That proved too difficult:

I couldn’t tell whether osm2pgsql or Osmosis populates the database with the full OSM topology, including shared nodes, and even if so, what such queries would look like.
I only had 8 GB of RAM, and didn’t know how to write PostGIS queries that would handle US-sized data in a reasonable time.

In the end, I wrote a program in C# to analyze railway crossings. I filtered the raw OSM data with Osmosis so that my program only needed to deal with highways and railways. As I looked at some results, I realized that I could also quality-check pedestrian-railway crossings with the same program and create another challenge.
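The core of a topological analysis like this is an index from node IDs to the ways that use them: a node referenced by both a highway way and a railway way is a connected crossing. The post doesn’t describe the C# program’s data structures, so this is only a minimal Python sketch, assuming the filtered ways are available as hypothetical (way_id, tags, node_ids) tuples:

```python
from collections import defaultdict


def shared_highway_railway_nodes(ways):
    """Return {node_id: (highway_way_ids, railway_way_ids)} for every node
    used by at least one highway way and at least one railway way.

    `ways` is an iterable of hypothetical (way_id, tags, node_ids) tuples,
    where tags is a dict of OSM tags and node_ids is the way's node list.
    """
    highway_users = defaultdict(set)  # node_id -> ids of highway ways using it
    railway_users = defaultdict(set)  # node_id -> ids of railway ways using it
    for way_id, tags, node_ids in ways:
        if "highway" in tags:
            for node_id in node_ids:
                highway_users[node_id].add(way_id)
        if "railway" in tags:
            for node_id in node_ids:
                railway_users[node_id].add(way_id)

    shared = {}
    for node_id, highway_ids in highway_users.items():
        railway_ids = railway_users.get(node_id)  # .get() does not insert a key
        if railway_ids:
            shared[node_id] = (highway_ids, railway_ids)
    return shared


ways = [
    (1, {"highway": "residential"}, [10, 11, 12]),
    (2, {"railway": "rail"}, [20, 11, 21]),
    (3, {"highway": "footway"}, [30, 31]),
]
print(shared_highway_railway_nodes(ways))  # {11: ({1}, {2})}
```

Using sets means a closed way that repeats its first node is only counted once per node. At US scale, an index like this holds tens of millions of nodes, which is where the memory problems described later come from.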

As I was looking at the first analysis, I found many bridges without a layer tag. Some would say that a bridge implies a nonzero layer, but it is still better to specify one. For this challenge, however, I excluded all bridges, tunnels, and ways with a layer tag. My thinking is that those locations do not have a typical railroad crossing, and someone has already done some review there. There are also other OSM QA tools that already address bridges with missing layer tags.
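This way-level exclusion can be written as a small predicate on a way’s tags. A sketch, assuming tags arrive as a dict and treating any bridge=* or tunnel=* value other than "no" as a bridge or tunnel (the post doesn’t say how bridge values were matched; the function name is hypothetical):

```python
def is_excluded_way(tags):
    """True if the way should be skipped: it is a bridge, a tunnel,
    or it carries any layer tag at all."""
    # Assumption: bridge=no / tunnel=no mean "not a bridge/tunnel";
    # any other value (yes, viaduct, ...) counts as one.
    if tags.get("bridge", "no") != "no":
        return True
    if tags.get("tunnel", "no") != "no":
        return True
    return "layer" in tags


print(is_excluded_way({"highway": "primary", "bridge": "yes"}))  # True
print(is_excluded_way({"highway": "primary"}))                   # False
```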

I also excluded these railway types: abandoned, razed, station, disused, dismantled, demolished, adjacent, platform. Although a ‘disused’ railway may cross a road, I saw too many of these with no X painted on the road, where I could not even tell that rails were present. Often they would require local or railfan knowledge to map accurately.
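The railway-type exclusion is a set lookup on the railway=* value. The values below are exactly the ones listed above; the constant and function names are hypothetical:

```python
# railway=* values skipped by the analysis, as listed in the text.
EXCLUDED_RAILWAY_VALUES = frozenset({
    "abandoned", "razed", "station", "disused",
    "dismantled", "demolished", "adjacent", "platform",
})


def is_candidate_railway(tags):
    """True if the way has a railway=* tag whose value is not excluded."""
    value = tags.get("railway")
    return value is not None and value not in EXCLUDED_RAILWAY_VALUES


print(is_candidate_railway({"railway": "rail"}))     # True
print(is_candidate_railway({"railway": "disused"}))  # False
```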

When a railway intersects a roadway and they share a node, I check that node for a railway=level_crossing tag. When a railway intersects a sidewalk, path, or cycleway, I check for a railway=crossing tag. Many road crossings are tagged as pedestrian crossings because the mapper’s natural reasoning is ‘this is a railway crossing’, therefore railway=crossing.
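The tag check reduces to choosing the expected railway=* value from the highway=* value of the way that shares the node, then comparing it with what the node actually carries. A sketch; the exact set of highway values treated as pedestrian is an assumption (sidewalks are commonly mapped as highway=footway), and the function names are hypothetical:

```python
# Assumption: highway=* values treated as sidewalk/path/cycleway.
PEDESTRIAN_HIGHWAYS = frozenset({"footway", "path", "cycleway"})


def expected_crossing_tag(highway_value):
    """railway=* value expected on a node shared by a railway and this highway."""
    if highway_value in PEDESTRIAN_HIGHWAYS:
        return "crossing"
    return "level_crossing"  # every other highway value is treated as a road


def check_crossing_node(node_tags, highway_value):
    """Return None if the shared node is tagged as expected,
    otherwise a short description of the problem."""
    expected = expected_crossing_tag(highway_value)
    actual = node_tags.get("railway")
    if actual == expected:
        return None
    if actual is None:
        return f"missing railway={expected}"
    return f"railway={actual} should be railway={expected}"


print(check_crossing_node({"railway": "crossing"}, "primary"))
# railway=crossing should be railway=level_crossing
```

The second kind of message corresponds to the common mistake described above, where a road crossing has been given the pedestrian tag.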

Because OSM data is the starting point, no crossing will be flagged where a driveway crosses a railway on the ground but the driveway itself is not mapped in OSM.

The links to these challenges are:

[Crossing Ways: Highway-Railway, US] http://maproulette.org/map/980

[Crossing Ways: Pedestrian-Railway, US] http://maproulette.org/map/989

[Crossing Type: Highway-Railway, US] http://maproulette.org/map/990

[Crossing Type: Pedestrian-Railway, US] http://maproulette.org/map/991
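The "Crossing Ways" challenges cover the other case from the shared-node check: a highway and a railway that cross without sharing a node. The post doesn’t say how the program detected these; one standard technique is an orientation-based segment intersection test applied to pairs of way segments. A sketch, treating lon/lat as planar coordinates (adequate for deciding whether two short segments cross) with hypothetical function names:

```python
def _orient(ax, ay, bx, by, cx, cy):
    """Sign of the cross product (b - a) x (c - a):
    1 for a left turn, -1 for a right turn, 0 if collinear."""
    v = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2.
    Touching at an endpoint (e.g. a shared node) or collinear overlap
    returns False, so connected crossings are not reported here."""
    o1 = _orient(*p1, *p2, *q1)
    o2 = _orient(*p1, *p2, *q2)
    o3 = _orient(*q1, *q2, *p1)
    o4 = _orient(*q1, *q2, *p2)
    return o1 * o2 < 0 and o3 * o4 < 0


def unconnected_crossings(highway, railway):
    """Return (i, j) pairs where segment i of the highway properly crosses
    segment j of the railway. Each way is a list of (lon, lat) points."""
    hits = []
    for i in range(len(highway) - 1):
        for j in range(len(railway) - 1):
            if segments_cross(highway[i], highway[i + 1], railway[j], railway[j + 1]):
                hits.append((i, j))
    return hits


road = [(0.0, 0.0), (2.0, 0.0)]
rail = [(1.0, -1.0), (1.0, 1.0)]
print(unconnected_crossings(road, rail))  # [(0, 0)]
```

The pairwise loop is quadratic, so in practice it would only be run on highway/railway pairs whose bounding boxes overlap.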

Some C# problems I encountered (with the Microsoft .NET library):

“640K of memory ought to be enough for anyone”

Out of memory! What?! It turns out that the default build properties are set to “Prefer 32-bit”. Unchecking that option lets the program run as a 64-bit process with access to much more memory. Be sure to do that for both Debug and Release!

Rerun - it gets further, but now “The dimension of the array exceeds the limits of addressing.”

“47995853 nodes ought to be enough for anyone”

The next problem was the discovery that the default hash implementation supports only 47,995,853 objects before giving an out of memory error. Fortunately the error is easy to work around. In the application configuration file, configure the runtime to allow very large objects with the gcAllowVeryLargeObjects element (it only takes effect in a 64-bit process):

    <configuration>
      <startup>
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.1"/>
      </startup>
      <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>
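Rovastar’s comment in the discussion attributes this ceiling to a default 2 GB limit rather than a specific object count. The count at which it bites then depends on how many bytes each slot in the collection’s backing array takes. A back-of-the-envelope helper; the per-entry sizes used here are illustrative, not measured .NET layouts:

```python
TWO_GB = 2 ** 31  # default maximum size of a single .NET object, in bytes


def max_entries(bytes_per_entry, limit_bytes=TWO_GB):
    """Largest number of fixed-size entries that fit in one array under
    the size limit (ignoring the array's small object header)."""
    return limit_bytes // bytes_per_entry


for size in (8, 16, 24):
    print(size, max_entries(size))
# 8 268435456
# 16 134217728
# 24 89478485
```

Hash tables also grow by roughly doubling their backing arrays, so a collection can fail when it tries to allocate its next, larger array, well before its current contents reach the limit.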

Discussion

Comment from SK53 on 31 October 2016 at 10:35

With respect to PostGIS queries: gridding the data is always a good strategy for reducing in-memory processing requirements. For analysing European & US road networks I’ve experimented with grid sizes anywhere from 7.5 minutes to 10 degrees. I would imagine for this task you’d probably be fine with something of the order of 1 degree.
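The gridding suggestion can be sketched as bucketing each way by the 1-degree cells its bounding box touches; each cell’s ways can then be analysed separately, with a way that spans several cells processed in each. This is an illustration in Python, not PostGIS code; the (way_id, coords) input shape and function names are assumptions:

```python
import math
from collections import defaultdict


def grid_cell(lon, lat, size_deg=1.0):
    """Integer (column, row) of the grid cell containing a point."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def cells_for_way(coords, size_deg=1.0):
    """All grid cells touched by the bounding box of a way's (lon, lat) points."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    col0, row0 = grid_cell(min(lons), min(lats), size_deg)
    col1, row1 = grid_cell(max(lons), max(lats), size_deg)
    return [(col, row) for col in range(col0, col1 + 1) for row in range(row0, row1 + 1)]


def bucket_ways(ways, size_deg=1.0):
    """Map grid cell -> ids of ways whose bounding box touches that cell."""
    buckets = defaultdict(list)
    for way_id, coords in ways:
        for cell in cells_for_way(coords, size_deg):
            buckets[cell].append(way_id)
    return buckets


ways = [
    (1, [(-87.6, 41.9), (-87.2, 41.95)]),
    (2, [(-88.1, 41.5), (-87.9, 41.6)]),
]
print(dict(bucket_ways(ways)))  # {(-88, 41): [1, 2], (-89, 41): [2]}
```

A crossing near a cell boundary can be found in two cells, so results should be de-duplicated (for example by node ID) after processing.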

Another route for your data would have been Overpass querying for highways sharing a node with a railway. Max Erickson is usually my go-to guy for how to do this kind of thing.

If you used Osmosis to populate a snapshot schema, then you just need a plain SQL query over the ways and way_nodes tables. You want all nodes in the way_nodes table which belong to at least 2 ways, at least one a highway and one a railway, something along the lines of the following (untested):

    SELECT wn.node_id
    FROM way_nodes wn
    JOIN ways w ON wn.way_id = w.id
    WHERE w.tags ? 'highway' OR w.tags ? 'railway'
    GROUP BY wn.node_id
    HAVING COUNT(DISTINCT wn.way_id) > 1
       AND bool_or(w.tags ? 'highway')
       AND bool_or(w.tags ? 'railway');

I’d actually do this as separate queries to reduce the number of table scans on the big initial hit on way_nodes.

Comment from MikeN on 1 November 2016 at 01:36

Thanks for the tip about Max as an expert! Is there a good tutorial on Geo-SQL / PostGIS? I have found that standard SQL skills don’t carry over to Geo-SQL.

Comment from mmd on 1 November 2016 at 16:08

Another route for your data would have been Overpass querying for highways sharing a node with a railway.

That’s fairly easy to find out, actually: http://overpass-turbo.eu/s/jLw

Getting those missing intersections is a bit trickier: http://overpass-turbo.eu/s/jLx - showing railways in blue and highways in red. There’s a glitch in the query that I leave as an exercise for the interested reader. Make sure to zoom in quite a bit, otherwise highways and railways cancel each other out in some cases.

Comment from mmd on 1 November 2016 at 16:21

Forgot to mention that there are already some QA tools out there reporting railway crossings without a tag. As an example, here’s a link to keep right! Since the source is freely available, you might take a look at how their analysis works.

Comment from Rovastar on 1 November 2016 at 22:57

It is due to hitting the default 2 GB limit rather than a specific number of objects.

It’s not very often I can talk about ASP.NET issues on OSM.
