karussell's Diary

Units in OpenStreetMap

Posted by karussell on 1 November 2015 in English.

The main question is: should OSM prefer more concise data if possible and move complexity to the editor? Read the full post here

Discussion

Comment from -karlos- on 1 November 2015 at 21:16

Good point. “We Europeans” look at US units with some worry. But “we map as it is” includes units in a way, too. A limiting sign at a bridge does have a written or implied unit.

The “user” of OSM isn’t only the person viewing the map; it’s also the mapper. If you move the unit to a viewer option, you also need it in the editor.

Are there tags without units? I would assume a local default then, but it should be a “fixit”. OSM isn’t a technical database. I like SI units; but is an OSM maxweight or others like road width ever used in calculations?

The best way would be to change the whole earth’s use of units to SI ;-)

Comment from karussell on 1 November 2015 at 21:35

Does mapping really extend all the way down to the database? I doubt that. Mapping means modeling the real situation with the tools we have. The tools will evolve, and so should the mapping process, making the database more concise and mapping less complex.

but is an OSM maxweight or others like road width ever used in calculations?

Yes, sure. E.g. if you want to know if a truck of width X fits through it or not.

The best way would be to change the whole earth’s use of units to SI ;-)

It is important to keep local things as they are and make the mapping as convenient for the mapper as possible. But again: why would the database need this complexity if we can handle it in the editor?

As the OSM project grows and tools get better, we should make it possible to freeze certain tag definitions and check them in the most popular editors, as is already done today.

Comment from Warin61 on 1 November 2015 at 21:40

Unfortunately not everyone uses SI.
For those that don’t entering the data should be done in the units they are familiar with (and probably used on the local signs too).

The data within the OSM database should not be changed to all SI. Consider that the local use may not be SI units and that rendering there may need reconversion to non-SI units. Each conversion has errors, and that may lead to strange results, e.g. 15 tons converted to SI units and back again becomes 14.99 tons.
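The drift Warin61 describes can be reproduced in a few lines. This is a minimal sketch, assuming the converted SI value is stored rounded to one decimal place of a tonne (the rounding step is an assumption, not something OSM prescribes); the helper name is hypothetical.

```python
# Assumption: the converted SI value is stored rounded to one decimal place.
# 1 short (US) ton = 2000 lb x 0.45359237 kg/lb = 907.18474 kg = 0.90718474 t.
TONNES_PER_SHORT_TON = 0.90718474


def round_trip(short_tons: float, si_decimals: int = 1) -> float:
    """Convert short tons to tonnes, round as if stored, then convert back."""
    stored_tonnes = round(short_tons * TONNES_PER_SHORT_TON, si_decimals)
    return stored_tonnes / TONNES_PER_SHORT_TON


print(f"{round_trip(15):.2f} tons")  # prints "14.99 tons" (15 -> 13.6 t -> 14.99)
```

The error comes from the rounding of the stored value, not from the conversion factor itself.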

So, until everyone uses the same units, drives and walks on the same side, with the same rules, the same signage and the same language … leave the data in OSM in the units entered by the mapper.

Comment from karussell on 1 November 2015 at 21:45

Each conversion has errors

That is something that should be easy to solve
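One way the conversion error could be avoided, sketched here as an illustration rather than a proposal from the post: do the arithmetic with exact rationals (Python's `fractions.Fraction`) instead of floats, so the round trip returns exactly the original number. The function names are hypothetical.

```python
from fractions import Fraction

# Exact definition: 1 short ton = 907.18474 kg. Parsing the decimal string
# (not a float) keeps the factor exact as a rational number.
KG_PER_SHORT_TON = Fraction("907.18474")


def short_tons_to_kg(tons: Fraction) -> Fraction:
    return tons * KG_PER_SHORT_TON


def kg_to_short_tons(kg: Fraction) -> Fraction:
    return kg / KG_PER_SHORT_TON


kg = short_tons_to_kg(Fraction(15))
print(float(kg))             # 13607.7711
print(kg_to_short_tons(kg))  # 15
```

The catch is that the database would have to keep every digit (13607.7711 kg, not 13.6 t); as soon as the stored value is rounded, the drift shown by Warin61 returns.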

And I would be okay with one unit per country. But as stated in the post, it is not that simple, and it makes consumption, both automated and human, too complex IMO.

Comment from SK53 on 1 November 2015 at 22:13

This is a really old discussion. When I first started contributing to OSM, many speed limits in the UK were in metric units (I hesitate to call kph SI): not 30 mph, but 48.## or 50 because it was easier. Guess what: folk couldn’t be bothered to map speed limits; it was too complicated.

Maxheight in Britain is usually expressed in both feet and inches & a metric value: but the conversion used involves rounding. I believe there are some older rarely used places with only the older signage. Thus in this case both values should be tagged.
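When both values from a dual-unit sign are tagged, a consumer can read each one and compare them. This is a sketch assuming only two value forms: a plain metric number (optionally with "m") and the feet-and-inches notation such as 14'6" commonly seen on maxheight. Anything else returns None for manual review.

```python
import re
from typing import Optional

METRES_PER_INCH = 0.0254  # exact by definition
# Feet-and-inches notation such as 14'6" (inches optional, e.g. 14')
IMPERIAL = re.compile(r"""^\s*(\d+)'\s*(?:(\d+(?:\.\d+)?)")?\s*$""")
# Plain metric value, optionally followed by "m", e.g. 4.4 or 4.4 m
METRIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:m)?\s*$")


def maxheight_metres(value: str) -> Optional[float]:
    """Return the height in metres, or None for anything unrecognised."""
    m = IMPERIAL.match(value)
    if m:
        inches = int(m.group(1)) * 12 + float(m.group(2) or 0)
        return inches * METRES_PER_INCH
    m = METRIC.match(value)
    if m:
        return float(m.group(1))
    return None  # leave unknown formats for a human to review


# The two values from a dual-unit British sign, read side by side:
imperial = maxheight_metres("14'6\"")
metric = maxheight_metres("4.4")
print(round(imperial, 3), metric)  # 4.42 4.4
```

The gap between 4.42 m and 4.4 m is exactly the rounding SK53 mentions, which is why keeping both tagged values is useful.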

The last point is that over time there is a tendency to not just do the transformation (a speed limit sign to a speed limit on a road section), but also to map the actual original information (the sign itself). I think it was the Finns who first developed this idea. It greatly assists in checking for errors and for validating existing data. If units are converted it becomes much harder for someone to check that such data is correct: and don’t ignore the likelihood that people will use approximations, perhaps to come back later when they’ve found a conversion formula.

As SomeoneElse said in the changeset comment: preprocessing out-of-band values is essential. You have to do it on data coming from databases with reasonably sensible constraints, so it’s unavoidable on data which is free text. Only today I’ve been processing postcodes, and apart from partial values, garbage in the string, etc., there have been several telephone numbers. Of course a nice thing is if one can ‘close the loop’: inform the contributor who generated those values in the hope that they will correct or improve them.
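A minimal sketch of that kind of preprocessing for addr:postcode: split raw values into plausible UK postcodes and a "suspicious" list that could be reported back to contributors. The regular expression is a deliberately simplified approximation of the UK format, not the official rule, and `triage` is a hypothetical name.

```python
import re

# Simplified UK pattern: outward code (e.g. NG7), optional space, inward code (e.g. 2RD).
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def triage(values):
    """Split raw postcode strings into (plausible, suspicious) lists."""
    ok, suspicious = [], []
    for raw in values:
        cleaned = raw.strip().upper()
        (ok if UK_POSTCODE.match(cleaned) else suspicious).append(raw)
    return ok, suspicious


ok, suspicious = triage(["NG7 2RD", "ng1 5aw", "NG7", "0115 496 0000"])
print(ok)          # ['NG7 2RD', 'ng1 5aw']
print(suspicious)  # ['NG7', '0115 496 0000']
```

The suspicious list is where partial values and stray telephone numbers end up, ready for the ‘close the loop’ step.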

In summary: asking for contributors to do more will probably cause them to do less.

Comment from Richard on 1 November 2015 at 22:23

but is an OSM maxweight or others like road width ever used in calculations?

Yes, sure. E.g. if you want to know if a truck of width X fits through it or not.

On the other hand, if (say) you normalise speed limits from maxspeed=30mph to maxspeed=48.28km/h, that makes it marginally easier to calculate likely vehicle speeds, but breaks visualisations of what the signs actually say. You win some, you lose some.
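Richard's trade-off can be avoided on the consumer side by deriving a numeric value without overwriting what the sign says. This is a sketch only; the output field names are illustrative and not an OSM or library convention, and only numeric values (bare km/h or "mph") are handled.

```python
from typing import Dict, Union

KMH_PER_MPH = 1.609344  # exact: 1 mile = 1609.344 m


def normalise_maxspeed(raw: str) -> Dict[str, Union[str, float]]:
    """Add a km/h figure for routing; keep the tagged text for display."""
    text = raw.strip()
    if text.endswith("mph"):
        kmh = float(text[:-3]) * KMH_PER_MPH
    else:
        # A bare number means km/h in OSM. Non-numeric values such as
        # "none" or "RO:urban" raise ValueError in this sketch.
        kmh = float(text)
    return {"as_signed": text, "kmh": kmh}


print(normalise_maxspeed("30 mph"))  # kmh is about 48.28; as_signed stays "30 mph"
```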

So rather than adding the complexity to the editor software, where we’re not blessed with a surfeit of developers anyway, add it to the client software.

osm2pgsql, OSRM, and Tilemaker can all solve this sort of problem with Lua tag transformations, and there’s no reason there shouldn’t be a common library between them. (Indeed, a common library would be very useful for hard-to-parse tags such as opening hours.) Maybe it’s time for Graphhopper to catch up?

Comment from karussell on 2 November 2015 at 08:28

Maybe it’s time for Graphhopper to catch up?

Richard, I don’t understand why you comment in such a harsh tone. I’m pretty sure you didn’t read the blog post, because it has nothing to do with being unable to parse a string value in Java.

The main point is that OpenStreetMap is a database, and mappers are currently used to seeing the raw data, but I think it is time to add a convenience layer for certain purposes like weight and, yes, maybe also for speed and length values.

Also the suggestion by SK53, where one additionally tags the original value, is interesting; maybe one could even store a link to the real-world sign somehow. But again: we should not put this burden on the mappers’ shoulders and instead make our editors clever and easy to use, so that e.g. people just need to click on a sign and the editor will then store the link to this sign (maybe just a number) and the associated, converted speed value.

Comment from karussell on 2 November 2015 at 08:30

Assume you build an application from scratch: then you put clean, computer-readable values in the database and convert to arbitrary other values in the view layer. I know that OSM has a history and one cannot revert this, but I’m arguing for some partial steps towards a more concise DB.
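A minimal sketch of the "clean values in the database, conversion in the view layer" design, assuming lengths such as maxwidth are stored only in metres. The unit table and function name are illustrative assumptions.

```python
# Assumption: lengths are stored in metres only; the table below is illustrative.
METRES_PER_UNIT = {"m": 1.0, "ft": 0.3048, "yd": 0.9144}


def display_length(stored_metres: float, unit: str, decimals: int = 1) -> str:
    """View-layer conversion: the stored value itself is never altered."""
    return f"{stored_metres / METRES_PER_UNIT[unit]:.{decimals}f} {unit}"


stored = 3.048                       # what the database keeps
print(display_length(stored, "m"))   # 3.0 m
print(display_length(stored, "ft"))  # 10.0 ft
```

Only the display function knows about the viewer's preferred unit; storage and calculations see one unit.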

Comment from Richard on 2 November 2015 at 11:00

I’m not meaning to be harsh; I did read the blog post (and it’s a little “harsh” of you to accuse me of not doing so!); and it’s got nothing to do with parsing string values in Java.

The point is that OSM data is sufficiently complex that to do anything other than basic cartography, you have to parse the data programmatically - tags included. It’s the only way to fully parse the complex path tagging schemes, for example, never mind crazy stuff like opening hours.

You do obviously accept this in some areas - for example, Graphhopper builds a routing graph rather than assume that OSM ways will always be split at junctions for the convenience of routers. Tags are no different. You should expect to have to work with them, rather than just parsing them straight out of a format designed for one particular consumer.
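The routing-graph step Richard refers to can be sketched generically: any node used more than once across the data is a junction, and each way is cut into edges there. This is an illustration of the technique, not GraphHopper's actual implementation; ways are plain lists of node ids and OSM parsing is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_at_junctions(ways: Dict[str, List[int]]) -> List[Tuple[str, List[int]]]:
    """Cut each way at interior nodes that are shared with other ways."""
    usage = Counter(node for nodes in ways.values() for node in nodes)
    edges = []
    for way_id, nodes in ways.items():
        start = 0
        for i in range(1, len(nodes) - 1):
            if usage[nodes[i]] > 1:  # interior junction: close the current edge
                edges.append((way_id, nodes[start:i + 1]))
                start = i
        edges.append((way_id, nodes[start:]))
    return edges


ways = {"A": [1, 2, 3, 4], "B": [5, 3, 6]}  # B crosses A at node 3
for edge in split_at_junctions(ways):
    print(edge)
```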

I asked you on Twitter yesterday whether Graphhopper could do this and your answer was “wait for the blog post”. Subsequently someone on #osm-gb pointed me to https://github.com/graphhopper/graphhopper/issues/193, which explains that there is no scripting language support yet.

Scripting language support is how OSRM, osm2pgsql and tilemaker cope with instances like this, and I would commend this approach to you. Yes, I guess you could put it in your core Java app, but that seems a very heavyweight solution and mandates one way of parsing the tags for all your users. Right now I could parse maxweight tags trivially using one of those programs, without having to submit a pull request, and without having to use a heavy-duty language such as C++ or Java. It would be nice if Graphhopper offered the same flexibility.
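The flexibility described here amounts to keeping the processing loop fixed while users plug in their own per-key rules. The sketch below uses plain Python functions as a stand-in for Lua scripts; the registry, decorator and maxweight rule are hypothetical and are not an OSRM, osm2pgsql or GraphHopper API.

```python
from typing import Callable, Dict, Optional

Parser = Callable[[str], Optional[float]]
PARSERS: Dict[str, Parser] = {}  # hypothetical registry of user rules


def register(key: str):
    """Decorator that installs a user-supplied parser for one tag key."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[key] = fn
        return fn
    return wrap


def process(tags: Dict[str, str]) -> Dict[str, float]:
    """Fixed core loop: apply whatever parsers users have registered."""
    out = {}
    for key, raw in tags.items():
        parser = PARSERS.get(key)
        value = parser(raw) if parser else None
        if value is not None:
            out[key] = value
    return out


@register("maxweight")
def maxweight_tonnes(raw: str) -> Optional[float]:
    # User rule: a bare number or a trailing "t" means tonnes.
    text = raw.strip()
    if text.endswith("t"):
        text = text[:-1].strip()
    try:
        return float(text)
    except ValueError:
        return None


print(process({"maxweight": "7.5 t", "name": "High Street"}))  # {'maxweight': 7.5}
```

Changing how maxweight is read then means editing one small function, without touching the core.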

we should not put this burden on the mappers’ shoulders and instead make our editors clever and easy to use

I think you’re overestimating how many editor developers we have!

Comment from karussell on 2 November 2015 at 11:08

I don’t understand what this has to do with GraphHopper itself :)

which explains that there is no scripting language support yet.

There is no support yet, but you can do this easily in Java.

Yes, I guess you could put it in your core Java app, but that seems a very heavyweight solution

You can easily extend the core app with your own custom profiles (currently only in Java). Even the algorithm itself or other parts can be replaced and customized, which is more flexible than what OSRM allows you to do.

I think you’re overestimating how many editor developers we have!

You mean, I underestimate this :) ? Yes, probably :)

I think if OSM does not want to slow down due to increased complexity, it has to make this move towards a cleaner database AND at the same time make mapping easier, or at least not more complex. Which means editors have to solve this, yes. And pushing such a feature to just a few editors at the beginning, JOSM and iD, would certainly help.

Comment from SK53 on 2 November 2015 at 13:06

The place for a cleaner database is not the main OSM API DB which mappers contribute to, but a separate database where tag values can be normalised and additional values added (for instance higher-level categories, perhaps multiple versions of ways with different degrees of simplification, etc.). At the moment all these things happen, but in application-specific databases (such as the osm2pgsql format & numerous routing formats including your own).

I think one or two people have tried this (e.g., OpenCageData), but as a community type activity it would run into precisely the issues the current data has: who selects the values & transforms. Normalisation of data structures is even worse: many things tagged in OSM look quite simple on the surface, but often turn into really complex data modelling issues if one wants to formally capture all the nuances of what is tagged (I have a 30+ entity model to encompass what gets added to post boxes & I haven’t even added anything about collection times which are at minimum a similar order of complexity again).

I also don’t think we’re ready for it. Your own example probably demonstrates that the number of height & weight restrictions actually mapped in the US is a drop in the ocean compared with those that exist. Only when significant numbers are mapped does the tagging tend to coalesce around more consistent values. My best guess is that this occurs when around 5-10% of actual things have been mapped.

BTW: I would love to be able to tweak Graphhopper without having to code Java, in exactly the sort of ways RichardF describes. I also appreciate that GH may have the same issues with respect to number of developers as OSM editors.

Comment from karussell on 2 November 2015 at 13:27

who selects the values & transforms

IMO: the editor software should enforce this :)

E.g. I’ve seen many places where the autocomplete of ‘maxw’ used maxweight instead of the intended maxwidth
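An editor-side check could catch exactly this autocomplete slip by comparing the dimension a key expects with the dimension of the unit typed. This is a sketch; the key and unit tables are illustrative assumptions, and unknown keys or units are deliberately left unjudged.

```python
import re
from typing import Optional

# Illustrative tables; a real editor would need the full set of keys and units.
KEY_DIMENSION = {"maxweight": "mass", "maxwidth": "length", "maxheight": "length"}
UNIT_DIMENSION = {"t": "mass", "kg": "mass", "m": "length", "ft": "length"}
VALUE = re.compile(r"\s*\d+(?:\.\d+)?\s*([a-z]*)\s*")


def check(key: str, value: str) -> Optional[str]:
    """Return a warning string, or None if the value looks plausible."""
    m = VALUE.fullmatch(value)
    if not m:
        return f"{key}={value}: not a number with an optional unit"
    unit = m.group(1)
    expected = KEY_DIMENSION.get(key)
    actual = UNIT_DIMENSION.get(unit)
    if expected and actual and actual != expected:
        return f"{key}={value}: '{unit}' is a {actual} unit, expected {expected}"
    return None  # unknown keys and units are not judged here


print(check("maxweight", "2.5 m"))  # warns: probably meant maxwidth
print(check("maxwidth", "2.5 m"))   # None
```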

but often turn into really complex data modelling issues

That is exactly why I would like to see this start with the tiny unit conversion problem.

BTW: I would love to be able to tweak Graphhopper without having to code Java in exactly the sort of ways RichardF describes

surely this will come ‘officially’ and you can already do with some JVM scripting language…

I also appreciate that GH may have the same issues with respect to number of developers as OSM editors

Comment from karussell on 2 November 2015 at 13:28

I meant:

surely this will come ‘officially’, and you can already do this with all JVM scripting languages like Jython, JavaScript, …

Comment from Wynndale on 2 November 2015 at 19:11

Let me tell you a little story.

My day job is a computer programmer writing finite element analysis software. Once upon a time we stored everything in user-specified units; sometimes they were just expected to be consistent, more often the calculations added scaling factors as needed. Originally these were just powers of ten (mm, kN and so on) but a wider variety of units went in later specifically for US use. As the number of units that could be presented to users increased it became more difficult to validate the calculations; although not so many people used some units, calculating the wrong answer is undesirable in our work.

Eventually there was a big rewrite that meant all of the calculations would have to be debugged again and, just before I joined, the calculation expert got the whole team behind him saying that in the future all internal storage and calculations would be done in SI units and existing data sets would be converted. Our biggest problem with that since then was an American dataset that claimed to have metric measurements but got the conversion factor wrong.

Comment from Sanderd17 on 2 November 2015 at 19:23

It’s easy to unify measurements when you get the units. It’s hard to estimate the units when you get the unified measurements. If an editor wants to display the originals, but only gets the SI units, then it will have a tough job (and quite likely show mistakes in some cases).
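Sanderd17's asymmetry can be made concrete. This sketch assumes the SI value was stored rounded to one decimal place (an assumption, as in the earlier round-trip case); recovering the nearest whole inch from that stored value then gives the wrong sign.

```python
METRES_PER_INCH = 0.0254  # exact by definition


def to_stored_metres(feet: int, inches: int) -> float:
    """Easy direction: unit-bearing value to SI (rounded as if stored)."""
    return round((feet * 12 + inches) * METRES_PER_INCH, 1)


def guess_sign(metres: float) -> str:
    """Hard direction: guess the feet-and-inches sign from the SI value."""
    total = round(metres / METRES_PER_INCH)  # nearest whole inch
    return f"{total // 12}'{total % 12}\""


stored = to_stored_metres(14, 6)   # the sign reads 14'6"
print(stored, guess_sign(stored))  # 4.4 14'5"
```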

So in a typical case of being lazy, the easiest method should be implemented.

Also note that it’s not only about the two main editors, there are also numerous other apps that display and allow edits to tags. Think of the general OSM site, vespucci, and even apps like OsmAnd.

I do agree with SK53. It would be nice to have a standardised database (or maybe just a program to create such a DB). It would allow complex things, such as having defaults per country (lit=yes should really be a default for Belgian roads, for example: around 98% of our roads are lit). But this doesn’t belong in the main database.
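In a separate, derived database, per-country defaults could be applied only where a tag is missing, so explicitly mapped values always win. This is a sketch; the default table is hypothetical apart from the Belgian lit=yes example from the comment.

```python
from typing import Dict

# Hypothetical defaults table; only the Belgian entry comes from the comment.
COUNTRY_DEFAULTS: Dict[str, Dict[str, str]] = {
    "BE": {"lit": "yes"},
}


def with_defaults(tags: Dict[str, str], country: str) -> Dict[str, str]:
    """Fill in country defaults without overriding anything that was mapped."""
    merged = dict(COUNTRY_DEFAULTS.get(country, {}))
    merged.update(tags)  # explicitly mapped values always win
    return merged


print(with_defaults({"highway": "residential"}, "BE"))
print(with_defaults({"highway": "residential", "lit": "no"}, "BE"))
```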

Comment from butrus_butrus on 2 November 2015 at 20:37

It’s disturbing that in 2015 anyone uses anything other than SI… :-(

Comment from Nakaner on 2 November 2015 at 22:33

As long as the OSM data contains a unit whenever it is not an SI unit, there is no problem. You just have to check whether there are non-digit characters at the end of the value string. If yes, you check which unit the mapper used and apply the correct conversion factor. If no, you can just treat the number as being in the metric (default) unit and convert it into a float or integer.
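Nakaner's rule fits in a short function. This is a sketch under stated assumptions: the suffix table is small and illustrative (including "mi"), values in metres are the default, and an unknown suffix returns None rather than a guess.

```python
import re
from typing import Optional

# Illustrative suffix table; a real tool would support the units seen in the data.
METRES_PER_UNIT = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}
VALUE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([^\d\s.]*)\s*$")


def to_metres(raw: str) -> Optional[float]:
    """Bare number: metric default. Trailing suffix: look up its factor."""
    m = VALUE.match(raw)
    if not m:
        return None
    number, suffix = float(m.group(1)), m.group(2)
    if not suffix:  # no unit given: treat as metres
        return number
    factor = METRES_PER_UNIT.get(suffix)
    return number * factor if factor is not None else None  # unknown unit


print(to_metres("6"), to_metres("20 ft"), to_metres("3 furlongs"))
```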

There could be only one huge problem: if the default unit (the unit assumed when no unit is given at the end of the value) varied from region to region. If width=6 were feet in the UK and metres in France, it would be a problem. But luckily, UK mappers are good mappers and add mph if they use miles per hour.

I am writing a data conversion tool from OSM data into another very common data format at work. I just had a look at how those non-SI countries tag maxspeed and width and chose which units I will support.

You will also get this problem if you have a look at the common maxspeed=* values. Does Graphhopper support maxspeed=RO:urban? My tool does.
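Implicit values such as maxspeed=RO:urban can be resolved with a lookup table before falling back to numeric parsing. This is a sketch; the table is illustrative and would need to be checked against the OSM wiki and national law before real use.

```python
from typing import Optional

# Illustrative values only; verify against the OSM wiki and national law.
IMPLICIT_KMH = {"DE:urban": 50.0, "DE:rural": 100.0, "RO:urban": 50.0}


def maxspeed_kmh(raw: str) -> Optional[float]:
    value = raw.strip()
    if value in IMPLICIT_KMH:  # country:zone codes
        return IMPLICIT_KMH[value]
    try:
        return float(value)  # bare number = km/h
    except ValueError:
        return None  # e.g. "none", "signals", or unit suffixes: handle separately


print(maxspeed_kmh("RO:urban"), maxspeed_kmh("70"), maxspeed_kmh("signals"))
```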

Comment from karussell on 3 November 2015 at 07:32

Thanks Wynndale & Sanderd17 for the thoughts :)

I think when we store the unit e.g. in note:maxweight we won’t have this problem

Maybe the biggest problem I have is the ambiguity of 3 different units in the U.S. …

So in a typical case of being lazy, the easiest method should be implemented.

I don’t agree. This was probably okay for the last 10 years but will become a big headache in the future…

It would allow complex things, such as having defaults per country. … But this doesn’t belong in the main database.

Why does it not belong in the main DB??

Does Graphhopper support maxspeed=RO:urban? My tool does.

Again, this discussion has nothing to do with GraphHopper, except that I’m the author and uncovered this problem while working with GH ;) In Java I can do pretty much anything I like, and I did so for maxspeed; it was okayish and not really error-prone. But for weight units this is not the case, due to the ambiguity of ‘tons’.
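The ambiguity of ‘tons’ is easy to quantify: the same number on a sign maps to three different masses depending on which ton was meant. This sketch uses the exact definitions of the three units; the function name is hypothetical.

```python
KG_PER_TON = {
    "tonnes (metric)": 1000.0,
    "short tons (US)": 907.18474,    # 2000 lb
    "long tons (UK)": 1016.0469088,  # 2240 lb
}


def readings_kg(amount: float) -> dict:
    """Every mass that a sign reading '<amount> tons' could mean."""
    return {name: amount * kg for name, kg in KG_PER_TON.items()}


for name, kg in readings_kg(15).items():
    print(f"15 {name}: {kg:,.0f} kg")
```

For 15 tons this prints 15,000, 13,608 and 15,241 kg, a spread of more than 1.6 t, which matters when deciding whether a truck may cross a bridge.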

And all of this can be solved if one defines one unit per tag and per country. Or, my preferred solution, have just one unit and solve the conversion via editors. Of course we have many editors, but we have thousands of consuming applications, each of which would then need to do this error-prone job.

Comment from karussell on 3 November 2015 at 07:33

BTW: Thanks Nakaner & the others for the thoughts too :)

Comment from Tordanik on 5 November 2015 at 16:21

Personally, I try to always add units to avoid any ambiguity. Sure, defaulting to SI units makes sense. But when some keys want meters as the default (e.g. maxheight), others want kilometers (e.g. distance), and again others want centimeters (e.g. the proposed step length), then mistakes will happen. Always adding the unit is easy and fixes that issue.
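Tordanik's concern can be shown directly: with per-key default units, the same bare value means different lengths, while an explicit unit removes the guesswork. This is a sketch; the defaults follow the examples in the comment, "step:length" is a hypothetical key name standing in for the proposed step-length tag, and only "<number>" or "<number> <unit>" is parsed.

```python
# Default units per key, following the examples in the comment above;
# "step:length" is a hypothetical key name for the proposed tag.
METRES_PER_UNIT = {"m": 1.0, "km": 1000.0, "cm": 0.01}
DEFAULT_UNIT = {"maxheight": "m", "distance": "km", "step:length": "cm"}


def value_in_metres(key: str, raw: str) -> float:
    """Handles only '<number>' or '<number> <unit>'."""
    parts = raw.split()
    unit = parts[1] if len(parts) == 2 else DEFAULT_UNIT[key]
    return float(parts[0]) * METRES_PER_UNIT[unit]


for key in DEFAULT_UNIT:
    print(key, value_in_metres(key, "2"))  # same "2", three different lengths
print(value_in_metres("distance", "2 m"))  # explicit unit: unambiguous
```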
