Andy Allan's Diary Comments

Diary Comments added by Andy Allan

OpenStreetMap Isn't Unicode 18 days ago

U+2B66E can be stored in various ways, depending on the encoding. From the unicodedecode page, the hex representations are:

UTF-8 HEX Value     0xF0AB99AE
UTF-16 HEX Value    0xD86DDE6E
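Both hex values can be checked with Python's built-in codecs. This is a minimal sketch using only the standard library. The big-endian `utf-16-be` codec is an assumption, chosen so that no byte-order mark is added to the output.

```python
# Encode U+2B66E (decimal 177,774) and show its bytes in each encoding.
ch = chr(0x2B66E)

utf8_hex = ch.encode("utf-8").hex().upper()
# "utf-16-be" is big-endian with no byte-order mark, matching the notation
# used by most code-point lookup pages.
utf16_hex = ch.encode("utf-16-be").hex().upper()

print("UTF-8 HEX Value     0x" + utf8_hex)   # 0xF0AB99AE
print("UTF-16 HEX Value    0x" + utf16_hex)  # 0xD86DDE6E
```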

So it seems to me that at some point in your processing chain, something has taken the original UTF-8 from OSM and converted it to UTF-16 in memory. Then something else reads that same value from memory without knowing about surrogate pairs, and treats “0xD86DDE6E” as the two separate characters U+D86D and U+DE6E. That is completely incorrect, since neither is a valid character: all code points from U+D800 to U+DFFF are reserved for surrogates and cannot appear on their own in any well-formed Unicode encoding.

But it’s an understandable software error, since for characters that fit in a single 16-bit code unit in UTF-16, the UTF-16 value is the same as the corresponding Unicode code point: e.g. U+9B5A is represented as 0x9B5A in UTF-16.

The error is thrown when the next step tries to write that sequence of code points out as UTF-8: it reaches U+D86D, recognises it as an invalid character, and throws the error.
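The failure chain described above can be reproduced in a few lines of Python. This is a minimal sketch that assumes the faulty component behaves as suspected, reading each 16-bit code unit as a separate code point. `utf16_code_units` and `naive_code_points` are hypothetical helper names, not part of any tool in the actual toolchain.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text (big-endian, no byte-order mark)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def naive_code_points(units: list[int]) -> str:
    """The suspected bug: treat every 16-bit code unit as a code point of its own."""
    return "".join(chr(unit) for unit in units)


# A BMP character: one code unit whose value equals the code point, so the
# naive approach happens to work and the bug stays hidden.
bmp_units = utf16_code_units("\u9b5a")
bmp_utf8 = naive_code_points(bmp_units).encode("utf-8")

# A character outside the BMP: two code units forming a surrogate pair.
pair_units = utf16_code_units(chr(0x2B66E))
misread = naive_code_points(pair_units)  # two lone surrogates, not one character

failure_reason = None
try:
    misread.encode("utf-8")  # the "write it out as UTF-8" step
except UnicodeEncodeError as exc:
    failure_reason = exc.reason

# A decoder that understands surrogate pairs recovers the original character.
recovered = bytes.fromhex("D86DDE6E").decode("utf-16-be")

print([hex(u) for u in bmp_units])   # ['0x9b5a']
print([hex(u) for u in pair_units])  # ['0xd86d', '0xde6e']
print(failure_reason)                # surrogates not allowed
print(f"U+{ord(recovered):X}")       # U+2B66E
```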

Since we’ve again shown that there’s nothing wrong with the UTF-8 stored in the database, or with the UTF-8 in the API / cgimap / diffs / planets, I think we’ve gone a long way off topic for this diary entry. Perhaps any further troubleshooting of your toolchain can be carried out on the mailing lists, forum, chat or elsewhere?

OpenStreetMap Isn't Unicode 18 days ago

The key thing in the name that @mmd selected is the first character, U+2B66E. Its decimal value (177,774) is too big to fit in a 16-bit integer, which can only hold 65,536 distinct values. It’s rare, but not unheard of, for OSM to use Unicode characters from outside the Basic Multilingual Plane (the first 65,536 code points in Unicode).
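When debugging a toolchain like this, it can help to list which characters in a name lie outside the Basic Multilingual Plane. This is a minimal sketch: `non_bmp_characters` is a hypothetical helper, and the sample string is made up from the two characters discussed in these comments rather than taken from the real name.

```python
def non_bmp_characters(name: str) -> list[tuple[int, str]]:
    """Return (position, 'U+XXXX') for each character above U+FFFF in name.

    Such code points do not fit in 16 bits, so UTF-16 stores each of them
    as a surrogate pair.
    """
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(name) if ord(c) > 0xFFFF]


# Illustrative input only (not the actual OSM name): U+2B66E followed by U+9B5A.
sample = chr(0x2B66E) + "\u9b5a"
print(ord(sample[0]))              # 177774
print(non_bmp_characters(sample))  # [(0, 'U+2B66E')]
```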

In my experience, particularly with Windows, Unicode text is often stored in memory not as UTF-8 but as UTF-16. UTF-16 uses either one 16-bit code unit per character or, for these rarer characters, two 16-bit code units, using a technique called surrogate pairs. The Wikipedia page on UTF-16 says:

Because the most commonly used characters are all in the BMP, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software.
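The surrogate-pair technique itself is short: subtract 0x10000, then split the remaining 20 bits into a high half and a low half. This is a minimal sketch of the standard UTF-16 algorithm in Python; the function names are illustrative, and real code should simply use the language's own UTF-16 codec.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x2B66E)
print(f"{high:04X} {low:04X}")              # D86D DE6E
print(hex(from_surrogate_pair(high, low)))  # 0x2b66e
```

A component that skips this recombination step and passes `high` and `low` along as two characters is exactly the kind of surrogate-pair bug the quote describes.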

I suspect that one of the pieces of your software chain is representing the characters in UTF-16, and not handling surrogate pairs properly, and is throwing an error when given one of these rarer characters.

I hope this helps!

OpenStreetMap Isn't Unicode 3 months ago

Thanks bdon for this blog post. I’d heard rumours before that OpenStreetMap has Unicode / UTF-8 problems, but I couldn’t find anywhere that gave enough detail for me to figure out what was really going on.

Of course, as mmd has explained, there aren’t actually any Unicode or UTF-8 problems in our API or data dumps. It’s just that some sequences of valid Unicode characters don’t make much sense, and unfortunately there’s an alternative way of representing Burmese text that happens to use some of the same code points, so there’s potential for a bit of a mess. But from bryceco’s analysis, it looks like it’s getting much better as time goes on.

I’m happy to see the efforts going into cleaning up the garbled text, and perhaps this is something that could be detected and flagged up by QA tools too?

My would-be answers to the OSMF board survey about 1 year ago

Personally I think the survey stinks. Most of the questions combine two topics in one and, as you describe, the answers will be meaningless since they can be interpreted however the Board prefers. And the questions are phrased in such a leading way (towards supporting decisions already made) that the entire exercise becomes pointless.

The question about large donations to fund things is my prime example. “Do you support (subtly complex topic) in order to do (something that sounds great)”? Of course! We all support something that sounds great! Now the Board points at the survey and says that everyone supports making the OSMF dependent on large donations from corporate interests. Job done.

Asking questions about complex topics (like whether the OSMF should centralise software development funding, or the reliance on large vs individual donors) without providing any context or background is risible.

Sustainable Travel Expenses Resolution – Request for Support about 1 year ago

I support putting this to the AGM.

Thoughts on the how and where of the OSMF starting to hand out money in the OSM community over 1 year ago

I think it’s worth distinguishing between software that is on the critical path for data (mapper -> editing -> API -> database -> planet and replication), and then everything else. If you remove something on the critical path, then OSM stops working. But if you remove something else, such as tileservers or search or routing, then sure, mappers will notice, but they can still put data in and get data back out again.

That’s how I used to distinguish software and systems when I was on OWG. We had three tiers. Tier one was the core infrastructure required for the editing API and data distribution (and everything else involved in that, like NFS and DNS and whatnot). Tier two was stuff that would be really impactful for mappers if it wasn’t working (like tileservers and Nominatim and the wiki). Tier three was the things that most mappers wouldn’t even notice if they were missing. Obviously there’s room for debate about this!

OSM the Legal Monster almost 2 years ago

You make a good point, but then you over-egg the pudding by including unrelated things like the banner policy and the new tile layer policy. I would suggest removing that entire paragraph; your point is stronger without it.

The moderation queue. The first 3000 issues almost 2 years ago

Thanks for reporting on these statistics! It’s nice to see the feature is proving useful, but of course it would be even nicer if it was unnecessary.

Based on your analysis, what changes to the issues and reports would be the most useful?

OWG Must Be Destroyed about 2 years ago

Just to correct a few points: OWG does have policies, including how to join OWG (and the sysadmins group). Of course they aren’t perfect or comprehensive but it’s a start. I’m particularly proud of getting agreement for the joining policies because it used to be completely opaque as to what was required.

Also, it’s incorrect to say that nobody has joined OWG since 2011, since Paul Norman joined last year and is active today. I’m not sure if you also meant to exclude Sarah Hoffmann, who was a member of OWG for several years from 2013 to 2017 and is also an active sysadmin today. We also had two probationary members in the last two years, but they didn’t become full voting members.

Sure, I’d rather see 20 other people on the list, and 5 new people every year. But it’s still worth painting an accurate picture.

As for the website and API, developing those isn’t a matter for OWG. Tom does both, but for example I’m no longer on OWG precisely to focus more on the website development. I feel like the development has improved a lot in the last 3 years, but it’s about 5 on a scale of 0 to 100 where 100 is what I’d like to see. And I’ve been on the receiving end of plenty of curt PR comments. It’s not a great experience.

You ask me “Does it really feel okay to you” and of course, no, none of this is good. But I want to work with everyone who is interested in making these situations better, and I want to discourage people from expressing their frustration in a manner that makes things worse.

OWG Must Be Destroyed about 2 years ago

I’m not sure that I even want to reply to this, given the (presumably deliberately) outrageous title. Perhaps by responding I’m just encouraging more posts like this in the future? I hope not.

“Destroying” OWG makes no sense, since it’s there to serve a legitimate purpose. If everyone on it disappeared tomorrow, OSMF would still need those tasks taken care of. The server budgets need writing. The resource usage (like database disk space) needs forecasting, and hardware needs planning. If you really want to “destroy” OWG, then you would need to explain why that stuff is no longer necessary, or suggest which working group should be doing it instead.

You’ve also glossed over whether you want to destroy the sysadmin group, or OWG, or both, but no matter.

So let’s focus on the (slightly) more sensible suggestion, which is to “disband” the group and start again from scratch. Would that really work? Would the incoming people have any idea what needs to be done? Perhaps. Over the last few years we made the hardware site, and wrote a lot of monthly reports, and the chef-repo exists, so perhaps the new people could read all of them and try to get started. But it’s a high-risk strategy to kick everyone off first, instead of adding new people in and keeping those with experience still around to answer questions. I guess the sensible approach wouldn’t generate the dramatic headlines though.

But in any case, diary entries like this can become a damaging self-fulfilling prophecy. What member of the community wants to get involved, if the only public attention you get is posts like this? Why would any sensible person join any working group, if prominent and well-respected former OSMF Board members write posts like this? Try to imagine what being on the receiving end of this would be like. Try to realise what you are doing here, and why there will be even fewer working group members (or even board candidates) in future. You’re creating an environment where even paid contractors won’t want to get involved.

There are definitely problems in this working group, and they definitely need fixing. But this kind of post does more harm than good.

For anyone who is reading this and wants a more practical set of suggestions from a former OWG member, feel free to read for some ideas, or if you want any other suggestions for improving OWG feel free to ask me any of your questions directly.

The OSM community deserves a better about 2 years ago

It’s great to see discussion like this, and I share a lot of your feelings about the site. To say there’s room for improvement would be an understatement! I particularly note the lack of user search, and the rest of the underdeveloped ‘community’ parts of the site. I think it is illustrative that we have the ability to “browse relations” and “node history” but no way to find other community members. (I prefer thinking in terms of ‘community support features’ rather than ‘social networking’ since that term has a lot of negative connotations).

However, there’s no shortage of ideas or wishes to improve what we have. What we are really missing are the people who are willing and able to do the coding, design, and other development work. Whether those people are volunteers or paid doesn’t matter to me, but at the moment we just don’t have enough people involved, so progress is slow.

I’ve spent the last three years relentlessly working on making it easier to contribute to the development. You can read more on my personal blog if you are interested. My own todo list will keep me busy for at least the next 5 years, never mind all the big ideas that are out there.

So I encourage you to get involved in the issue tracker, to become more familiar with what’s going on and with what will be needed to make an impact on our progress. And again, I think it’s the lack of contributors, not the lack of ideas, that is the most important thing here.

Reflections on OSMF about 2 years ago

You mentioned a Community Map of ‘channels’ which i am not aware of - could you provide a link?

I suspect the map is , which is built using

How to highlight high-precision GPX traces? over 2 years ago

All OSM data is in WGS84. There are no local datums in OSM.

Turned off to make you aware of the Directive on Copyright in the Digital Single Market protests … wtf? almost 3 years ago

You might not like it, but EU laws do affect OpenStreetMap. It would be nice to think that you are unaffected since you aren’t from the EU, but the servers are in the EU, the OSMF is based in the EU, the license is based on EU laws and so on. So if you like OpenStreetMap, then you have an interest in these EU laws, whether you’d like to or not.

Of course, no users of OSM, whether in the EU or not, can directly change these proposed laws. But they can contact their representatives, and this is what this action is designed to encourage. As a Canadian you also have a representative to the EU - you could contact them to explain the problem too.

Are we still English? almost 3 years ago

The iD editor uses locale information from your OSM account settings. It treats ‘en’ as ‘en-US’ by default, so you need to specify ‘en-GB’ to get the British translations.

If you have anything set in the Preferred Languages setting in your OpenStreetMap account, then check that carefully to make sure that ‘en-GB’ appears, and that it’s before ‘en’ or ‘en-US’. If you remove everything from that field, it will be re-populated with the language preferences that your browser sends (and the same rules apply about putting ‘en-GB’ first).

You mention ‘en-UK’ - I’m not sure if that’s a mistake or not, but it’s not the right thing - you need ‘en-GB’.

I’d never noticed until now, but my browser was sending ‘Accept-Language: en-GB,en;q=0.8,fr;q=0.5,de;q=0.3’ (which is approximately correct) but I had just ‘en’ in my OSM account settings from goodness knows how long ago. So I blanked that setting out, pressed save, and now my settings are ‘en-GB en fr de’ and iD now shows the British translations. Which is great!
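The ordering rule can be sketched as follows. This is a hypothetical model of the behaviour described above, not iD's actual code: it assumes the editor walks the preference list in order, maps a bare `en` to `en-US`, and picks the first locale it has translations for. `parse_accept_language`, `pick_locale` and the set of available locales are all illustrative.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        tag, quality = fields[0], 1.0  # q defaults to 1 when omitted
        for param in fields[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        if tag:
            # Sort by descending quality, keeping header order for ties.
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]


def pick_locale(preferences: list[str], available: set[str]) -> str:
    """Hypothetical: first preferred locale with translations; bare 'en' means en-US."""
    for tag in preferences:
        if tag == "en":
            tag = "en-US"
        if tag in available:
            return tag
    return "en-US"


AVAILABLE = {"en-US", "en-GB", "fr", "de"}  # illustrative, not iD's real list

prefs = parse_accept_language("en-GB,en;q=0.8,fr;q=0.5,de;q=0.3")
print(prefs)                          # ['en-GB', 'en', 'fr', 'de']
print(pick_locale(prefs, AVAILABLE))  # en-GB
print(pick_locale(["en"], AVAILABLE)) # en-US, as with the old account setting
```

Under these assumptions, `['en', 'en-GB']` would also resolve to `en-US`, which is why ‘en-GB’ needs to come first.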

I-285 NOT a tunnel where the Hartsfield-Jackson runway bridge goes over almost 3 years ago

@Warin61 I think you are being too strict on your definitions, and as @althio says, you seem to be a bit selective in your reading.

However, I’d like to hear your opinion on since there is no earth on either side, or on top, and it was not built by boring or mining through the ground. I think many people would consider this to be a tunnel, do you?

I-285 NOT a tunnel where the Hartsfield-Jackson runway bridge goes over about 3 years ago

Personally, I think it’s reasonable to tag these as four parallel tunnels. From what I see, I would have mapped them as tunnels because each of the lower sections is much, much longer than it is wide, and that’s one of my rules of thumb for ambiguous situations. The first and fourth tunnels indeed have dirt beside them. Having a tunnel wall between two tunnels doesn’t stop them from being tunnels, so the central pair could be considered tunnels too, even without dirt beside them.

It doesn’t really matter how they were built. Think about all the subway tunnels in London (and elsewhere) that were constructed by digging a trench and building a platform (or a really extensive ‘bridge’) over the top to carry roads, parks, buildings etc. Whatever the construction method, they are still considered tunnels.

But most importantly, it’s better to discuss the situation with other mappers, and come to an agreement! Perhaps this is one situation where it’s fine to have both tags? You could talk through it on one of the mailing lists or at a local meetup and see what other people think.

The moderation queue. The first 1000 issues about 3 years ago

Thanks mavl for posting these statistics! It’s great to see that the system is being well used - it certainly took a lot of development work, by a lot of different people, in order to get it fully working and deployed.

Rob - there’s already spam detection and filtering in the website, but there’s always room for improvement. Perhaps these reports and issues can be used to help improve the filters? If anyone is interested in doing this, then they could contact the DWG, who currently handle most of the moderation tasks.

When OpenStreetMap met Mapbox-GL : 🍚IDLY-GL over 3 years ago

@Andy Allan: as you’re mentioning this point: the pull request to remove those remaining slow parts in the map call is already out there, waiting to be reviewed, merged and deployed:

That PR is unrelated to what I’ve described above. Sure, it speeds up the map call, but it doesn’t make it any more cacheable than it is now.

When OpenStreetMap met Mapbox-GL : 🍚IDLY-GL over 3 years ago

@Komяpa Of course it would be great if we could support unlimited requests on the /map call. But since that’s not the case today then I think it’s nicer to give an early warning rather than a potential bigger problem later on!

As I’m sure you’re aware, but for the benefit of others who might be interested, the /map API call is hard to scale for two reasons. Firstly, it works like a WMS service with arbitrary extents, rather than like tiles with fixed extents. Secondly, it’s crucial for all editing software that it provides read-after-write consistency, so that when a changeset is saved, the next /map request is guaranteed to contain that fresh data. Both of these reasons make it hard to cache responses, and without any caching it’s hard to scale.
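To illustrate the first point: with arbitrary extents, almost every request has a unique bounding box, so responses are hard to reuse, whereas snapping requests to a fixed tile grid lets many different requests share the same cache keys. This is a minimal sketch assuming the standard Web Mercator ("slippy map") tile scheme; nothing here is part of the current API, and the function names are illustrative.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Web Mercator ("slippy map") tile x/y containing a point at this zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_bbox(min_lon: float, min_lat: float, max_lon: float,
                   max_lat: float, zoom: int) -> list[tuple[int, int, int]]:
    """Fixed-extent tiles (zoom, x, y) covering an arbitrary bounding box."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


# Two different arbitrary bounding boxes map onto the same cacheable tiles.
# (Zoom 1 keeps the numbers easy to check; a real scheme would use a higher zoom.)
a = tiles_for_bbox(-0.2, 51.4, 0.1, 51.6, 1)
b = tiles_for_bbox(-0.25, 51.3, 0.15, 51.7, 1)
print(a)       # [(1, 0, 0), (1, 1, 0)]
print(a == b)  # True
```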

There have been proposals to fix both of these over the last few years: making map calls tiled, and providing a “not-before-changeset” parameter or similar for the consistency issue. Then we could cache responses without breaking the editing workflow, and support more use of the /map call. But like many things, we need more people coding, and more community support for those who are already coding. Suggesting that we should deliberately violate the policy, and “break OSM” so that “someone pays attention”, would not be the best approach! :-)
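Together, those two proposals could look something like a cache keyed by tile, where a request carrying a “not-before-changeset” value is only served from cache if the cached response already includes that changeset. This is a hypothetical design sketch, not an existing API: the class names, the parameter name and the example tile key are all illustrative. It also assumes, simplistically, that a response built after changeset N includes every changeset up to N; a real implementation would need to guarantee or work around that, since several changesets can be open at once.

```python
from dataclasses import dataclass, field
from typing import Optional

TileKey = tuple[int, int, int]  # (zoom, x, y)


@dataclass
class CachedResponse:
    body: bytes
    up_to_changeset: int  # newest changeset reflected in this response


@dataclass
class MapTileCache:
    """Hypothetical cache for tiled /map responses."""
    entries: dict[TileKey, CachedResponse] = field(default_factory=dict)

    def get(self, key: TileKey, not_before_changeset: int = 0) -> Optional[bytes]:
        """Return cached data only if it is at least as fresh as the caller needs."""
        cached = self.entries.get(key)
        if cached is not None and cached.up_to_changeset >= not_before_changeset:
            return cached.body
        return None  # miss or too stale: fall back to querying the database

    def put(self, key: TileKey, body: bytes, up_to_changeset: int) -> None:
        self.entries[key] = CachedResponse(body, up_to_changeset)


cache = MapTileCache()
key = (16, 32744, 21792)  # arbitrary example tile
cache.put(key, b"<osm>...</osm>", up_to_changeset=100)

print(cache.get(key))                            # served from cache
print(cache.get(key, not_before_changeset=100))  # still fresh enough
print(cache.get(key, not_before_changeset=101))  # None: the editor just saved 101
```

An editor that has just uploaded a changeset would pass its ID, so it never sees a cached response older than its own edit, while everyone else keeps hitting the cache.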