OpenStreetMap

Andy Allan's Diary Comments

Diary Comments added by Andy Allan

OpenStreetMap Website Vulnerability Report

I don’t know Ruby at all so I would be quite useless in contributing to the current codebase.

Even if you don’t want to learn Ruby (and I don’t think you need to know much Ruby to contribute, it’s a pretty straightforward codebase without much complex code) then I’m sure you know there are plenty of other parts of the site like the HTML, icons, CSS etc. that you could directly contribute to.

I’ve noticed a few UI changes that you’ve tried out on your own codebase, and it would be great to see these kinds of things included in the main project so that our mappers can benefit from them in production. So if you want to make some PRs then I’d be happy to review them. You know what changes you’ve made so you’re in a good position to contribute them to the upstream project!

OpenStreetMap NextGen Development Diary #1

I faced issues with reproducing deployment scenario on my local machine due to outdated documentation (and since I am a Ruby-noob, I couldn’t fix it myself).

I personally put a lot of effort into the DX (Developer Experience) so if you find any outdated documentation, please let us know!

Most of our documentation focusses on setting up developer environments (as opposed to production environments) but I’m always happy to fix any of our documentation if it’s outdated. So please do report your issues at https://github.com/openstreetmap/openstreetmap-website or you can propose changes there too. Thanks in advance!

OpenStreetMap Website Vulnerability Report

I just want to jump in here and publicly say thank you for these reports, and for giving us time to fix them. Tom and I spent time during the recent Karlsruhe Hack Weekend to go through your reports and make sure that we had addressed them all, and I’m sorry I didn’t get back to you before your disclosure deadline. It looks like you’ve been watching what we were doing anyway since your report is pretty accurate!

One small correction to your report is about “new user tokens system are still using plain text storage” - this isn’t quite accurate. We removed our UserTokens system in https://github.com/openstreetmap/openstreetmap-website/pull/4535 and moved to using a standard Rails token feature: https://edgeapi.rubyonrails.org/classes/ActiveRecord/TokenFor/ClassMethods.html

This feature doesn’t store the tokens anywhere - plaintext or otherwise. They are signed tokens, so the tokens can be sent to users and verified when they are used, without storing anything in the database or elsewhere on the servers.
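To make the idea of a signed, stateless token concrete, here is a minimal Python sketch. It is not the Rails implementation (that is Ruby and uses its own token format); the secret, function names and payload layout are all hypothetical, chosen only to show that verification needs nothing except the server's secret key.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET = b"example-secret-key"


def make_token(user_id, purpose, ttl_seconds, now=None):
    """Return a signed token carrying the user id, purpose and expiry time."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"user_id": user_id, "purpose": purpose, "expires_at": now + ttl_seconds},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(signature).decode("ascii")
    )


def verify_token(token, purpose, now=None):
    """Return the user id if the token is authentic, unexpired and for this purpose, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        # Wrong number of parts or invalid base64: treat as a bad token.
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    # Constant-time comparison, so timing does not leak the signature.
    if not hmac.compare_digest(signature, expected):
        return None
    data = json.loads(payload)
    if data["purpose"] != purpose or data["expires_at"] < now:
        return None
    return data["user_id"]
```

Nothing is written to a database at any point: the token itself carries the data, and the signature proves the server issued it.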

I’d be happy to see any more bug reports that you come across, and I also invite you to work with us on the main openstreetmap-project directly on these kinds of things.

A minute of facts about the duration of changesets

loading the CS on the website or via the API wouldn’t trigger the closing but only trying to upload data into the CS again?

Hmm, not quite.

Remember that all changesets - open or closed - have a closed_at date, it’s just that initially it’s one hour in the future (you can think of it more like “will_be_closed_at”) and often that time has passed already (so more like “was_closed_at”) and the only difference is whether that timestamp is before or after Time.now.utc. There are no updates to the database when a changeset automatically closes, the “will_be_closed_at” timestamp was already saved in the database, either during changeset open or during the last successful update.

The only ways to close a changeset are to a) wait for the closed_at timestamp to pass or b) update the closed_at timestamp to be Time.now.utc by calling the changeset/close API method - which is just an express version of a) for the impatient!

It’s one of these parts of the API where the mental model of a changeset (two states, open vs closed, and various actions like ‘close’ and ‘automatically close’) and the actual code implementation (a predetermined closed_at time, which can be in the future, and can be updated in certain limited circumstances) are quite different. The mental model is useful for mappers and there’s nothing wrong with it, but when you look at the code / database it’s quite different.
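As a rough illustration of the difference between the two views, here is a small Python sketch of the "predetermined closed_at" idea. It is not the actual Ruby code from the website; the class, method names and the IDLE_TIMEOUT constant are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed constant: a new changeset's closed_at starts one hour in the future.
IDLE_TIMEOUT = timedelta(hours=1)


@dataclass
class Changeset:
    created_at: datetime
    closed_at: datetime  # always set; may be in the future ("will_be_closed_at")

    @classmethod
    def open_new(cls, now):
        """Opening a changeset already stores the time at which it will close."""
        return cls(created_at=now, closed_at=now + IDLE_TIMEOUT)

    def is_open(self, now):
        """Open vs closed is only a comparison against the clock, not stored state."""
        return self.closed_at > now

    def close(self, now):
        """Explicit close: bring closed_at forward to now, if still open."""
        if self.is_open(now):
            self.closed_at = now


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cs = Changeset.open_new(t0)
print(cs.is_open(t0 + timedelta(minutes=59)))  # still before closed_at
print(cs.is_open(t0 + timedelta(minutes=61)))  # closed_at has passed, with no write needed
```

The "automatic close" in the mental model corresponds to nothing at all happening in this sketch: the clock simply moves past the stored timestamp.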

A minute of facts about the duration of changesets

Granted, I don’t know where to find the code exactly, but I guess there’s not much “monitoring” involved. You’ll probably see a process that checks every N seconds, whether there are changesets that match Pieter’s description of points 2 and 3 (either 1 hour since the last upload or 24 hours since creation) and then shuts those changesets down.

It’s much simpler than that - there’s no extra monitoring process involved. Whenever something happens to the changeset (e.g. open, diff upload, individual element update, etc), its closed_at attribute is updated.

https://github.com/openstreetmap/openstreetmap-website/blob/e83f0bd13121ab520c68d3a49a3f0f59a1266cd2/app/models/changeset.rb#L186-L198

Then the next time you try to do something (e.g. another diff upload) the code just checks if the changeset closed_at has already passed - if so, the changeset is closed, if not, the closed_at is updated again, etc. The “close changeset” method just checks if the changeset is still open, and if so, sets the closed_at to right now.

https://github.com/openstreetmap/openstreetmap-website/blob/e83f0bd13121ab520c68d3a49a3f0f59a1266cd2/app/models/changeset.rb#L69-L76

So there are no moving parts within the codebase, no ‘watch’ process and not even an extra update to the db to close each changeset. It’s a clever design (and not something I was involved with!).

I think the more important issue is the side effects on other systems, for example changeset comments, or 3rd-party analysis tools, that might be waiting for a changeset to close before triggering an alert etc. There’s a case to be explored as to whether 24 hours is too high an upper bound for changesets to be kept open (of course, a changeset also needs activity every 60 minutes for every one of those 24 hours, since the changeset closed_at is only extended 60 minutes at a time - so the default is to keep it open for 1 hour (reasonable?) with an upper limit of 24 hours (debatable?)).
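The interaction between the 60-minute extension and the 24-hour cap can be sketched in Python as follows. This is only an illustration under the assumptions stated in this comment (extend by one hour on each activity, never beyond 24 hours after creation); the function and constant names are made up for the example and the real Ruby code linked above may be structured differently.

```python
from datetime import datetime, timedelta, timezone

# Assumed values, taken from the discussion above.
IDLE_TIMEOUT = timedelta(hours=1)     # each activity extends closed_at by this much
MAX_TIME_OPEN = timedelta(hours=24)   # upper bound measured from creation


def extended_closed_at(created_at, now):
    """New closed_at after activity at `now`: one hour ahead, clipped to the 24 hour cap."""
    return min(now + IDLE_TIMEOUT, created_at + MAX_TIME_OPEN)


def final_closed_at(created_at, activity_times):
    """Replay a list of activity times and return the closed_at the changeset ends up with."""
    closed_at = created_at + IDLE_TIMEOUT
    for t in activity_times:
        if t >= closed_at:
            break  # already closed: the activity is rejected and nothing is updated
        closed_at = extended_closed_at(created_at, t)
    return closed_at


t0 = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)

# An upload every 45 minutes keeps the changeset alive until the 24 hour cap.
steady = [t0 + timedelta(minutes=45 * k) for k in range(1, 41)]
print(final_closed_at(t0, steady) - t0)

# A gap of more than an hour lets it close early.
gappy = [t0 + timedelta(minutes=30), t0 + timedelta(minutes=95)]
print(final_closed_at(t0, gappy) - t0)
```

Lowering the upper bound in this model would only mean changing one constant, which is why the debate is about the policy rather than the mechanism.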

OpenStreetMap Service Availability (2023-12-20 - 2024-01-20)

there are additional connectivity checks to my other server in Poland. I exclude any downtime that is also present on that server. :-)

Great!

By the way, do you by chance know anything about the official uptime OSM configuration?

No, sorry I don’t. I’m only involved in the software development, not in the production operations.

OpenStreetMap Service Availability (2023-12-20 - 2024-01-20)

the checks are executed from a single server in the Hetzner datacenter in Germany

Then you are equally likely to be measuring the network availability of Hetzner.

The OSM Iceberg

Fantastic!

State of the Map 2022 "talk to me" (or not) pin-back buttons :)

Great idea! Thanks for creating these Dorothea.

An open letter to OSMF board members concerning problems with OpenStreetMap-Carto

I came across this and feel like I should respond to one aspect of this conversation.

A formal complaint under the openstreetmap-carto CoC has been received by the maintainers, related to some issues discussed here, and we are currently dealing with it. As I’m sure everyone will understand, CoC complaints are dealt with confidentially and not in public, so obviously this isn’t widely known. I normally would not comment publicly on any CoC complaint at all, but I think it’s worthwhile in this situation to say that one is in hand, albeit I’m not going to discuss any further details here.

I encourage anyone who wants to raise a CoC complaint to do so using the formal process that’s documented in the CoC. Unfortunately tagging people in issue comments (like myself, since I don’t follow the issue tracker currently), writing diary entries, or writing letters to other groups of people outside of the project (e.g. the OSMF Board) runs the risk of not being noticed or dealt with. But we do have a clearly and fully documented process for complaints, and I can assure you that any complaints received that way will be dealt with properly.

This letter raises a lot of other important points of discussion, but I thought I should respond to this aspect specifically.

OpenStreetMap Isn't Unicode

U+2B66E can be stored in various ways, depending on the encoding. From the decodeunicode page, the hex representations are:

UTF-8 HEX Value     0xF0AB99AE
UTF-16 HEX Value    0xD86DDE6E

So it seems to me that at some point in your processing chain, something has taken the original UTF-8 from OSM, converted it to UTF-16 in memory, and then something else is reading that same hex value from memory, is unaware of surrogate pairs and is treating “0xD86DDE6E” as the two Unicode characters U+D86D and U+DE6E - which is completely incorrect, since they are both invalid Unicode codepoints. (All codepoints from U+D800 to U+DFFF are defined as completely invalid in any encoding).

But it’s an understandable software error, since for characters that are just one 16-bit code unit in UTF-16, you can take the UTF-16 encoded character and that’s the same value as the corresponding Unicode codepoint e.g. U+9B5A is represented as 0x9B5A in UTF-16.

The error is thrown when the next step tries to write out that sequence of Unicode codepoints in a UTF-8 encoding, since it gets to U+D86D and knows that that is an invalid character and throws the error.
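The failure mode described above can be reproduced in a few lines of Python. This is only a sketch of the suspected bug, not a diagnosis of any particular tool: the "buggy" step is written by hand here to mimic software that treats each UTF-16 code unit as a codepoint.

```python
ch = "\U0002B66E"  # the character under discussion

# The two encodings quoted above.
utf8_hex = ch.encode("utf-8").hex()
utf16_hex = ch.encode("utf-16-be").hex()
print(utf8_hex, utf16_hex)

# Split the UTF-16 bytes into 16-bit code units.
data = ch.encode("utf-16-be")
units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Buggy step: treat each code unit as if it were a codepoint in its own right.
misread = "".join(chr(u) for u in units)

# Writing the misread text out as UTF-8 fails, because lone surrogates cannot be encoded.
try:
    misread.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = exc
print(type(error).__name__)

# Correct step: decode the code units as UTF-16, which recombines the surrogate pair.
decoded = data.decode("utf-16-be")
print(decoded == ch)
```

The original UTF-8 bytes are fine throughout; the error only appears once the surrogate pair has been split into two separate "characters".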

I think, since we’ve again shown that there’s nothing wrong with the UTF-8 stored in the database, and there’s nothing wrong with the UTF-8 in the API / cgimap / diffs / planets, that we’ve gone a long way off topic for this diary entry. Perhaps any further troubleshooting of your toolchain can be carried out on the mailing lists, forum, chat or elsewhere?

OpenStreetMap Isn't Unicode

The key thing in the name that @mmd selected is the first character: https://decodeunicode.org/en/u+2B66E . Its decimal value (177,774) is too big to fit in a 16-bit integer (maximum 65,535). It’s rare, but not unheard of, for OSM to use Unicode characters from outside the Basic Multilingual Plane (the first 65,536 characters in Unicode).

In my experience, particularly with Windows, a lot of internal representations of Unicode characters are not stored in memory as UTF-8, but as UTF-16, which uses either one 16-bit code unit per character or, for these rarer characters, two 16-bit code units using a technique called surrogate pairs. The Wikipedia page on UTF-16 says:

Because the most commonly used characters are all in the BMP, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software.

I suspect that one of the pieces of your software chain is representing the characters in UTF-16, and not handling surrogate pairs properly, and is throwing an error when given one of these rarer characters.
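For anyone curious how a surrogate pair is built, the arithmetic is short enough to show in Python. The function names are just for this illustration; the constants come from the UTF-16 definition.

```python
def to_surrogate_pair(code_point):
    """Split a codepoint outside the BMP into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only codepoints above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high and low surrogate into the original codepoint."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x2B66E)
print(hex(high), hex(low))
print(hex(from_surrogate_pair(high, low)))
```

Software that skips the recombination step ends up holding two values in the surrogate range instead of one real character.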

I hope this helps!

OpenStreetMap Isn't Unicode

Thanks bdon for this blog post. I’d heard before these rumours that OpenStreetMap has Unicode / UTF-8 problems, but I couldn’t find anywhere that gave enough details for me to figure out what was really going on.

Of course, as mmd has explained, there aren’t actually any Unicode or UTF-8 problems in our API or data dumps. It’s just that some sequences of valid Unicode characters don’t make much sense, and unfortunately, there’s an alternative way of representing text in Burmese that happens to use some of the same code points, so there’s potential for a bit of a mess. But from bryceco’s analysis, it looks like it’s getting much better as time goes on.

I’m happy to see the efforts going into cleaning up the garbled text, and perhaps this is something that could be detected and flagged up by QA tools too?

My would-be answers to the OSMF board survey

Personally I think the survey stinks. The questions are mostly two-topics-in-one and as you describe, the answers will be meaningless since they can be interpreted however the Board prefers. And the phrasing of the questions is so leading (towards supporting decisions already made) that the entire exercise becomes pointless.

The question about large donations to fund things is my prime example. “Do you support (subtly complex topic) in order to do (something that sounds great)”? Of course! We all support something that sounds great! Now the Board can point at the survey and say that everyone supports making the OSMF dependent on large donations from corporate interests. Job done.

Asking questions about complex topics (like whether the OSMF should centralise software development funding, or the reliance on large vs individual donors) without providing any context or background is risible.

Sustainable Travel Expenses Resolution – Request for Support

I support putting this to the AGM.

Thoughts on the how and where of the OSMF starting to hand out money in the OSM community

I think it’s worth distinguishing between software that is on the critical path for data (mapper -> editing -> API -> database -> planet and replication), and then everything else. If you remove something on the critical path then OSM stops working. But if you remove something else, such as tileservers or search or routing, then sure mappers will notice but they can still put data in and get data back out again.

That’s how I used to distinguish software and systems when I was on OWG. We had three tiers - the core infrastructure required for the editing API and data distribution (and everything else involved in that, like NFS and DNS and whatnot), then tier two was stuff that would be really impactful for mappers if it wasn’t working (like tileservers and nominatim and the wiki) and then the tier three stuff was things that most mappers wouldn’t even notice if it was missing. Obviously there’s room for debate about this!

OSM the Legal Monster

You make a good point, but then you over-egg the pudding by including unrelated things like the banner policy and the new tile layer policy. I would suggest removing that entire paragraph, your point is stronger without it.

The moderation queue. The first 3000 issues

Thanks for reporting on these statistics! It’s nice to see the feature is proving useful, but of course it would be even nicer if it was unnecessary.

Based on your analysis, what changes to the issues and reports would be the most useful?

OWG Must Be Destroyed

Just to correct a few points - OWG does have policies, including how to join OWG (and the sysadmins group) - see https://operations.osmfoundation.org/policies/ . Of course they aren’t perfect or comprehensive but it’s a start. I’m particularly proud of getting agreement for the joining policies because it used to be completely opaque as to what was required.

Also, it’s incorrect to say that nobody has joined OWG since 2011, since Paul Norman joined last year and is active today. I’m not sure if you also meant to exclude Sarah Hoffmann, who was a member of OWG for several years from 2013 to 2017 and is also an active sysadmin today. We also had two probationary members in the last two years, but they didn’t become full voting members.

Sure, I’d rather see 20 other people on the list, and 5 new people every year. But it’s still worth painting an accurate picture.

As for the website and API, developing those isn’t a matter for OWG. Tom does both, but for example I’m no longer on OWG precisely to focus more on the website development. I feel like the development has improved a lot in the last 3 years, but it’s about 5 on a scale of 0 to 100 where 100 is what I’d like to see. And I’ve been on the receiving end of plenty of curt PR comments. It’s not a great experience.

You ask me “Does it really feel okay to you” and of course, no, none of this is good. But I want to work with everyone who is interested in making these situations better, and I want to discourage people from expressing their frustration in a manner that makes things worse.

OWG Must Be Destroyed

I’m not sure that I even want to reply to this, given the (presumably deliberately) outrageous title. Perhaps by responding I’m just encouraging more posts like this in the future? I hope not.

“Destroying” OWG makes no sense, since it’s there to serve a legitimate purpose. If everyone on it disappeared tomorrow, OSMF would still need those purposes taken care of. The server budget needs writing. The resource usage (like database disk space) needs forecasting and hardware needs planning. If you really want to “destroy” OWG, then you would need to explain why that stuff is no longer necessary, or suggest which working group should be doing that stuff instead.

You’ve also glossed over whether you want to destroy the sysadmin group, or OWG, or both, but no matter.

So let’s focus on the (slightly) more sensible suggestion, which is to “disband” the group and start again from scratch. Would that really work? Would the incoming people have any idea what needs to be done? Perhaps. Over the last few years we made the hardware site, and wrote a lot of monthly reports, and the chef-repo exists, so perhaps the new people could read all of them and try to get started. But it’s a high-risk strategy to kick everyone off first, instead of adding new people in and keeping those with experience still around to answer questions. I guess the sensible approach wouldn’t generate the dramatic headlines though.

But in any case, diary entries like this can become a damaging self-fulfilling prophecy. What member of the community wants to get involved, if the only public attention you get is posts like this? Why would any sensible person join any working group, if prominent and well-respected former OSMF Board members write posts like this? Try to imagine what being on the receiving end of this would be like. Try to realise what you are doing here, and why there will be even fewer working group members (or even board candidates) in future. You’re creating an environment where even paid contractors won’t want to get involved.

There are definitely problems in this working group, and they definitely need fixing. But this kind of post does more harm than good.

For anyone who is reading this and wants a more practical set of suggestions from a former OWG member, feel free to read https://gravitystorm.github.io/osmf-infra-plans/ for some ideas, or if you want any other suggestions for improving OWG feel free to ask me any of your questions directly.