OpenStreetMap

b-jazz's Diary Comments

Diary Comments added by b-jazz

Adding Microsoft Building Footprints To OSM With MapRoulette: Why And How

I see a lot of building errors when going through and cleaning up things that OSM Inspector points out. I’m 95% positive these are from the Microsoft data, but I haven’t been able to conclusively prove it. I also haven’t been able to track anyone down that might be able to correct the area and update the shapefile dump so these errors don’t continue to show up.

OpenStreetMap is currently free from duplicate nodes

I do these when I find them as well. I also clean up cases where two different node IDs share the exact same lat/long pair. I have some scripts set up that download the data from OSM Inspector and then parse it into small areas so that I can work on several duplicates in a single changeset.
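A minimal sketch of that idea, not the actual scripts: it assumes the duplicates have already been parsed into `(node_id, lat, lon)` tuples, and the function names and the 0.5-degree cell size are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def find_duplicate_nodes(nodes):
    """Group node IDs by exact (lat, lon); keep locations with more than one ID.

    Assumes coordinates were parsed from the same textual source, so identical
    positions compare equal as floats.
    """
    by_coord = defaultdict(list)
    for node_id, lat, lon in nodes:
        by_coord[(lat, lon)].append(node_id)
    return {coord: ids for coord, ids in by_coord.items() if len(ids) > 1}


def bucket_by_cell(duplicates, cell_deg=0.5):
    """Assign each duplicate location to a grid cell (cell size is an assumption).

    Everything in one cell can then be fixed together in a single changeset
    with a small bounding box.
    """
    cells = defaultdict(list)
    for (lat, lon), ids in duplicates.items():
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[cell].append(((lat, lon), ids))
    return dict(cells)


if __name__ == "__main__":
    sample = [
        (1, 40.0, -105.0),
        (2, 40.0, -105.0),
        (3, 40.1, -105.0),
    ]
    print(bucket_by_cell(find_duplicate_nodes(sample)))
```

Keeping the cell small is what keeps each changeset's bounding box small.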

Thanks for helping with the cause.

Analysis of Bounding Box Sizes Over the Last Eight Years

@Jennings: oh man, I wish I knew about that Amazon/Athena store before I started this. That might have been a big help. I’ll have to store that one away for any future plans. Thanks!

I was thinking of doing the area / num_changes analysis next, and I’m pondering if that is much different than num_changes / area.

What I’d really like to do is figure out the “empty space” of the bounding box if you consider the bounding boxes of the individual changes of the set. But I’d need to dig further and find the objects/bbox of the individual objects, which isn’t available in my data. If I were to have my ideal feature in an editor that would warn me that my bounding box is too large, I’d really want it to warn me when the empty space is too large. If I edit a single way that is massive (the Bermuda Triangle for example), the overlapping bbox of the three ways would pretty much fill the bbox of the changeset. At least that should be one factor to consider when deciding whether or not to “hassle” the user with a warning.
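To make the “empty space” idea concrete, here is a rough sketch under stated assumptions: each edited object is reduced to an axis-aligned box `(min_x, min_y, max_x, max_y)` in some planar unit (raw lon/lat degrees would distort areas), and the function names are hypothetical. It measures what fraction of the changeset bounding box is not covered by any object's own box.

```python
def union_area(boxes):
    """Area covered by the union of axis-aligned boxes (min_x, min_y, max_x, max_y)."""
    xs = sorted({x for b in boxes for x in (b[0], b[2])})
    ys = sorted({y for b in boxes for y in (b[1], b[3])})
    total = 0.0
    # Coordinate compression: walk every grid cell formed by the box edges
    # and count it once if at least one box fully contains it.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(b[0] <= x0 and x1 <= b[2] and b[1] <= y0 and y1 <= b[3]
                   for b in boxes):
                total += (x1 - x0) * (y1 - y0)
    return total


def empty_space_fraction(boxes):
    """Fraction of the overall bounding box not covered by any object box."""
    min_x = min(b[0] for b in boxes)
    min_y = min(b[1] for b in boxes)
    max_x = max(b[2] for b in boxes)
    max_y = max(b[3] for b in boxes)
    outer = (max_x - min_x) * (max_y - min_y)
    if outer == 0:
        return 0.0  # degenerate changeset (a single point or line)
    return 1.0 - union_area(boxes) / outer


if __name__ == "__main__":
    # Two small edits far apart: almost all of the changeset bbox is empty.
    print(empty_space_fraction([(0, 0, 1, 1), (9, 9, 10, 10)]))
    # One massive object: nothing is empty.
    print(empty_space_fraction([(0, 0, 10, 10)]))
```

Two nodes edited far apart come out as fully empty under this measure, since a node's own box has zero area, while one huge way comes out as not empty at all.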

As for bots, do you have a dataset of bot usernames/userids?

Analysis of Bounding Box Sizes Over the Last Eight Years

@PierZen: thanks for the comments and your tweet with another view of changeset analysis. Neat stuff.

Analysis of Bounding Box Sizes Over the Last Eight Years

@imagico: I added a third heatmap with the perimeter length of the bounding box. There are fewer buckets (keeping with the earlier use of doubling the bucket size on every iteration) so the “heat” looks a little more condensed, but it turns up some slight variations in the pattern. It’s an interesting take. Thanks for the suggestion.
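For readers wondering why perimeter needs fewer buckets, a small sketch of doubling buckets may help. It assumes bucket `k` holds values in `[2^k, 2^(k+1))` with everything below 1 lumped into bucket 0; the exact bucket edges used for the heatmaps are not stated here, so treat this as illustrative.

```python
from collections import Counter


def doubling_bucket(value):
    """Return k such that 2**k <= value < 2**(k + 1); values below 1 map to 0."""
    if value < 1:
        return 0
    # For value >= 1, bit_length() - 1 of the integer part is floor(log2(value)).
    return int(value).bit_length() - 1


def histogram(values):
    """Count how many values fall into each doubling bucket."""
    return Counter(doubling_bucket(v) for v in values)


if __name__ == "__main__":
    side = 2 ** 20                          # a square bbox, side length in meters
    print(doubling_bucket(side * side))     # bucket for the area
    print(doubling_bucket(4 * side))        # bucket for the perimeter
```

Because perimeter grows roughly with the square root of area, the same set of boxes spans about half as many doubling buckets.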

Analysis of Bounding Box Sizes Over the Last Eight Years

@imagico. Yes, yes. I see what you’re saying. Thanks for the comments! I’ll look into doing another heatmap with this idea.

Analysis of Bounding Box Sizes Over the Last Eight Years

@tyr_asd I think I did start with that data but learned that it either didn’t include the bounding box or organize the changes into changesets. There was some reason I didn’t use it, but it’s not worth another 3.5GB download to figure out why. :)

Analysis of Bounding Box Sizes Over the Last Eight Years

@imagico To calculate area, I use PostGIS’s ST_AREA() function. Over the past year, my sampling (roughly 1/13th of all changesets) has 829 records that are over 2^40 square meters. I’m not a statistician and can’t speak to how representative sampling is when it comes to rare events (0.06% in this case), but I can see it being inaccurate in either direction.
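One rough way to put a number on that uncertainty is sketched below. It assumes each changeset landed in the sample independently with probability 1/13, which is an assumption about the sampling and not something established above; under it the sample count is binomial and the scaled-up estimate has a simple standard error. The function name is hypothetical.

```python
import math


def extrapolate(sample_count, sampling_rate):
    """Scale a sample count up to the full population.

    Assumes independent sampling at a fixed rate (binomial thinning), so the
    standard error of the estimate is sqrt(k * (1 - p)) / p.
    """
    estimate = sample_count / sampling_rate
    std_error = math.sqrt(sample_count * (1 - sampling_rate)) / sampling_rate
    return estimate, std_error


if __name__ == "__main__":
    estimate, std_error = extrapolate(829, 1 / 13)
    print(f"about {estimate:.0f} changesets, give or take {std_error:.0f}")
```

If the sample was not drawn independently (for example, every 13th changeset by ID), the real error could be smaller or larger than this suggests.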

Analysis of Bounding Box Sizes Over the Last Eight Years

Looking at my (sampled) data, there were 760 changesets with only two objects modified that ended up making for a changeset bounding box of greater than 1,000 square kilometers. The sampling factor is roughly 1:13, so that extrapolates to roughly 10,000 very large changesets in a year from just two changes (likely two nodes, judging from a few that I looked at by hand).

Improving the Behavior of Search Engine Optimizer (SEO) Companies

Thanks @aharvey. For others, the video can be found at https://www.youtube.com/watch?v=BovbAIIJ6L8

For us, the hardest part was trying to block a moving target, since a new account was added for every single POI they created.

It was a lot of effort to chase down the source of the edits, but I think it was worth it in the end. I’d suggest starting there. And if/when you do find the people responsible, treat them with respect and understand where they are coming from and try to sell them on a win/win solution.

HTTPS All The Things (https_all_the_things)

I’m refactoring the code as we speak to apply to a broader set of tags. I expect I’ll start a run of those in the next week or two. And yes, you’re right that the whole planet has been completely processed for the website key. I’m just rolling across the entire planet about once a week looking for new additions.

HTTPS All The Things (https_all_the_things)

Thanks @escada. I only thought about “website”, “:website”, “url”, and “:url”. I wasn’t aware of “image”. Looks like there are over 100,000 image tags. I’ll look into it and see if they are predominantly URLs.

HTTPS All The Things (https_all_the_things)

I’ve found about 3000 instances of http://www.example.com redirecting to https://example.com in the lower 48. This makes me happy (because I abhor ‘www’). I’ll put in a fix and run batches again as soon as I implement www.example.com to http://www.example.com as well. Great find @rorym.

HTTPS All The Things (https_all_the_things)

Three excellent questions/suggestions. Thanks!

  1. You’re quite right. I’ll add that right away.
  2. That’s correct, it doesn’t handle it, and it should. (I’m a big advocate of ridding the world of the scourge of having to say “double you double you double you”.) Now I’m curious and I’ll dig through the logs and see if there were any cases of that occurring.
  3. I’m planning on adding that today, though I’m not going to make an assumption about favoring https, thinking that maybe some crazy/lazy website owners don’t have their https site matching their http site. I’ll hit up the http version, and if it redirects, then I’ll update the value.

HTTPS All The Things (https_all_the_things)

As clear as mud. ;-)

“if your edit affects only one country or territory then the national-language mailing lists, forums, or other standard communication methods for the territory affected by the change”

My argument is that osmus.slack.com is a national-language forum for the U.S. with excellent representation. If that isn’t good enough for one reason or another, the wiki should call that out.

HTTPS All The Things (https_all_the_things)

Thanks for the feedback @Nakaner. I’ll make sure I mention it in both the talk-us mailing list and the Slack channel in the future. Do you want to edit the AECoC page to point out that discussions shouldn’t take place solely on “proprietary communication channels”? Maybe we can prevent someone else from interpreting the page as I did in the future.

HTTPS All The Things (https_all_the_things)

I agree that it would be pretty clear at that point that you can use HTTPS, but I think a simple HTTPS redirect is pretty convincing. Especially in this day and age when more and more websites are getting clued in about the importance of secure transmissions.

HTTPS All The Things (https_all_the_things)

@Wynndale: Thanks for pointing that out. I’ll redact the names in the Slack thread and post the rest of the content in the wiki so that people not on the US Slack server can see the comments.

I am currently only rewriting 301 (Moved Permanently) and 302 (Found). As you probably know, 302 has been known at times as Moved Temporarily. So it’s arguable that I shouldn’t be rewriting any of the 302 redirects, but IMO most website operators are using 302 when they really should be using 301. That is the reason, though, that I’m avoiding touching anything that is much different from the original url. I’ve seen a bunch of domains redirecting to a Facebook page or a Google site temporarily. Those remain untouched.

As for HSTS, I wasn’t familiar with that, but did a little reading. I’m not sure how you think that could be incorporated into what I’m doing. Can you explain?
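A sketch of that decision rule, for illustration only and not the code from the repository: it takes the status code and `Location` header of a response that has already been fetched, and it reads “much different” as “a different host once a leading www. is ignored”, which is an assumption. The names `should_rewrite` and `_bare_host` are hypothetical.

```python
from urllib.parse import urlsplit

# Only these redirect types are considered, per the comment above.
REWRITABLE_STATUSES = {301, 302}


def _bare_host(url):
    """Lower-cased hostname with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def should_rewrite(status, original_url, location):
    """Decide whether a tag value may be replaced by the redirect target."""
    if status not in REWRITABLE_STATUSES:
        return False
    if urlsplit(location).scheme != "https":
        return False
    # Leave the tag alone when the redirect lands on another site entirely
    # (for example a Facebook page standing in for the real website).
    return _bare_host(original_url) == _bare_host(location)


if __name__ == "__main__":
    print(should_rewrite(301, "http://example.com", "https://www.example.com/"))
    print(should_rewrite(302, "http://example.com", "https://www.facebook.com/example"))
```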

HTTPS All The Things (https_all_the_things)

@rorym: You can find the Python code at https://gitlab.com/b-jazz/https_all_the_things/. It’s not meant for others to run just yet, but is there for review and comments.

I’m currently just touching the “website” tag, but will likely add “url” and “contact:website” for the next go-around. I’m not sure I’ll do more than those as they make up the vast majority of http urls that are tagged. I’m happy to hear arguments for others that should really be included.

When comparing the urls, I’m currently doing four checks. For http://example.com, I’m looking for https://example.com, https://example.com/, https://www.example.com, and https://www.example.com/. Those are the most common variations when specifying redirect urls. At this point, I’m not tackling protocol-less urls, but I certainly could. I should do some research and find out how common it is to leave off the http://.

As for the U.S. vs. the entire planet, I’m open to running it on the rest of the world, but I just started with the U.S. as I know that community better than the rest of the world and only posted there looking for feedback. I could build up the script a little more and document how to run it and let others do their own countries. What I worry about most is getting buy-off from the larger community across the globe. If someone gives me the go-ahead, I’ll happily run it worldwide.
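The four checks can be sketched roughly as follows. This is an illustration written from the description above, not an excerpt from the repository, and the function names are hypothetical.

```python
def accepted_redirect_targets(url):
    """Return the four https variants that count as 'the same site' for an http url."""
    prefix = "http://"
    if not url.startswith(prefix):
        return set()  # protocol-less and https values are not handled here
    rest = url[len(prefix):].rstrip("/")
    return {
        f"https://{rest}",
        f"https://{rest}/",
        f"https://www.{rest}",
        f"https://www.{rest}/",
    }


def is_simple_https_upgrade(original_url, redirect_target):
    """True when the redirect target is one of the four accepted variants."""
    return redirect_target in accepted_redirect_targets(original_url)


if __name__ == "__main__":
    print(is_simple_https_upgrade("http://example.com", "https://www.example.com/"))
    print(is_simple_https_upgrade("http://example.com", "https://example.org/"))
```

Note that a value which already starts with www. only matches its own https form here; the www-to-bare-domain redirects are a separate case.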

The most surreal and memorable OSMF board meeting yet

Thanks Richard. I took it to mean, “you’re acting like a child complaining about this”. I appreciate the clarification.