OpenStreetMap

The number of the month: 74.9 percent

Posted by drolbr on 20 October 2016 in English

The most important news is that version 0.7.53 is now live on the dev.overpass-api.de and overpass-api.de servers. The Rambler server will follow in the next few days.

I will give details on this further below. Also below, I will explain how an incident with Pokemon Go shook my thinking about quota policies.

Before this I will present the number of the month: 74.9 percent. I have observed requests in the logfiles from 74.9 percent of all IPv4 /24 subnets in the entire world. A random selection of IP block owners from the logfile: General Motors, Ford, Toyota, Daimler, BMW, Volkswagen. Media outlets: New York Times, Guardian, Der Spiegel. Also starring: SNCF, Deutsche Bahn, SBB, numerous universities and even ESRI. And of course long lists of telephone carriers. Remember that Overpass API is a quite deeply buried and highly technical service. It is almost certain that the reach of the combined tile servers is even higher. This matches very well with the observation that half of the public administration in Germany is figuring out how to get OpenStreetMap into their workflow.
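How such a share can be derived from a logfile is sketched below. This is only an illustration and not the script behind the figure above: the sample addresses are made-up documentation addresses, the helper names are hypothetical, and the denominator of 2^24 (every possible /24 block) is an assumption, since the post does not say which total was used.

```python
import ipaddress

# Assumption: the denominator is every possible IPv4 /24 block (2**24).
# The post does not state which total was actually used.
TOTAL_SLASH24 = 2 ** 24


def slash24_of(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network that contains the given IPv4 address."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def subnet_coverage(client_ips, total=TOTAL_SLASH24):
    """Return (distinct /24 blocks seen, their share of `total` in percent)."""
    seen = {slash24_of(ip) for ip in client_ips}
    return len(seen), 100.0 * len(seen) / total


# Hypothetical sample; a real run would read client addresses from the logs.
sample_ips = ["192.0.2.17", "192.0.2.200", "198.51.100.4", "203.0.113.9"]
count, percent = subnet_coverage(sample_ips)
print(f"{count} distinct /24 blocks, {percent:.6f} percent of all /24 blocks")
```

The first two sample addresses fall into the same /24 block, so they are counted only once.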

Punchline: OpenStreetMap is by no means small. It is the de facto standard for general purpose geodata. And if there are limits to the growth of OpenStreetMap in sight, then these are most likely the size of mankind.

This does not mean that the majority of mankind is using OpenStreetMap. Please do not forget that you cannot eat geodata, and it is no substitute for drugs either. Geodata is an important field, but not the core of the world or the internet. What the statistics above say is that whoever is involved in geodata is almost surely aware of OpenStreetMap.

Hence "irrelevant" is an adjective entirely unrelated to OpenStreetMap. Blog posts stating otherwise are simply wrong. This need not be willful misinformation, but it may be an observation from a very unusual environment somewhere in the remaining 25.1 percent of the IPv4 space. Hence there is no need to have some obscure and complicated extra documents called "CoC" or so just to please unknown people who might not exist at all. Leaving aside that there is little to no precedent for extra bureaucracy pleasing people.

But back to Overpass API. During the weekend around October 1st I saw a spike in load. Most of the time such a spike is some developer offloading undue amounts of requests onto the Overpass API. In this case it turned out that the only client responsible for the spike was [overpass-turbo.eu](http://overpass-turbo.eu). Given the highly technical nature of that tool, this was more than unexpected.

It took some days until the search engines delivered evidence of what was going on: overpass-turbo.eu had been recommended within the remaining Pokemon Go community. Testing against that hypothesis, I found that requests to Overpass API came from 30'000 different IP addresses on each day of the weekend, roughly twice the normal number. However, OpenStreetMap gets plenty of credit out of this. From the logs I can observe that a lot of users come from developing countries. And from my personal environment I know that more than half of Pokemon Go players are female. Actually, I would call this exactly what we would like to achieve with outreach. Hence, neither blocking the users who make up the spike nor blocking overpass-turbo.eu altogether would be an option that makes sense.

So a second result of this weekend is that I should ask rather soon for more server capacity. The improvements by Mmd and me may help. But a second server at some point in 2017 would probably help to attract people. Maybe even people that do not yet have a relation to geodata.

Version 0.7.53 excels rather at having fewer bugs than at having new features. There are nonetheless some improvements:

  • [!key] can be used as shortcut for [key!~"."]

  • the user statement accepts multiple users as a comma separated list
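To make the two improvements concrete, here is a small sketch that keeps the queries as plain strings. The tag keys, user names and bounding box are made up for illustration, the helper functions are hypothetical and not part of Overpass API, and the quoted form of the user list follows my reading of the Overpass QL documentation rather than anything stated in this post.

```python
import re

# Matches a negated-key filter such as [!name] or [!"name"].
_NEGATED_KEY = re.compile(r'\[\s*!\s*"?([A-Za-z0-9_:]+)"?\s*\]')


def expand_negated_key(query: str) -> str:
    """Rewrite every [!key] filter into the longer [key!~"."] form."""
    return _NEGATED_KEY.sub(lambda m: f'[{m.group(1)}!~"."]', query)


def user_filter(names) -> str:
    """Build a user filter listing several user names, comma separated."""
    return "(user:" + ",".join(f'"{name}"' for name in names) + ")"


# Hypothetical query: benches that carry no backrest tag at all.
short_form = "node[amenity=bench][!backrest](50.7,7.1,50.8,7.2);out;"
long_form = expand_negated_key(short_form)

# Hypothetical query: nodes last touched by either of two made-up users.
multi_user = "node" + user_filter(["Mapper1", "Mapper2"]) + "(50.7,7.1,50.8,7.2);out;"

print(long_form)
print(multi_user)
```

Both spellings of the first query should select the same objects. The short form only saves typing.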

Version 0.7.54 is hopefully ready at the end of the year. I have already started to develop some features. Others have been sketched in my SotM talk (video tba). So please stay tuned. And in the meantime, please spread the message that OpenStreetMap is already the standard choice for general purpose geodata.

Comment from Peda on 20 October 2016 at 06:50

Roland, I accidentally discovered that the SotM videos had been cut and uploaded about 2 weeks ago. So your video is online, too.


Comment from imagico on 20 October 2016 at 08:25

One thing that might make sense to consider in the long term is how to communicate better that intensive organized users of the Overpass API need to set up their own instance or buy commercially run services for this, in a similar way as with the tile servers. You indicate that a large part of the load comes from small distributed users, but lightening the load from volume users would probably improve the overall situation nonetheless.

One thing I noticed recently is that a common task I have for Overpass (usually via overpass turbo) is querying for fairly rare combinations of tags that are individually in widespread use. This seems to be a fairly hard thing for Overpass to do: I frequently get errors (something like out of memory, IIRC).

> And if there are limits to the growth of OpenStreetMap in sight, then these are most likely the size of mankind.

This is probably right as far as growth is concerned, but I think maintaining the data and keeping it up to date is a different story, because that does not necessarily scale with the number of people involved but rather with the intensity of their involvement. See also the musings of Alan McConchie on the matter.


Comment from tyr_asd on 21 October 2016 at 18:57

> 30'000 different IP addresses per each day of the weekend [accessed] Overpass API

Yep, that's in line with what I'm seeing in overpass turbo's recent access logs (see the chart below; the pre-event base level is around 500 visits per day ^_^). The bulk of new users seems to come from the US, though.


Comment from drolbr on 23 October 2016 at 18:06

@tyr_asd: It is a question of relating numbers. There are peaks in usage e.g. from the Philippines and from Malaysia, each around 400 different users. These are ten times more users than usual from these countries. That there is also a large number of users from the US doesn't change the enormous increase in other countries that have had a small user base so far.

@imagico: Of course we would like commercial users to get their own instance and pay for it. But that is not the point here.

To both: there have been 400 users each from Malaysia and the Philippines (and from a lot of other countries). Should we have the capacity to offer them good service at the first try? Should we give a first impression as a vibrant community with useful data and tools? Or should we accept that the service (and maybe OSM as a whole) appears sluggish to these users at first contact, because it is most likely that first contact happens during a load peak?

In other words, there are two questions here:

  • Do we consider these peak users as people we would like to attract? I would say that people who have benefited from obtaining geodata from OSM are likely to come back at a later point in time to contribute. Not all, but if 10 percent come back with an intrinsic motivation then we have enlarged the community. It is, by the way, how I myself came to OSM. The data was already useful, but there were things to add or to correct.

  • Do we want to earmark resources to leave a better first impression on these users, or are resources spent more efficiently elsewhere? Given this event I would say that a second server may make sense.


Comment from mmd on 8 October 2017 at 15:25

Talking about efficiency: recent measurements on a performance-optimized branch indicate that a full day's worth of queries can in fact be processed in a 25h timeframe on just ~2 CPU cores. This includes minutely updates, hourly area updates and gzip compression on the webserver.

Yet the production instance at overpass-api.de frequently uses 6 out of 8 CPU cores, sometimes even 7 out of 8 cores according to munin. So 4-5 cores are busy doing work that adds no real value. Even worse, users are getting "Too many queries" or "Timeout" error messages, although more resources could be made available for more productive purposes. It's really only a matter of addressing the respective GitHub issues.

