pnorman's Diary

Dear all,

Today, v5.7.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take a couple of days before all tiles show the new rendering.

Changes include

  • Unpaved roads are now indicated on the map (#3399)

  • Country label placement improved, particularly for countries in the north (#4616)

  • Added elevation to wilderness huts (#4648)

  • New index for low-zoom performance (#4617)

  • Added a script to switch between script variations for CJK languages (#4707)

  • Ordering fixes for piers (#4703)

  • Numerous CI improvements

Thanks to all the contributors for this release, including wyskoj, tjur0, depth221, SlowMo24, altilunium, and cklein05, all new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.6.2...v5.7.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OSM usage by country

Posted by pnorman on 22 November 2022 in English (English).

I gathered some statistics about usage of the website and tiles in 2022Q3.

I looked at total tile.osm.org usage, tile.osm.org usage from osm.org itself, osm.org visits, and osm.org unique visitors.

Here’s the data for the top 20 countries.

| Country | osm.org tile requests | Total tile requests | Website visits | Website unique visitors |
| --- | --- | --- | --- | --- |
| DE | 17.79% | 7.78% | 8.27% | 7.98% |
| RU | 12.23% | 8.49% | 2.47% | 2.43% |
| US | 8.72% | 9.22% | 13.11% | 13.56% |
| PL | 7.69% | 4.99% | 3.09% | 2.80% |
| GB | 4.85% | 3.68% | 4.42% | 4.42% |
| FR | 4.79% | 7.00% | 3.91% | 3.94% |
| NL | 3.62% | 3.31% | 2.17% | 2.09% |
| IT | 3.49% | 3.46% | 4.74% | 4.86% |
| IN | 2.64% | 2.66% | 3.67% | 3.16% |
| CN | 2.62% | 0.79% | 2.65% | 2.72% |
| AT | 2.03% | 0.89% | 0.98% | 0.91% |
| UA | 1.78% | 1.98% | 1.20% | 1.21% |
| CH | 1.41% | 0.71% | 0.83% | 0.82% |
| CA | 1.29% | 1.59% | 1.36% | 1.39% |
| BE | 1.29% | 1.06% | 1.10% | 1.03% |
| ES | 1.27% | 2.41% | 2.32% | 2.39% |
| JP | 1.10% | 1.54% | 1.74% | 1.71% |
| AU | 1.09% | 0.92% | 0.88% | 0.82% |
| SE | 0.91% | 0.95% | 0.87% | 0.88% |
| FI | 0.89% | 0.74% | 0.74% | 0.71% |

I’ve put the full data into a gist on GitHub.

Dear all,

Today, v5.6.1 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take a couple of days before all tiles show the new rendering.

Changes include

  • Fixing rendering of water bodies on zooms 0 to 4

Thanks to all the contributors for this release.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.6.0...v5.6.1

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

Dear all,

Today, v5.6.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on the openstreetmap.org it will take couple of days before all tiles show the new rendering.

Changes include

  • using locally installed fonts instead of system fonts, for more up to date fonts;
  • changing tree and tree row colours to the same colour as areas with trees;
  • rendering parcel lockers; and
  • rendering name labels of bays and straits from z14 only, and lakes from z5

Thanks to all the contributors for this release including GoutamVerma, yvecai, ttomasz, and Indieberrie, new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.5.1...v5.6.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OpenStreetMap Carto could use more help reviewing pull requests, so if you’re able to, please head over to GitHub and review some of the open PRs.

This is a bit less OpenStreetMap-related than normal, but has to do with the Standard Tile Layer and an outage we had this month.

On July 18th, the Standard Tile Layer experienced degraded service, with 4% of traffic resulting in errors for 2.5 hours. A significant factor in the time to resolve the incident was a lack of visibility of the health status of the rendering servers. The architecture consists of a content delivery network (CDN) hosted by Fastly, backed by 7 rendering servers. Fastly, like most CDNs, offers automatic failover of backends by fetching a URL on the backend server and checking its response. If the response fails, it will shift traffic to a different backend.

A bug in Apache left the servers able to handle only a reduced number of connections, causing a server to fail the health check and diverting all of its load to another server. This repeated with multiple servers, pushing the load between them until the first server, by then idle, responded to the health check again. Because the servers were responding to most of the manually issued health checks, and we had no visibility into how each Fastly node was directing its traffic, it took longer to find the cause than it should have.

Our normal monitoring is provided by Statuscake, but this wasn’t enough here. Instead of adding more monitoring of our own, we wanted to make use of the existing Fastly healthchecks, which probe the servers from 90 different CDN points. Besides being a vastly higher volume of checks, this more directly monitors the health checks that matter for the service.

During the incident, Fastly support provided some details on how to monitor health check status. Based on this guide, the OWG has set up an API on the tile CDN to indicate backend health, and monitoring to track this across all POPs.

Fastly uses a modified version of Varnish, which supports VCL for configuration. This is a powerful language, which lets us do sophisticated load-balancing, and in this case, even create an API directly on the CDN.

We start with a custom VCL snippet within the recv subroutine that directs requests for the API endpoint to a custom error:

if (req.url.path ~ "^/fastly/api/hc-status") {
  error 660;
}

Next, we add another VCL snippet within the error subroutine that manually assembles a JSON response indicating the servers’ statuses, as well as headers with the same information:

 if (obj.status == 660) {
  # 0 = unhealthy, 1 = healthy
  synthetic "{" LF
      {"  "timestamp": ""} now {"","} LF
      {"  "pop": ""} server.datacenter {"","} LF
      {"  "healthy" : {"} LF
      {"    "ysera": "} backend.F_ysera.healthy {","} LF
      {"    "odin": "} backend.F_odin.healthy {","} LF
      {"    "culebre": "} backend.F_culebre.healthy {","} LF
      {"    "nidhogg": "} backend.F_nidhogg.healthy {","} LF
      {"    "pyrene": "} backend.F_pyrene.healthy {","} LF
      {"    "bowser": "} backend.F_bowser.healthy {","} LF
      {"    "baleron": "} backend.F_balerion.healthy LF
      {"  }"} LF
      {"}"};
  set obj.status = 200;
  set obj.response = "OK";
  set obj.http.content-type = "application/json";
  set obj.http.x-hcstatus-ysera = backend.F_ysera.healthy;
  set obj.http.x-hcstatus-odin = backend.F_odin.healthy;
  set obj.http.x-hcstatus-culebre = backend.F_culebre.healthy;
  set obj.http.x-hcstatus-nidhogg = backend.F_nidhogg.healthy;
  set obj.http.x-hcstatus-pyrene = backend.F_pyrene.healthy;
  set obj.http.x-hcstatus-bowser = backend.F_bowser.healthy;
  set obj.http.x-hcstatus-balerion = backend.F_balerion.healthy;
  return (deliver);
}
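If the snippets are deployed as above, the endpoint can be fetched like any other URL on tile.openstreetmap.org. Here is a minimal sketch in Python (using the requests library) that prints what the CDN node serving you reports; it simply reads back the JSON assembled by the error snippet above.

import requests

# Fetch the synthetic JSON assembled by the VCL error snippet above.
resp = requests.get("https://tile.openstreetmap.org/fastly/api/hc-status", timeout=10)
resp.raise_for_status()
status = resp.json()

print("POP:", status["pop"], "at", status["timestamp"])
for backend, healthy in status["healthy"].items():
    print(" ", backend, healthy)  # 0 = unhealthy, 1 = healthy, as in the VCL comment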

This API can be manually viewed to show the status, but it only works from the CDN node you’re connecting through. To monitor all of the nodes at once, we use the Fastly edge_check endpoint. When called with an authorized token, the response looks something like this:

[
  {
    "pop": "frankfurt-de",
    "server": "cache-fra19139",
    "response": {
      "headers": {
        "x-hcstatus-ysera": "1",
        "x-hcstatus-odin": "1",
        "x-hcstatus-culebre": "1",
        "x-hcstatus-nidhogg": "1",
        "x-hcstatus-pyrene": "1",
        "x-hcstatus-bowser": "1",
        "x-hcstatus-balerion": "1"
      },
      "status": 200
    }
  },

  {
    "pop": "yvr-vancouver-ca",
    "server": "cache-yvr1528",
    "response": {
      "headers": {
        "x-hcstatus-ysera": "1",
        "x-hcstatus-odin": "1",
        "x-hcstatus-culebre": "1",
        "x-hcstatus-nidhogg": "1",
        "x-hcstatus-pyrene": "1",
        "x-hcstatus-bowser": "1",
        "x-hcstatus-balerion": "1"
      },
      "status": 200
    }
  }
]

The real response has a lot more headers and other information in it, as well as another 90 POPs, but what I’ve shown is the important part. This is all the information required, but it’s not in a very useful form. To make it useful, we need to gather the data with our monitoring tool, Prometheus. This is done with a simple Prometheus exporter that queries the URL, parses the response, and writes out metrics. Once the metrics are in Prometheus, we can alert on them and graph them.
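As a rough illustration of what such an exporter can look like, here is a hedged Python sketch: it polls Fastly’s edge_check API for the hc-status URL, reads the x-hcstatus-* headers from every POP, and exposes them using the metric name and labels from the query below. The FASTLY_TOKEN environment variable, the listen port, and the poll interval are my assumptions, not the real deployment.

import os
import time

import requests
from prometheus_client import Gauge, start_http_server

HOST = "tile.openstreetmap.org"
EDGE_CHECK = "https://api.fastly.com/content/edge_check"  # Fastly's edge check API

health = Gauge(
    "fastly_healthcheck_status",
    "Backend health as seen by Fastly POP healthchecks (1 = healthy)",
    ["host", "backend", "pop"],
)

def scrape() -> None:
    resp = requests.get(
        EDGE_CHECK,
        params={"url": f"{HOST}/fastly/api/hc-status"},
        headers={"Fastly-Key": os.environ["FASTLY_TOKEN"]},  # assumed token env var
        timeout=30,
    )
    resp.raise_for_status()
    for pop in resp.json():
        headers = pop["response"]["headers"]
        for name, value in headers.items():
            if name.startswith("x-hcstatus-"):
                backend = name[len("x-hcstatus-"):]
                health.labels(host=HOST, backend=backend, pop=pop["pop"]).set(int(value))

if __name__ == "__main__":
    start_http_server(9309)  # hypothetical exporter port
    while True:
        scrape()
        time.sleep(60)       # assumed poll interval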

Because the metrics are 1 or 0, taking the average with avg(fastly_healthcheck_status{host="tile.openstreetmap.org"}) by (backend) gives a graph indicating the backend status, as measured by Fastly POP healthchecks. This graph is now on the Tile Rendering Dashboard.

Dear all,

Today, v5.5.1 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take a couple of days before all tiles show the new rendering.

The one change is a bugfix to the colour of gates (#4600)

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.5.0...v5.5.1

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

Dear all,

Today, v5.5.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take a couple of days before all tiles show the new rendering.

Changes include

  • Fixed colour mismatch of car repair shop icon and text (#4535)

  • Cleaned up SVG files to better align with Mapnik requirements (#4457)

  • Allow Docker builds on ARM machines (e.g. new Apple laptops) (#4539)

  • Allow file:// URLs in external data config and caching of downloaded files (#4468, #4153, #4584)

  • Render mountain passes (#4121)

  • Don’t use a cross symbol for more Christian denominations that don’t use a cross (#4587)

Thanks to all the contributors for this release, including stephan2012, endim8, danieldegroot2, and jacekkow, new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.4.0...v5.5.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

I’m working on publishing a summary of sites using tile.osm.org and want to know what format would be most useful for people.

The information I’ll be publishing is requests/second, requests/second that were cache misses, and domain. The first two are guaranteed to be numbers, while the last one is a string that will typically be a domain name like www.openstreetmap.org, but could theoretically contain a poisoned value like a space.

The existing logs, which list tiles and their number of requests, are formatted as z/x/y N, where z/x/y are tile coordinates and N is the number of accesses.

My first thought was TPS TPS_MISS DOMAIN, space-separated like the existing logs. This would work, with the downside that it’s not very future proof. Because the domain can theoretically have a space, it has to be last. This means that any future additions will require re-ordering the columns, breaking existing usage. Additionally, I’d really prefer to have the domain at the start of the line.

A couple of options are

  • CSV, with escaping
  • tab-delimited

Potential users, what would work well with the languages and libraries you prefer?

An example of the output right now is

1453.99 464.1 www.openstreetmap.org  
310.3 26.29 localhost
136.46 39.68 dro.routesmart.com
123.65 18.54 www.openrailwaymap.org
107.98 0.05 www.ad-production-stage.com
96.64 1.78 r.onliner.by
91.42 0.16 solagro.org
87.83 1.53 tvil.ru
84.88 12.98 eae.opekepe.gov.gr
74.0 2.32 www.mondialrelay.fr
63.44 1.93 www.lightningmaps.org
63.22 14.01 nakarte.me
55.1 0.74 qualp.com.br
52.77 11.25 apps.sentinel-hub.com
46.68 4.07 127.0.0.1
46.3 1.96 www.gites-de-france.com
43.47 1.15 www.anwb.nl
42.46 10.52 dacota.lyft.net
41.13 6.63 www.esri.com
40.84 0.69 busti.me
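For reference, here is a hedged Python sketch of consuming the current space-separated output above and re-emitting it as CSV with the domain first, one of the options under consideration. The input filename is hypothetical.

import csv
import sys

rows = []
with open("tile_usage.txt") as f:  # hypothetical file containing the output above
    for line in f:
        parts = line.strip().split(None, 2)  # TPS, TPS_MISS, then domain (may contain spaces)
        if len(parts) != 3:
            continue
        tps, tps_miss, domain = parts
        rows.append((domain, float(tps), float(tps_miss)))

# Re-emit as CSV with the domain first; the csv module handles any escaping.
writer = csv.writer(sys.stdout)
writer.writerow(["domain", "tps", "tps_miss"])
writer.writerows(rows)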

The OpenStreetMap Foundation runs several services subject to usage policies.

If you violate the policies, you might be automatically or manually blocked, so I decided to write a post to help community members answer questions from people who have been blocked. If you’re a blocked user, the best place to ask is in the IRC channel #osm-dev on irc.oftc.net. Stick around a while to get an answer.

The most important question is which API is being used. For this, look at the URL you’re calling.

If the URL contains nominatim.openstreetmap.org, review the usage policy. The most common cause of being blocked is bulk geocoding exceeding 1 request per second. Going over this will trigger automatic IP blocks. These are automatically lifted after several hours, so stop your process, fix it, wait, and then you won’t be blocked.

If you’re using nominatim but not exceeding 1 request per second, to get help you should provide the URL you’re calling, the HTTP User-Agent or Referer you’re sending, the IP you’re requesting from, and the HTTP response code.

If you’re calling tile.openstreetmap.org or displaying a map, review the tile usage policy. The most common causes of being blocked are tile scraping and apps that don’t follow the usage policy.

To get help you should provide where the map is being viewed (e.g. an app, website, or something else), the HTTP User-Agent or Referer you’re sending, the IP you’re requesting from, and the HTTP response code. For a website, you can generally get this information through the browser’s developer tools. The tile.openstreetmap.org debug page will also show you this information.
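For a one-off check, the same details can be gathered with a small script rather than the browser. A hedged Python sketch follows; the tile coordinates, User-Agent, and Referer values are placeholders, not a recommendation.

import requests

resp = requests.get(
    "https://tile.openstreetmap.org/0/0/0.png",
    headers={
        "User-Agent": "MyApp/1.0 (contact@example.com)",  # placeholder identifying string
        "Referer": "https://example.com/map",              # placeholder referring page
    },
    timeout=10,
)
print("HTTP response code:", resp.status_code)
print("User-Agent sent:", resp.request.headers["User-Agent"])
print("Referer sent:", resp.request.headers["Referer"])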

If you’re having problems with an app that you’re not the developer of, you’ll often need to contact them, as they are responsible for correctly calling the services.

OpenStreetMap Carto release v5.4.0

Posted by pnorman on 23 September 2021 in English (English).

Dear all,

Today, v5.4.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take a couple of days before all tiles show the new rendering.

Changes include

  • Added a new planet_osm_line_label index (#4381)
  • Updated Docker development setup to use official PostGIS images (#4294)
  • Fixed endline conversion issues with python setup scripts on Windows (#4330)
  • Added detailed rendering of golf courses (#4381, #4467)
  • De-emphasized street-side parking (#4301)
  • Changed subway stations to start text rendering at z15 (#4392)
  • Updated road shield generation scripts to Python 3 (#4453)
  • Updated external data loading script to support psycopg2 2.9.1 (#4451)
  • Stopped displaying tourism=information with unknown information values
  • Switched the Natural Earth URL to point at its new location (#4466)
  • Added more logging to the external data loading script (#4472)

Thanks to all the contributors for this release, including ZeLonewolf, kolgza, and map-per, new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.3.1...v5.4.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

OpenStreetMap Standard Layer: Requests

Posted by pnorman on 29 July 2021 in English (English). Last updated on 30 July 2021.

This blog post is a version of my recent SOTM 2021 presentation on the OpenStreetMap Standard Layer and who’s using it.

With the switch to a commercial CDN, we’ve improved our logging significantly and now have the tools to collect and analyze logs. We log information on both the incoming request and our response to it.

We log

  • user-agent, the program requesting the map tile;
  • referrer, the website containing a map;
  • some additional headers;
  • country and region;
  • network information;
  • HTTP protocol and TLS version;
  • response type;
  • duration;
  • size;
  • cache hit status;
  • datacenter;
  • and backend rendering server

We log enough information to see what sites and programs are using the map, and additional debugging information. Our logs can easily be analyzed with a hosted Presto system, which allows querying large amounts of data in logfiles.
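Our queries run on that hosted Presto system, but just to illustrate the kind of aggregation involved, here is a small Python sketch over a hypothetical CSV export of the same fields, counting requests and cache hits per referrer. The filename and column names are assumptions.

import csv
from collections import Counter

requests_by_referer = Counter()
hits_by_referer = Counter()

with open("tile_log_sample.csv", newline="") as f:  # hypothetical export of the logged fields
    for row in csv.DictReader(f):
        referer = row.get("referer") or "(none)"
        requests_by_referer[referer] += 1
        if row.get("cache_hit") == "1":  # assumed encoding of the cache hit status field
            hits_by_referer[referer] += 1

for referer, total in requests_by_referer.most_common(20):
    print(referer, total, f"{hits_by_referer[referer] / total:.1%} cache hit")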

I couldn’t do this talk without the ability to easily query this data and dive into the logs. So, let’s take a look at what the logs tell us for two weeks in May.

Usage of standard layer in May

Although the standard layer is used around the world, most of the usage correlates to when people are awake in the US and Europe. It’s tricky to break this down in more detail because we don’t currently log timezones. We’ve added logging information which might make this easier in the future.

Based on UTC time, which is close to European standard time, weekdays average 30 000 incoming requests per second while weekends average 21 000. The peaks, visible on the graph, show a greater difference. This is because the load on weekends is spread out over more of the day.

On average over the month we serve 27 000 requests per second, and of these, about 7 000 are blocked.

Blocked Requests

Seven thousand requests per second is a lot of blocked requests. We block programs that give bad requests or don’t follow the tile usage policy, mainly

  • those which lie about what they are,
  • invalid requests,
  • misconfigured programs, or
  • scrapers trying to download everything

They get served

  • HTTP 400 Bad Request if invalid,
  • HTTP 403 Forbidden if misconfigured,
  • HTTP 418 I'm a teapot if pretending to be a different client, or
  • HTTP 429 Too Many Requests if they are automatically blocked for making excessive requests by scraping.

Before blocking we attempt to contact them, but this doesn’t always work if they’re hiding who they are, or they frequently don’t respond.

HTTP 400 responses are for tiles that don’t exist and will never exist. A quarter of these are for zoom 20, which we’ve never served.

For the HTTP 403 blocked requests, most are not sending a user-agent, a required piece of information. The others are a mix of blocked apps and generic user-agents which don’t allow us to identify the app.

Fake requests get a HTTP 418 response, and they’re nearly all scrapers pretending to be browsers.

May blocked chart

In July we added automatic blocking of IPs that were scraping the standard layer, responding with HTTP 429 to IPs that request far too many tiles from the backend. This only catches scrapers, but a tiny 0.001% of users were causing 13% of the load, and 0.1% of QGIS users were causing 38% of QGIS load.

July blocked chart

This blog post is a version of my recent SOTM 2021 presentation on the OpenStreetMap Standard Layer and who’s using it.

The OpenStreetMap Standard Layer is the default layer on openstreetmap.org, occupying most of the front page. It’s run by the OpenStreetMap Foundation, and the Operations Working Group is responsible for the planning, organisation and budgeting of OSMF-run services like this one, and for the servers running it. There are other map layers on the front page like Cycle Map and Transport Map, and I encourage you to try them, but they’re not hosted or planned by us.

Technology

At a high level, this is an overview of the technology the OWG is responsible for. The standard layer is divided into millions of parts, each of which is called a tile, and we serve tiles.

Flowchart of rendering

OSM updates flow into a tile server, where they go into a database. When a tile is needed, a program called renderd makes and stores the tile, and something called mod_tile serves it over the web. We have multiple render servers for redundancy and capacity. We’re completely responsible for these, although some of them run on donated hardware.

In front of the tile server we have a content delivery network. This is a commercial service that caches files closer to the users, serving 90% of user requests. It is much faster and closer to the users, but knows nothing about maps. We’re only responsible for the configuration.

The differences between the tile store and the tile cache are how they operate and their size. The tile store is much larger and stores more tiles.

Only the cache misses from the CDN impose a load on our servers. When looking at improving performance of the standard layer, I tend to look at cache misses and how to reduce them.

Policy

The OWG has a tile usage policy that sets out what you can and cannot do with our tile layer. We are in principle happy for our map tiles to be used by external users for creative and unexpected uses, but our priority is providing a quickly updating map to improve the editing cycle. This is a big difference between the standard layer and most other commercially available map layers, which might update weekly or monthly.

We prohibit some activities like bulk-downloading tiles for a large area (“scraping”) because it puts an excessive load on our servers. This is because we render tiles on demand, and someone scraping all the tiles in an area is downloading tiles they will never view.

As part of figuring out how to best process standard tile layer logs I had a chance to generate some charts for usage of the OpenStreetMap Standard tile layer on the day of 2021-03-14, UTC time. This was over a weekend, so there are probably differences on a weekday. I’m also only looking at tiles delivered and not including blocked tiles from scrapers and similar usage. All traffic is in tiles per second, averaged over the day.

Countries

I first looked at usage of the layer from users on openstreetmap.org and all users, by country.

Country code osm.org-based traffic total traffic
DE 237.7 1299.54
PL 89.07 674.67
RU 69.97 949.04
US 67.64 1474.47
FR 61.75 1234.47
GB 55.75 628.81
IT 41.32 432.84
NL 40.78 428.73
AT 27.14 115.6
CH 21.84 116.57
UA 19.38 303.38
CN 17.93 330.04
BE 16.95 189.03
CA 15.97 269.16
ES 13.56 353.89
AU 11.26 145.75
JP 11.25 256.9
IN 11.04 223.02
SE 10.42 154.83
FI 10.24 118.19
KZ 9.75 55.72
AR 9.57 263.79
TR 9.46 132.14
HU 9.39 169.86
HK 9.31 130.87
CZ 8.53 158.03
BR 8.19 472.51
ID 7.93 182.18
PH 7.46 53.86
SK 6.89 63
DK 6.73 116.17
RO 5.66 312.97
IR 5.62 300.05
TW 5.37 102.62
KR 5.3 35.72
BY 5.25 68.57
IL 4.89 53.97
HR 4.82 43.07
IQ 4.76 16.92
NO 4.4 59.52
RS 4.33 42.49
NZ 4.15 38.56
CO 4.12 203.94
MX 3.6 190.62
GR 3.28 45.04
PT 3.26 56.45
IE 2.88 64.29
LT 2.81 63.05
TH 2.62 75.52
CL 2.61 55.24
MY 2.54 32.12
VN 2.51 85.74
SI 2.33 19.29
SG 2.32 33.75
EE 2.31 21.87
LU 2.29 9.61
BG 2.12 40.86
LV 2.12 44.77
EG 1.9 29.49
BA 1.7 13.31
BD 1.59 67.94
ZA 1.45 25.32
AE 1.42 19.94
DZ 1.32 18.24
PK 1.26 31.58
PE 1.26 68.36
SA 1.24 40.63
YE 1.14 1.8
MA 1.11 18.1
MD 1.02 12.5

Traffic is very much as I expected, with OSM.org usage generally correlated with users.

Hosts

There are a few ways to reach the standard tile layer. The recommended one is tile.openstreetmap.org, but there are also the legacy a.tile.openstreetmap.org, b.tile.openstreetmap.org, and c.tile.openstreetmap.org domains, and other domains that alias to the same service. If you’re setting up something new, use only tile.openstreetmap.org; HTTP/2 will handle multiple tile fetches in parallel.

host TPS
a.tile.openstreetmap.org 4251.35
b.tile.openstreetmap.org 3668.94
c.tile.openstreetmap.org 3595.94
tile.openstreetmap.org 2282.77
b.tile.osm.org 225.13
a.tile.osm.org 207.61
c.tile.osm.org 200.73
tile.osm.org 2.25
b.Tile.openstreetmap.org 0
c.Tile.openstreetmap.org 0
a.Tile.openstreetmap.org 0
cdn-fastly-test.tile.openstreetmap.org 0
tile-openstreetmap-org.global.ssl.fastly.net 0

The 0 values are below 0.005 TPS. The last two domains were test domains that might still be cached by some users’ clients. There’s more traffic on a.tile.openstreetmap.org than on b or c because sometimes people hard-code only one domain.
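For anyone setting something up against tile.openstreetmap.org, tiles are addressed with the usual slippy-map z/x/y scheme. A short Python sketch of the standard Web Mercator tile formula; the coordinates are just an example.

import math

def tile_url(lat: float, lon: float, zoom: int) -> str:
    # Standard slippy-map tile numbering for Web Mercator.
    lat_rad = math.radians(lat)
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"

print(tile_url(51.5, -0.09, 12))  # a tile over London, for example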

QGIS

QGIS is one of the major users of the standard tile layer, and we can get a breakdown of versions

version TPS
31800 7.23
31700 2.58
31604 48.73
31603 13.43
31602 3.76
31601 4.71
31600 4.52
31416 4.26
31415 17.13
31401 0.91
31400 1.99
31203 1.91
31202 4.43
31201 3.03
31200 4.63
31014 12.49
31013 1.83
31012 1.66
31011 2.04
31010 3.43
31009 1.04
31008 0.81
31007 1.89
31006 3.35
31005 2.6
31004 6.07
31003 1.88
31002 2.02

Versions before 3.10 used a different format in their user-agent, so I decided to cut the chart off there. Earlier versions contributed 38.54 TPS.

I’ve been doing some log analysis on requests to the OSMF-hosted standard tile layer on tile.openstreetmap.org. To do this I downloaded two hours’ worth of logs and loaded them into PostgreSQL to run some queries. The logs start at ###, and total 11GB compressed starting at 1600 UTC on 2021-02-25.

My main concern has been backend server load, so to analyze that I looked at cache misses - requests where the cache has to request a tile from the OSMF-operated backend servers. Typically the number of tiles requested is going to be five to ten times higher than the number of misses, but it will vary by zoom.

Total cache misses were 3437.4 per second.

The top five referers, as well as some interesting ones, are

| Referer domain | Cache misses per second |
| --- | --- |
| None | 1254.3 |
| www.openstreetmap.org | 418.7 |
| www.openrailwaymap.org | 16.0 |
| apps.sentinel-hub.com | 15.0 |
| m.turkiye.gov.tr | 13.1 |
| localhost, on various ports | 30.7 |
| 10.* IPs | 14.2 |
| Other | 1675.4 |

The top sites vary by time and what parts of the world are awake, but most of the traffic is from the long tail of small sites, OpenStreetMap itself, or an app which should be sending a custom user-agent instead of a website with a referer.

For user-agents, I grouped different versions of some apps together. Like before, I’ve got some of the top ones, then a few interesting ones.

| User-Agent | Cache misses per second |
| --- | --- |
| MapProxy, all versions | 281.8 |
| QGIS, all versions | 69.9 |
| Fake FF 84 | 46.8 |
| Marble, all versions | 34.3 |
| ArcGIS Client Using WinInet | 32.8 |
| StreetView, all versions | 29.6 |
| com.caynax.sportstracker, all versions | 24.7 |
| Maperitive, all versions | 24.2 |
| JOSM, all versions | 22.9 |
| Fake Chrome 25 | 22.2 |
| Amazon CloudFront | 13.5 |
| OruxMaps, all versions | 12.4 |
| Fake FF 77 | 11.7 |
| 173A220003203F293A2E3C2A | 10.7 |
| cgeo | 5.9 |
| Other | 597.1 |

The fake user-agents are in the process of being blocked now.

As with sites, the long tail is a significant portion of the load. Substantial chunks come from OSM-related apps and FOSS geo-related apps (QGIS, Marble, cgeo), and the biggest source is caching proxies like MapProxy and Amazon CloudFront.

Overall, the usage is

| Source | Cache misses per second |
| --- | --- |
| OpenStreetMap website | 418.7 |
| Caching proxies | 295.3 |
| Other geospatial apps | 91.3 |
| Fakes | 91.4 |
| QGIS | 69.9 |
| OSM editing apps | 52.5 |
| Internal and testing IPs | 44.9 |
| Other websites | 1719.5 |
| Other apps | 653.9 |

OpenStreetMap Survey by visits

Posted by pnorman on 21 February 2021 in English (English).

In my last post I looked at survey responses by country and their correlation with mappers eligible for a fee waiver as active contributors.

I wanted to look at the correlation with OSM.org views. I already had a full day’s worth of logs on tile.openstreetmap.org accesses, so I filtered them for requests from www.openstreetmap.org and got a per-country count. This is from December 29th, 2020. Ideally it would be from a complete week, and not a holiday, but this is the data I had downloaded.

Preview image

The big outlier is Italy. It has more visits than I would expect, so I wonder if the holiday had an influence. Like before, the US is overrepresented in the results, Russia and Poland are underrepresented, and Germany is about average.

Like before, I made a graph of the smaller countries.

Preview image

More small countries are above the average line - probably an influence of Italy being so low.

OSMF survey country results

Posted by pnorman on 17 February 2021 in English (English).

The board has started releasing results from their 2021 survey. I’ve done some analysis on the response rates by country.

There’s lots of data for activity on OSM by country, but for this I took the numbers from joost for how many “active contributors” there are according to the contributor fee waiver criteria.

Preview image

For the larger countries, Russia is the most underrepresented country. This is not surprising, as they are underrepresented in other venues like the OSMF membership.

The US and UK are both slightly overrepresented in the survey, but less so than I would have expected based on other surveys and OSMF membership.

The smaller countries are all crowded, so I did a graph of just them.

Preview image

As with other surveys, Japan is underrepresented. Indonesia, although underrepresented, is less so than I would have expected.

OpenStreetMap Carto v5.3.1

Posted by pnorman on 5 February 2021 in English (English).

Dear all,

Today, v5.3.1 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. There are no visual changes in this release.

Changes include

  • Natural Earth URL changed to directly point at the NACIS CDN
  • Added an option to the external data loader to grant SELECT permissions on the tables

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.3.0...v5.3.1

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

Dear all,

Today, v5.3.0 of the OpenStreetMap Carto stylesheet (the default stylesheet on the OSM website) has been released. Once changes are deployed on openstreetmap.org it will take a few days before all tiles show the new rendering. It may take longer than normal because there are significant deployment-related changes.

  • External shapefiles for coastline and other data are now loaded into the database with a provided script.

  • The recommended indexes are now required. Attempting to render without them will result in abysmal performance.

  • amenity=embassy is no longer rendered; office=diplomatic with diplomatic=embassy or diplomatic=consulate is rendered instead.

  • Mini-roundabouts are rendered like a turning circle.

  • There is a new partial index for waterways

Anyone running their own install must run scripts/get-external-data.py and create the new indexes. People who are running with minutely diffs may be interested in https://github.com/openstreetmap/chef/issues/386.

Thanks to all the contributors for this release, including hiddewie, crimsondusk, pitdicker, and terminaldweller, new contributors.

For a full list of commits, see https://github.com/gravitystorm/openstreetmap-carto/compare/v5.2.0...v5.3.0

As always, we welcome any bug reports at https://github.com/gravitystorm/openstreetmap-carto/issues

A common task with OpenStreetMap data in PostGIS is converting polygons to points to place labels. For simple polygons the centroid can be used, but for some shapes, like C-shaped polygons, the centroid can lie outside the polygon, so ST_PointOnSurface is used instead. This function guarantees that the point returned is within the polygon.

The only issue with ST_PointOnSurface is that it throws an exception on some invalid geometries. This isn’t a problem with a database created by a recent version of osm2pgsql, which only creates valid geometries, but it is for older versions or other data loaders. This has led people to write wrapper functions that check the validity or catch the exceptions, but I’ve seen no benchmarking of the various options.

To benchmark the options, I loaded the planet data from 2020-10-12 and looked at named water polygons - those that matched ("natural" = 'water' OR waterway = 'riverbank') AND name IS NOT NULL. To make the system better reflect a tile server under load, I set max_parallel_workers_per_gather to 0 and jit to off. I then ran the query EXPLAIN ANALYZE SELECT function(way) FROM planet_osm_polygon WHERE ("natural" = 'water' OR waterway = 'riverbank') AND name IS NOT NULL;.

I tested with ST_Centroid, ST_PointOnSurface, ToPoint from postgis-vt-util, a function that checked validity before calling ST_PointOnSurface, a function that caught the exception from invalid geometries, and a function that used ST_Centroid for polygons with 4 corners and ST_PointOnSurface otherwise. The definitions are at the end of this post.

Function Time
ST_Centroid 277s
ST_PointOnSurface 408s
ToPoint 575s
point1 568s
point2 409s
point3 409s

Parallelism

I set max_parallel_workers_per_gather to 0, but my test server has a lot of CPU cores. If I increased this value I was easily able to saturate my SSDs, and all queries took the same time. Still, even if you’re IO-limited it’s a good idea to minimize CPU usage.

Conclusions

If you have a database with potentially invalid polygons, you should use a wrapper function that catches the exception rather than checks validity first. Although ST_Centroid is faster than ST_PointOnSurface, it’s not worth trying to use it in simple cases.

Function definitions

CREATE OR REPLACE FUNCTION public.topoint(g geometry)
RETURNS geometry
LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE
AS $function$
begin
    g := ST_MakeValid(g);
    if GeometryType(g) = 'POINT' then
        return g;
    elsif ST_IsEmpty(g) then
        -- This should not be necessary with Geos >= 3.3.7, but we're getting
        -- mystery MultiPoint objects from ST_MakeValid (or somewhere) when
        -- empty objects are input.
        return null;
    elsif (GeometryType(g) = 'POLYGON' OR GeometryType(g) = 'MULTIPOLYGON') and ST_NPoints(g) <= 5 then
        -- For simple polygons the centroid is good enough for label placement
        return ST_Centroid(g);
    else
        return ST_PointOnSurface(g);
    end if;
end;
$function$


CREATE OR REPLACE FUNCTION public.point1(g geometry)
RETURNS geometry
LANGUAGE sql
IMMUTABLE PARALLEL SAFE
AS $function$
-- Check validity first; invalid geometries return NULL
SELECT CASE WHEN ST_IsValid(g) THEN ST_PointOnSurface(g) END;
$function$


CREATE OR REPLACE FUNCTION public.point2(g geometry)
RETURNS geometry
LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE
AS $function$
BEGIN
-- Call ST_PointOnSurface directly; the exception raised by invalid geometries becomes NULL
RETURN ST_PointOnSurface(g);
EXCEPTION WHEN OTHERS THEN
RETURN NULL;
END
$function$


CREATE OR REPLACE FUNCTION public.point3(g geometry)
RETURNS geometry
LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE
AS $function$
BEGIN
-- Use the cheaper ST_Centroid for simple polygons (<= 5 points), ST_PointOnSurface otherwise
RETURN CASE WHEN ST_NPoints(g) <= 5 THEN ST_Centroid(g) ELSE ST_PointOnSurface(g) END;
EXCEPTION WHEN OTHERS THEN
RETURN NULL;
END
$function$

Cross-posted from my blog

I’ve been working on a new project, OpenStreetMap Cartographic. This is a client-side rendering based on OpenStreetMap Carto. This is an ambitious project, as OpenStreetMap Carto is an extremely complex style which shows a large number of features. The technical choices I’m making are designed so the style is capable of handling the load of osm.org with minutely updates.

I’ve put up a world-wide demo at https://pnorman.dev.openstreetmap.org/cartographic/mapbox-gl.html, using data from 2020-03-16, and you can view the code at https://github.com/pnorman/openstreetmap-cartographic.

Preview image

Incomplete parts

Only zooms 0 to 8 have been implemented so far. I started at zoom 0 and am working my way down.

Admin boundaries are not implemented. OpenStreetMap Carto uses Mapnik-specific tricks to deduplicate the rendering of these. I know how I can do this, but it requires the changes I intend to make with the flex backend.

Landuse, vegetation, and other natural features are not rendered until zoom 7. This is the scale of OpenStreetMap Carto zoom 8, and in OpenStreetMap Carto these features first appear at zoom 5. There are numerous problems with unprocessed OpenStreetMap data at these scales. By tweaking Mapnik image rasterizing options, OpenStreetMap Carto gets a result that looks acceptable but is poor at conveying information. I’m looking for better options here involving preprocessed data, but haven’t found any.

I’m still investigating how to best distribute sprites.

Technology

The technology choices are designed to be suitable for a replacement for tile.osm.org. This means minutely updates, high traffic, high reliability, and multiple servers. Tilekiln, the vector tile generator, supports all of these. It’s designed to better share rendering results among multiple servers, which is a significant weakness of renderd + mod_tile with the standard filesystem storage. It uses PostGIS’ ST_AsMVT, which is very fast with PostGIS 3.0. On my home system it generates z0-z8 in under 40 minutes.

Often forgotten are the development requirements. The style needs to support multiple developers working on similar areas and git merge conflicts, while maintaining an easy development workflow. I’m still figuring this out. Mapbox GL styles are written in JSON, and most of the tools overwrite any formatting. This means there’s no way to add comments to lines of code. Comments are a requirement for a style like this, so I’m investigating minimal pre-processing options. The downside is that this will make it harder to use with existing GUI editors like Fresco or Maputnik.
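One possible minimal pre-processing approach, purely as a sketch of the idea rather than what the project will use: keep the style in a source file where object keys starting with "#" are comments, and strip them before writing the JSON that Mapbox GL and the GUI editors see. The filenames are hypothetical.

import json

def strip_comments(node):
    # Drop any object key beginning with "#", recursing into nested objects and arrays.
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items() if not k.startswith("#")}
    if isinstance(node, list):
        return [strip_comments(v) for v in node]
    return node

with open("style.commented.json") as src:       # hypothetical commented source
    style = strip_comments(json.load(src))

with open("style.json", "w") as out:            # the generated Mapbox GL style
    json.dump(style, out, indent=2)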

Cartography

The goal of this project isn’t to make big cartography changes yet, but client-side rendering opens up new tools. The biggest immediate change is that zoom is continuous, no longer an integer or fixed value. This means parameters like sizes can change smoothly as you zoom in and out, specified by their start and end sizes instead of having to be specified for each zoom.

Want to help?

Have a look at https://github.com/pnorman/openstreetmap-cartographic and have a go at setting it up and generating your own map. If you have issues, open an issue or pull request. Or, because OpenStreetMap Cartographic uses Tilekiln, have a look at its issue list.