BloatedStreetMap

Posted by alexkemp on 12 June 2019 in English (English)

Has OpenStreetMap (OSM) become BloatedStreetMap (BSM)?

“I’m much too big to let the world see me”

That is the view of the BSM Admin:

“crawling billions of objects puts (too much load) on our servers”

I do not agree. Certainly setup changes could help reduce current load/bandwidth, but that is not the actual problem.

First & foremost I believe it to be a question of attitude rather than a technical problem.

If you really want the whole world to be able to access the map then you will find a way to do that. You will make it happen. I also happen to be in a position to know that it is possible to open access to all, and can prove it. Even so, it is attitude first: if you do not have the right attitude and are unwilling to change then, with all respect, I suggest that you find another job.

Time for me to back up some of the statements above. Consider the following (apologies in advance, some of this is very geeky):

  1. It is possible to open access to all
     
    BSM & SFS (StopForumSpam) handle a similar amount of traffic yet BSM has 90 times the number of servers that SFS has. Think about that for a few moments. Are you beginning to understand why the topic for this post is “BloatedStreetMap”?
     
    SFS allows absolutely anyone to access an API (an interface designed for bots to auto-query, auto-post + auto-return information) without registration. It has been doing that for 10 years (52 billion queries) upon a single server made available for free by a hosting company. So yes, SFS is serving billions of queries with ease to anyone that interrogates it 24/7/365 worldwide, most of whom are bots! At the same time it has a forum for registered users to provide help & receive feedback.
     
    Now please remember that spam is one of the worst problems being suffered by every server on the planet, and yet SFS is able to store & supply spammer details submitted by hundreds of thousands of servers. Yet, BSM has 90 servers and SFS has just one. What is the difference? It is called attitude.
     
  2. The BSM server-setup Bloats the Bandwidth rather than Reduces It
     
    I’ll keep this as geek-less as possible.
     
    The current BSM Server setup produces pages that inflate the bandwidth and as such increase the server load. In part that is due to (what I consider to be) setup-errors within the BSM Content Negotiation. First, a quick explanation of Content Negotiation:
     
    Content Negotiation for Dummies
    The simplest way to understand this is to consider what happens when you browse through a series of web-pages from the same server. As you open each new page your browser requests it from the server. However, some things like pictures, logos, etc. (and the CSS + script) may already have been obtained and therefore need not be requested again. Finally, you may at some point use the Back-button to look at a page that you have already obtained, in which case there should be no need to re-request anything on that page from the server. Strictly speaking, the mechanism that allows all this to work is HTTP caching plus conditional requests (validators such as ETag and Last-Modified); Content Negotiation proper is the choice of language or encoding via the Accept headers, but I will keep the looser label here.
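
As a rough illustration of the exchange described above, here is a minimal Python sketch of a conditional GET. The Response class and the respond() function are hypothetical names invented for this example (they are not OSM code), and the matching is simplified to one exact ETag; real servers also handle lists of tags and the * wildcard.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    headers: dict
    body: bytes = b""


def respond(request_headers: dict, body: bytes, etag: str) -> Response:
    """Answer a GET, sending the body only if the client's copy is out of date."""
    headers = {"ETag": etag, "Cache-Control": "max-age=0, must-revalidate"}
    # If-None-Match carries the ETag of the copy the browser already holds.
    if request_headers.get("If-None-Match") == etag:
        return Response(304, headers)  # headers only, no body
    return Response(200, headers, body)


page = b"<html>...</html>"
first = respond({}, page, '"v1"')
second = respond({"If-None-Match": first.headers["ETag"]}, page, '"v1"')
print(first.status, len(first.body))    # 200 16
print(second.status, len(second.body))  # 304 0
```

The second request costs a round trip but no page body, which is where the bandwidth saving comes from.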
     
    The point for the user is that Content Negotiation makes for a fast experience within the browser, and the point for the server is that it lowers bandwidth & therefore reduces server load a lot. Unfortunately, BSM has set up dynamic pages (like the Map and these Diary pages) so that many of the bandwidth-saving features are switched off. Here are the main ones:
     
    Server Setup-Errors — no Last-Modified, weak ETag + wrong Cache-Control
     
    If you look at the Response Headers for one of the Map pages (below; see here if the BSM admin have removed it) you will see both an ETag header and a Cache-Control header of “max-age=0, private, must-revalidate”. The latter effectively says “never reuse a cached copy without first checking with the server”, whilst the former is the token that the browser sends back for that check. The pairing only saves bandwidth if the ETag stays the same for an unchanged page, so that the server can answer 304 Not Modified. Here it does not (see the mismatched ETags further down), which means that bandwidth is constantly bloated, since pages already in the browser cache are never used.
     
    The lack of a Last-Modified header is also problematic. If you look at section 13.3.4 of the HTTP/1.1 RFC (RFC 2616) you will see the following sentence:

    13.3.4 Rules for When to Use Entity Tags and Last-Modified Dates

    In other words, the preferred behavior for an HTTP/1.1 origin server is to send both a strong entity tag and a Last-Modified value.
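
To make that recommendation concrete, here is a small Python sketch that builds both validators for one response body. The validators() function name and the choice of a truncated SHA-256 digest are my own assumptions for illustration; the RFC does not say how an ETag should be computed, only that a strong one must change whenever the bytes change.

```python
import hashlib
from email.utils import formatdate


def validators(body: bytes, mtime: float) -> dict:
    """Build a strong ETag and a Last-Modified header for one representation."""
    # Strong ETag: derived from the exact bytes sent, so it changes only when they do.
    etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
    # Last-Modified: an HTTP-date in GMT, built from a Unix timestamp.
    return {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}


headers = validators(b"User-agent: *\n", 0)
print(headers["Last-Modified"])  # Thu, 01 Jan 1970 00:00:00 GMT
print(headers["ETag"] == validators(b"User-agent: *\n", 0)["ETag"])  # True
```

Because the ETag depends only on the content, two requests for an unchanged page get the same tag and the second can be answered with a 304.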

    And it gets worse. So far I’ve been talking about dynamic HTML pages (static files such as robots.txt (below) are supplied by Apache and have full & proper headers). However, other static files have Response headers that are NOT directly under Apache control and suffer the same bandwidth bloat:
     
    Server Setup-Errors — page Static files have the same errors
     
    I loaded the URL below in Firefox 60.7.0esr (64-bit) (the pattern is identical in Chrome), opened the Web Developer tools, switched to the Network tab and pressed Refresh. In an ideal world, every element on the page would have been loaded from cache, since the page had come from the server only seconds before. In fact, EVERY element on the page was re-loaded from the BSM server, even the so-called static files. Here are some important features:
     
    • A browser Request header: If-None-Match W/"99fe0dc7d5ac6370b3fa12b2c3ce255e-gzip"
    • A server Response header: etag W/"27e7819c5ad740808181b4863f9310d4-gzip"
      (they do not match, so the page is re-sent, even though the two copies of the page are actually identical)
    • Static files: Neither Last-Modified nor ETag is supplied by the server for any static file, and a Cache-Control max-age=0 Request header from the Browser therefore causes the server to re-supply each file all over again

The above is so dumb, and so easy to fix.
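
For anyone who wants to check the ETag mismatch listed above, here is a short Python sketch of the weak comparison that If-None-Match uses (RFC 7232): the W/ prefix is ignored and the quoted parts must be identical. The helper names opaque_tag() and weak_match() are made up for this example.

```python
def opaque_tag(etag: str) -> str:
    """Drop the weak indicator (W/) and return the quoted opaque tag."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def weak_match(if_none_match: str, current: str) -> bool:
    """Weak comparison: tags match if their opaque parts are identical."""
    return opaque_tag(if_none_match) == opaque_tag(current)


sent_by_browser = 'W/"99fe0dc7d5ac6370b3fa12b2c3ce255e-gzip"'
sent_by_server = 'W/"27e7819c5ad740808181b4863f9310d4-gzip"'
print(weak_match(sent_by_browser, sent_by_server))  # False
print(weak_match('W/"abc"', '"abc"'))               # True
```

The first result is why the server answered with a full 200 instead of a 304: a weak ETag is perfectly acceptable for If-None-Match, but only if it stays the same while the page stays the same.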

These are selected Response headers for one of the map pages:

$ wget -O /dev/null -S https://www.osm.org/way/17236956 
--2019-06-11 19:15:05-- https://www.osm.org/way/17236956 
HTTP request sent, awaiting response... 
HTTP/1.1 200 OK 
Date: Tue, 11 Jun 2019 18:15:05 GMT 
Server: Apache/2.4.29 (Ubuntu) 
Cache-Control: max-age=0, private, must-revalidate 
Content-Encoding: gzip
Vary: Accept-Language,Accept-Encoding 
Content-Language: en 
ETag: W/"6b6023e8efd4983110d4097ec2ee5e56" 
Status: 200 OK 
Keep-Alive: timeout=5, max=100 
Connection: Keep-Alive 
Transfer-Encoding: chunked 
Content-Type: text/html; charset=utf-8 
Length: unspecified [text/html] 
Saving to: ‘/dev/null’ 

…and a static file under Apache control, which basically makes the point that Apache always sends both HTTP/1.1 + HTTP/1.0 headers:

$ wget -O /dev/null -S https://www.osm.org/robots.txt 
HTTP request sent, awaiting response... 
HTTP/1.1 200 OK 
Date: Tue, 11 Jun 2019 18:29:58 GMT 
Server: Apache/2.4.29 (Ubuntu) 
Last-Modified: Thu, 06 Jun 2019 17:26:06 GMT 
ETag: "184-58aab0255d051" 
Accept-Ranges: bytes 
Content-Length: 388 
Vary: Accept-Encoding 
Keep-Alive: timeout=5, max=100 
Content-Type: text/plain; charset=utf-8
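
The two listings above can be compared mechanically. The following Python sketch parses a few header lines copied from each listing and reports which caching headers are present; parse_headers() and caching_summary() are hypothetical helper names of my own, and the parsing is deliberately naive (one header per line, no repeated headers).

```python
def parse_headers(raw: str) -> dict:
    """Turn 'Name: value' lines into a dict keyed by lower-case header name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        # Skip the status line and anything else that is not 'Name: value'.
        if sep and " " not in name.strip():
            headers[name.strip().lower()] = value.strip()
    return headers


def caching_summary(headers: dict) -> dict:
    """Report the validators and Cache-Control value found in a response."""
    etag = headers.get("etag")
    if etag is None:
        kind = "none"
    else:
        kind = "weak" if etag.startswith("W/") else "strong"
    return {
        "etag": kind,
        "last_modified": "last-modified" in headers,
        "cache_control": headers.get("cache-control"),
    }


map_page = """HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
ETag: W/"6b6023e8efd4983110d4097ec2ee5e56"
"""
robots = """HTTP/1.1 200 OK
Last-Modified: Thu, 06 Jun 2019 17:26:06 GMT
ETag: "184-58aab0255d051"
"""
print(caching_summary(parse_headers(map_page)))
print(caching_summary(parse_headers(robots)))
```

The map page reports a weak ETag and no Last-Modified, whereas robots.txt reports a strong ETag plus Last-Modified, which is the pairing the RFC prefers.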

Update June 16, 2019:
Having rechecked the same page before raising a GitHub Issue, I found that the cache situation has changed, so here is an update:

Files that are cached & receive a 304 Not Modified on F5 Refresh:
10 x PNG files; eg https://b.tile.openstreetmap.org/18/130245/85447.png
all are (example Response header):
cache-control: max-age=210416
etag: "8b71a4849e1133bdb17816c870b7a54a"
expires: Wed, 19 Jun 2019 00:13:38 GMT
via toothless.openstreetmap.org
Notes:
ETags are strong + match & therefore the file gets a 304, even though Request Header is Cache-Control: max-age=0; still no Last-Modified supplied

Files that receive a 200 OK (re-supplied, not from cache) on F5 Refresh:
1 x Main HTML document eg https://www.openstreetmap.org/way/17236956
cache-control: max-age=0, private, must-revalidate
Notes:
Weak ETag does not match; neither Last-Modified nor Expires supplied.

3 x CSS eg https://www.openstreetmap.org/assets/screen-ltr-….css
1 x GIF eg https://www.openstreetmap.org/assets/searching-….gif
3 x JS eg https://www.openstreetmap.org/assets/application-….js
7 x PNG eg https://www.openstreetmap.org/assets/directions-….png
2 x SVG eg https://www.openstreetmap.org/assets/osm_logo-….svg
all are (Response header):
expires: Mon, 15 Jun 2020 13:46:41 GMT
Notes:
Neither Last-Modified nor ETag supplied, hence no 304

1 x JS eg https://piwik.openstreetmap.org/piwik.js
Notes:
Both Last-Modified + strong ETag are supplied and match the Response Header, but neither Cache-Control nor Expires are supplied, hence no 304

1 x XML eg https://www.openstreetmap.org/api/0.6/way/17236956/full
cache-control: max-age=0, private, must-revalidate
Notes:
Neither Last-Modified nor ETag nor Expires supplied, hence no 304

2 x GIF eg https://piwik.openstreetmap.org/piwik.php?action_name=… (a 1x1 pixel)
Cache-Control: no-store
Notes:
Neither Last-Modified nor ETag nor Expires supplied, hence no 304

Comment from iandees on 12 June 2019 at 16:04

Just so you know, the relatively good points you’re making are lost in the unneeded attacks on our project and volunteer administrators. You can make the same arguments much more effectively by skipping the references to “bloated” and name-calling.

Please try to make your points more constructively and less passive-aggressively or stop posting here.

Comment from alexkemp on 12 June 2019 at 17:00

This may surprise you - they are not attacks. They are observations on the situation together with suggestions for fixing errors and/or improving the situation. The fact that you receive it as an ‘attack’ says a great deal about you and not so much about me.

The article is only part-written (fyi). More to come.

Oh! Stop telling me what to do.

Comment from iandees on 12 June 2019 at 17:10

I’m not telling you what to do. I’m suggesting that people might pay more attention to you if you aren’t so crass.

Comment from mmd on 12 June 2019 at 17:55

Something seems to be wrong with your analysis.. www.osm.com has nothing to do with OpenStreetMap, it redirects to some producer of engineered steel products.

Comment from alexkemp on 12 June 2019 at 19:58

@mmd
Many thanks, mmd, my bad. Wrong URL now fixed.

I originally copied the Response headers from my machine to here, then from here to WMW. WMW asked me to anonymise the urls so they all became ‘example.com’. BSM admin then removed the code from here whilst they are hidden from the SEs so I copied them from WMW to here + re-edited the URLs but left them as .com. Oops.

Comment from alexkemp on 13 June 2019 at 03:32

@iandees

I’m not telling you what to do. I’m suggesting that people might pay more attention to you…

Uh huh.

stop posting here

I call that telling me what to do. I stopped paying attention to you at that point.
