Vector Tile File Formats

Posted by spwoodcock on 18 September 2023 in English.

Storing map tiles in a single file is a common way to load basemaps on a map client.

There are a few formats available to do this, with different use cases.

Offline

mbtiles

  • A format created by Mapbox, with a fully open spec.
  • Essentially an SQLite database that stores each tile as a binary blob.
  • The client interfaces with the database and loads each tile as required by the basemap.
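
As a rough illustration of that last point, here is a minimal Python sketch of how a client might look up one tile. It assumes the `tiles` table and column names from the MBTiles spec; the `read_tile` helper and the in-memory stand-in database are made up for this example.

```python
import sqlite3


def read_tile(conn, z, x, y_xyz):
    """Return the tile blob for an XYZ-style address, or None if it is absent."""
    # MBTiles numbers rows in the TMS scheme (row 0 at the bottom),
    # so flip the XYZ y before querying.
    tms_row = (1 << z) - 1 - y_xyz
    row = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, tms_row),
    ).fetchone()
    return row[0] if row else None


# Tiny in-memory stand-in for a real .mbtiles file (demonstration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles ("
    "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB)"
)
conn.execute("INSERT INTO tiles VALUES (1, 0, 1, ?)", (b"fake-tile",))

print(read_tile(conn, 1, 0, 0))
```

With a real file you would pass `sqlite3.connect("INPUT.mbtiles")` instead of the in-memory connection.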

OsmAnd SQLite

  • Based on the BigPlanet SQLite format.
  • Basically the same as mbtiles, but with a slightly different database schema.

A small aside.

Sometimes it’s necessary to generate the same tileset in both mbtiles and OsmAnd formats to view it in different software, which is a pain.

There is an open issue in OsmAnd to support the mbtiles format, but it’s not a priority for now.

Knowing that they are very similar file formats, I considered the possibility of accessing one SQLite database via another ‘wrapper’ SQLite database in a custom view. This view would map tables and fields from one database schema to the other, eliminating the need to store both tilesets for the same data.

Assuming you have an MBTiles table with the following schema:

CREATE TABLE mbtiles_table (
    zoom_level INTEGER,
    tile_column INTEGER,
    tile_row INTEGER,
    tile_data BLOB
);

And you want to create a view for an OsmAnd SQLite table with a schema like:

CREATE TABLE osmand_table (
    _id INTEGER PRIMARY KEY AUTOINCREMENT,
    x INTEGER,
    y INTEGER,
    z INTEGER,
    tile_data BLOB
);

You can create a view to convert between them like this:

CREATE VIEW osmand_mbtiles_view AS
SELECT
    NULL AS _id,  -- Use NULL for auto-increment _id
    tile_column AS x,
    tile_row AS y,
    zoom_level AS z,
    tile_data
FROM mbtiles_table;

I couldn’t get this to work when testing, however (it may warrant further investigation).

If you find a solution, please do let me know!
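
A possible starting point for that investigation, sketched in Python with the standard `sqlite3` module. It is not a confirmed fix. It reuses the illustrative `mbtiles_table` schema from above and shows two things that may matter. First, SQLite only lets a TEMP view reference tables in an attached database, so a permanent cross-file view is rejected. Second, MBTiles numbers rows in the TMS scheme (row 0 at the bottom), so the sketch flips `tile_row` on the assumption that the target expects XYZ-style `y`. The `open_with_osmand_view` helper and the demo file are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_with_osmand_view(mbtiles_path):
    """Open a wrapper connection exposing the MBTiles rows through a TEMP view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS src", (mbtiles_path,))
    # Only TEMP views may reference tables in an attached database.
    conn.execute(
        """
        CREATE TEMP VIEW osmand_mbtiles_view AS
        SELECT
            NULL AS _id,
            tile_column AS x,
            (1 << zoom_level) - 1 - tile_row AS y,  -- TMS row -> XYZ y (assumed)
            zoom_level AS z,
            tile_data
        FROM src.mbtiles_table
        """
    )
    return conn


# Build a one-tile demo file using the illustrative schema from the post.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mbtiles")
src = sqlite3.connect(demo_path)
src.execute(
    "CREATE TABLE mbtiles_table ("
    "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB)"
)
src.execute("INSERT INTO mbtiles_table VALUES (1, 0, 0, ?)", (b"fake-tile",))
src.commit()
src.close()

view_conn = open_with_osmand_view(demo_path)
print(view_conn.execute("SELECT x, y, z FROM osmand_mbtiles_view").fetchall())
```

One caveat: a TEMP view only exists inside the connection that created it, so another application opening the file on its own would not see it. That may be part of why the wrapper idea is awkward in practice.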

Online

PMTiles

  • A neat new format specifically aimed at cloud-optimising vector tile access (accessing an mbtiles file over the web is very inefficient).
  • Easily handles both large planet-scale datasets with millions of tiles and small-scale datasets. As a single file it is perfect for S3 object storage.
  • Uses HTTP Range requests to download only the tiles needed for a given BBOX (not the entire file).
  • Offers compression, tile deduplication (no need to repeat that blue ocean tile…), an optimised internal structure to minimise the size and number of requests when panning or zooming, and minimal overhead when requesting tiles (tiny initial request).
  • For public deployments it is recommended to run behind a CDN, both to cache tile requests and to act as a proxy to a private S3 bucket (anonymous direct file download from S3 may incur large costs).
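
To make the Range request idea a bit more concrete, here is a small Python sketch using only the standard library. The helper names are invented for this example, the URL in the usage note is a placeholder, and the 127-byte header length is my reading of the PMTiles v3 spec, so treat it as an assumption. Real clients also decode the header and directories, which is left out here.

```python
import urllib.request

# Size of the fixed header at the start of a PMTiles v3 archive (assumed).
HEADER_LENGTH = 127


def range_header(offset, length):
    """Build an HTTP Range header covering `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Range end positions are inclusive, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url, offset, length):
    """Download only the requested byte range of a remote file."""
    req = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honoured the Range header.
        if resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()


def looks_like_pmtiles(header):
    """Check the magic bytes at the start of the archive."""
    return header[:7] == b"PMTiles"
```

Usage would look something like `fetch_range("https://example.com/basemap.pmtiles", 0, HEADER_LENGTH)`, followed by further range requests for the directory and the tiles themselves.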

So What Should I Choose?

If the layer (likely a basemap) needs to work offline, then:

  • SQLite if the tool/app supports it, e.g. OsmAnd.
  • mbtiles for tools that require it, such as ODK Collect.

If working online, then PMTiles may be best:

  • A replacement for XYZ basemap tile servers (great for reducing load on the OpenStreetMap servers 🙏).
  • Creating custom basemaps from imagery and/or OSM exports.

Converting Between Formats

mbtiles –> OsmAnd

  • This is relatively easy, as both are SQLite files at their core.
  • The excellent Python utility by @tarwirdur mbtiles2osmand does this quite efficiently.

python mbtiles2osmand.py INPUT.mbtiles OUTPUT.sqlite3

  • I also ported this to Golang as mb2osm, but have some work to do on improving performance. Feel free to contribute!

The advantage of using Golang here is that it produces a statically compiled binary. The single file does not require any external dependencies or an interpreter to run (unlike Python), making it more portable.

mbtiles –> pmtiles

  • The best choice for this would be go-pmtiles, by the creator of PMTiles.
  • Again, a single file binary program that can convert in one command.

pmtiles convert INPUT.mbtiles OUTPUT.pmtiles

Other formats –> pmtiles

  • In cases where you have other formats to convert first, e.g. directly from a database, GeoJSON, etc., tippecanoe (>v2.17) is the recommended tool.
  • The official example:

ogr2ogr -t_srs EPSG:4326 cb_2018_us_zcta510_500k.json cb_2018_us_zcta510_500k.shp

# Creates a layer in the vector tiles named “zcta”

tippecanoe -zg --projection=EPSG:4326 -o cb_2018_us_zcta510_500k_nolimit.pmtiles -l zcta cb_2018_us_zcta510_500k.json

Location: Kailash Chok, Lazimpat, Kathmandu-02, Kathmandu, Kathmandu Metropolitan City, Kathmandu, Bagmati Province, 21255, Nepal

Discussion

Comment from bryceco on 20 September 2023 at 07:54

I saw Brandon’s presentation on pmtiles at SOTM US in Richmond but didn’t really grok the significance. I now understand better what many of the advantages are, but I don’t know enough about cloud services to understand the big reduction in cost. Is it because AWS charges based on number of files so by putting everything in a single file (and use range requests) you can hack the usage fees? I’d really appreciate a more thorough explanation.

Comment from spwoodcock on 20 September 2023 at 12:56

Pretty much! Although the main cost reduction is in the number of requests that need to be made.

Imagine you store your tiles in a typical directory structure of a TMS / XYZ server. Each tile is an individual file, and a cost is incurred when requesting each tile.

If you store all the files in a neatly packaged single file (PMTiles), then your costs reduce significantly. Only a couple of requests are made:

  • The file is queried for metadata to find which range of bytes you need (i.e. which tiles).
  • Then you download only the specific tiles you need to display, in a single RANGE request.

(I may have oversimplified slightly, but that’s pretty much the concept).

Thanks for reading!

Comment from spwoodcock on 20 September 2023 at 13:00

Also, to further reduce your costs, you can follow the guide to deploy behind a CDN for better tile caching: https://protomaps.com/docs/cdn

Comment from bryceco on 20 September 2023 at 16:40

The piece I was missing is that a single range request can ask for multiple non-adjacent ranges, so you can request an arbitrary number of non-sequential tiles in a single request. Thanks for the explanation.
