OSM datasize in PostGIS

Posted by jremillard on 31 May 2013 in English (English)

I have been playing with osmosis/PostGIS this week. I was finally able to import the MA OSM extract today using osmosis.

It took a lot longer than using osm2pgsql, and I was very surprised at how large the database got after the import.

  • massachusetts-latest.osm.pbf - 205 Megabytes
  • massachusetts-latest.osm - 5 Gigabytes - 25 times larger (it is an XML file)
  • PostGIS snapshot db - 10 Gigabytes - ???

It looks like PostGIS/osmosis is not taking advantage of the fact that we repeat the same tags, over and over, and over again in the data.

Comment from SK53 on 31 May 2013 at 10:26

In my experience it's often faster to use osmosis to write a pgsql dump & just use COPY to do the import (the --write-pgsql-dump option in osmosis).

I think you'll find most of the storage overhead is in the geometry columns.
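A minimal sketch of that dump-then-COPY workflow (file names, the database name, and the script path are placeholders; the load script ships with the osmosis distribution):

```shell
# Step 1: write COPY-format dump files instead of loading the database row by row
osmosis --read-pbf file=massachusetts-latest.osm.pbf \
        --write-pgsql-dump directory=pgimport

# Step 2: load the dump files with COPY, using the script bundled with osmosis
# (run from the directory containing the dump files)
cd pgimport
psql -d osm -f /path/to/osmosis/script/pgsnapshot_load_0.6.sql
```

COPY skips per-row INSERT overhead, which is where most of the import time goes.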

Comment from jremillard on 31 May 2013 at 14:07

OK, I will try that!

Comment from pnorman on 5 June 2013 at 08:41

The big difference between osm2pgsql and pgsnapshot is that osm2pgsql is lossy, so it can disregard most of the tags.

The other tips are

  • Use --write-pgsql-dump, and if you want geometry columns, build them with osmosis. It's a more manual process, but it is way faster.
  • If you need geometry columns and have the RAM, give Java 32 GB of heap space; otherwise, put the node location store temp files on an SSD.
  • When building indexes, omit any indexes you're not planning to use. If you're just performing tag analysis, you could get away with no geometry indexes, and the nodes geom index takes more time than loading the data.
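The heap and node-store tips above map onto osmosis options roughly like this (heap size and file names are examples, not requirements; nodeLocationStoreType is the real option name on the pgsnapshot dump task):

```shell
# Give the JVM a large heap so node locations fit in memory
# (only sensible if the machine actually has the RAM)
export JAVACMD_OPTIONS="-Xmx32G"

# InMemory keeps the node location store in the heap;
# without enough RAM, use TempFile and put the temp files on an SSD
osmosis --read-pbf file=massachusetts-latest.osm.pbf \
        --write-pgsql-dump directory=pgimport \
        nodeLocationStoreType=InMemory
```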

For reference, creating and loading the dump files takes 10h51m on my home dev server, and with decent sequential disk speed it is CPU-bound if you use an in-memory node store.

The --read-pbf-fast option with as many workers as CPU cores may help a bit here.
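For example, on a four-core machine the parallel reader might be invoked like this (worker count is a per-machine choice, not a fixed value):

```shell
# --read-pbf-fast decodes PBF blocks in parallel; match workers to CPU cores
osmosis --read-pbf-fast file=massachusetts-latest.osm.pbf workers=4 \
        --write-pgsql-dump directory=pgimport
```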
