I have been playing with osmosis/PostGIS this week. I was finally able to import the MA OSM extract today using osmosis.
It took a lot longer than using osm2pgsql and I was very surprised at how large it got when it was imported.
massachusetts-latest.osm.pbf - 205 Megabytes
massachusetts-latest.osm - 5 Gigabytes - 25 times (it is an XML file)
PostGIS snapshot db - 10 Gigabytes - ???
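The expansion factors implied by the sizes listed above can be checked with a few lines of Python. This is only a rough sketch: it assumes the quoted sizes are rounded figures and that 1 Gigabyte = 1000 Megabytes, and the function name `expansion_factors` is made up for illustration.

```python
# Sizes as quoted in the post, converted to megabytes.
# Assumption: 1 GB = 1000 MB, and the quoted figures are rounded.
SIZES_MB = {
    "massachusetts-latest.osm.pbf": 205,
    "massachusetts-latest.osm": 5_000,
    "PostGIS snapshot db": 10_000,
}


def expansion_factors(sizes_mb, baseline):
    """Return each size divided by the size of the baseline entry."""
    base = sizes_mb[baseline]
    return {name: size / base for name, size in sizes_mb.items()}


if __name__ == "__main__":
    factors = expansion_factors(SIZES_MB, "massachusetts-latest.osm.pbf")
    for name, factor in factors.items():
        print(f"{name}: {factor:.1f}x the size of the .pbf")
```

Under those assumptions the XML file comes out at roughly 24 to 25 times the .pbf, and the PostGIS snapshot database at a little under 50 times.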
It looks like PostGIS/osmosis is not noticing that we repeat the same tags over and over again in the data.
Discussion
Comment from SK53 on 31 May 2013 at 10:26
In my experience it's often faster to use osmosis to write a pgsql dump and then just use COPY to do the import (the --write-pgsql-dump option in osmosis).
I think you'll find most of the storage overhead is in the geometry columns.
Comment from jremillard on 31 May 2013 at 14:07
OK, I will try that!
Comment from pnorman on 5 June 2013 at 08:41
The big difference between osm2pgsql and pgsnapshot is that osm2pgsql is lossy, so it can discard most of the tags.
The other tips are
For reference, creating and loading the dump files takes 10h51m on my home dev server. With decent sequential disk speed, the process is CPU-bound if you use an in-memory node store.
The --read-pbf-fast option with as many workers as CPU cores may help a bit here.
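As a rough illustration of the "as many workers as CPU cores" tip, the sketch below assembles an osmosis command line without running it. Only --read-pbf-fast comes from this thread; the `file=` and `workers=` arguments, the --write-pgsql-dump task name, and its `directory=` argument are assumptions taken from the Osmosis usage documentation and should be checked against the installed version. The helper name `build_osmosis_dump_command` is hypothetical.

```python
import os
import shlex


def build_osmosis_dump_command(pbf_path, dump_dir, workers=None):
    """Assemble (but do not run) an osmosis command as a list of arguments.

    Assumption: the task and argument names below match the installed
    Osmosis version; verify them before running the printed command.
    """
    if workers is None:
        # os.cpu_count() may return None, so fall back to one worker.
        workers = os.cpu_count() or 1
    return [
        "osmosis",
        "--read-pbf-fast", f"file={pbf_path}", f"workers={workers}",
        "--write-pgsql-dump", f"directory={dump_dir}",
    ]


if __name__ == "__main__":
    cmd = build_osmosis_dump_command("massachusetts-latest.osm.pbf", "pgimport")
    # Print a shell-quoted command line for inspection.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the worker count before committing to an import that takes hours.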