
WOF#4. Database server load

Posted by WorstFixer on 14 May 2012 in English.

I mean no offence. Forgive me if I speak wrongly. I state my mind in the best words I can find. If I fail to find the right words, please help me rather than saying “go away and stop criticizing”.

I try to fix the worst things. At first I thought they were tags. While trying to fix tags, I found a larger issue. I will not fix any more tags without approval. I like the idea of a “caring community preview” proposed by user Matt. I do not want my posts to be read as a “justification for unaccepted edits”; I want my edits to be accepted. But when the reasons for rejection are technical, I want to fix the technical reasons, not be told “no, just because”.

User woodpeck said “Ten guys like WorstFixer and we can fill a separate $15000 database server just with the likes of him”. Think about it: ten users can kill a $15,000 server.

I respect the system administrators. They do a good job.

But it looks like we have a worse problem than database bloat: database server IO load. I read the wiki and munin. Here is how it appears to work.

Ramoth server is upset

Ramoth is the main OSM database server. It is not cheap. But it is upset, and users are angry about it.

The OSM database server does not calculate much. It stores, reads and writes data, so it does not need lots of CPU. The CPU is mostly idle. Look:

Ramoth CPU usage

It uses 2 cores at most. And look at the top of the graph: it is always waiting for IO. All it can do is wait for IO, do nothing, and be upset.

Here is how it waits for data:

SDB IOs

The midnight spikes are database cron jobs. I think they show the top of the possible performance.

Slowness comes whenever it needs to read faster than that. And it almost reaches that top load when ordinary people edit, not even WorstFixer.

Here is a list of things I propose:

1. $0. Calculate statistics.

I want a daily update of database table sizes, so people can actually see the database bloat, and maybe start fixing code if it really is an issue.

To do this, system administrators need to add the results of this query to the http://www.openstreetmap.org/stats/data_stats.html page:

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;
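A variant that separates table data from index data would also be useful later, when deciding what could move to SSD. This is only a suggestion using the standard pg_table_size and pg_indexes_size functions, not something the admins publish today:

SELECT relname,
    pg_size_pretty(pg_table_size(C.oid)) AS "table_size",
    pg_size_pretty(pg_indexes_size(C.oid)) AS "index_size"
FROM pg_class C
JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname = 'public'
    AND C.relkind = 'r'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;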

2. Move all frequently accessed data to SSD.

There are already SSDs in the server, but less than half of them are really used. They are used to speed up reads of some regions of the disks. Somebody needs to check what they actually cache. I am afraid they cache GPS points for Germany and not map data.

2a. $2000. 2xIntel® SSD 320 Series 600GB.

Decide which tables can fit into a 600 GB array of two mirrored SSDs. Move all indexes to it, and preferably all current_ tables if they fit. This needs no Rails code change, just database tablespaces (a rough sketch follows).
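For example, assuming the SSD array is mounted at /ssd and the tables keep their existing names (the path, tablespace name and index name here are only illustrative, not the real Ramoth layout):

-- Create a tablespace on the SSD array (path is an assumption).
CREATE TABLESPACE ssd_fast LOCATION '/ssd/postgresql';

-- Move a frequently read table and one of its indexes there.
ALTER TABLE current_nodes SET TABLESPACE ssd_fast;
ALTER INDEX current_nodes_pkey SET TABLESPACE ssd_fast;

Note that moving a relation this way takes an exclusive lock and rewrites it on disk, so it would need a maintenance window.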

2b. $1000. 1xIntel® SSD 320 Series 600GB.

Reorganize the disks. Use only 1 SSD for CacheCade, and use the other two as in 2a.

2c. $600. 1xCrucial 512 GB SSD.

Reorganize the disks. Use only 1 SSD for CacheCade, the new 512 GB disk; CacheCade does not use more than that anyway. Use the other two as in 2a.

If you like these proposals, you can donate to OpenStreetMap. http://donate.openstreetmap.org/ is easy. If OSM has money specifically marked “for disks”, it will surely buy better disks.

Are my calculations flawed? I would like comments from the OSM administrators, if possible, and any other opinions.

Discussion

Comment from Sanderd17 on 14 May 2012 at 13:21

You don’t only need to look at the cost per GB; you also need to look at the durability.

An SSD can only accept a limited number of writes. So it’s good to put your OS on one, because you don’t rewrite your OS constantly. But the OSM database is constantly edited, and if you want to keep the visible data on that disk, you need to take all those writes into account.

I think (off the top of my head) that for a database, some sort of RAID together with a compressed file system is best. The RAID is a pure speed gain over a normal configuration (but RAID is already used), and the compression will take load off the IO (because the compressed data is smaller), while the CPU won’t have difficulty handling compression on writes or decompression on reads (CPU is cheaper anyway).

Comment from WorstFixer on 14 May 2012 at 13:39

Sanderd17,

That is exactly my point. I propose moving only the current_ tables and indexes to SSD. In the (unlikely) case of SSD failure (both disks in the RAID at the same time), it will be possible to reconstruct them from the history tables, roughly as sketched below.
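As a rough sketch of that reconstruction for nodes, assuming the usual schema where nodes holds every version and current_nodes holds only the newest one (the column list is shortened here for illustration):

-- Hypothetical recovery sketch: rebuild current_nodes from the history table.
INSERT INTO current_nodes (id, latitude, longitude, visible, version)
SELECT DISTINCT ON (node_id)
    node_id, latitude, longitude, visible, version
FROM nodes
ORDER BY node_id, version DESC;

It would be slow, but it only matters in a disaster case.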

SSD durability is a thing to look at if somebody proposes a better SSD. But the Intel 320 series is already used in OSM servers. I think nobody would have bought bad SSDs for OSM.

Compressing random access data is useless and will use up more memory.

Comment from Gnonthgol on 14 May 2012 at 17:10

I am not sure what problem you are trying to solve. The current database server is bloody fast, and a second one is being added. There are no problems with disk IO in the RAID setup we have in ramoth.

I agree that some time in the future it might be better to have some sort of duplicate or cached DB, so that you can get an even faster API when the data sizes become too much for our current setup to handle. However, this is not an issue yet, and it takes a lot of man hours to build.

I understand that you did not like that your account was banned when you did what you thought was right. I hope you now understand why your edits were bad and that you will learn from your mistakes.

I agree with you that bad imports should be reverted as soon as possible to prevent edits on top of those imports. The data can be imported later in a way that the community sees as beneficial. It is hard to fix a bad import with more bots.

If you want to help out, you can create better tools. In addition to good editors that can handle relations logically, there are no good tools for:

  • detecting bad imports/bots
  • reverting these changes
  • doing good imports

It would help a lot if you would help out with this.

Comment from jhp on 15 May 2012 at 01:51

@Gnonthgol… I’m relatively new to OSM, so pardon a naive question. I’d suggest that if there are any observed DB performance problems now, or future problems anticipated, it might be worthwhile to consider the schema, the DB indices, the structure of typical queries, and the structure of observably problematic queries.

Are there any links containing such information?
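If pg_stat_statements is enabled on the server (an assumption, I don’t know the actual configuration), one simple way to find the problematic queries would be:

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

Otherwise, EXPLAIN ANALYZE on a slow API query would show whether the right indices are being used.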

Comment from compdude on 15 May 2012 at 16:00

If I were an admin, I really wouldn’t trust SSDs to store frequently accessed data. While they are very fast, they have a limited number of read/write cycles before they go kaput without warning. That’s the only problem with SSD drives (and even those little USB flash drives).

Comment from Gnonthgol on 15 May 2012 at 20:08

@jhp The observed DB problems are only visible if you look at stats over several months. The current setup will work without any problems for a long time, and that is before we add more DB servers. That is what I tried to say in my first comment. There will be no problems in the foreseeable future.

You can get stats for all the servers at munin.osm.org

@compdude I still remember the dreaded message on digital cameras with flash memory cards: ‘This card has not been formatted yet. Do you want to format it?’ And all the holiday photos were on that single flash card.

Flash memory is essentially read-only memory that you can write to as an exception. SSDs have clever electronics that limit those exceptions and try to recover any lost data. However, there is a limit to how well they can do this. The only SSD in the data centers is the one that holds the standard layer render DB. It is mostly used for writing diffs and reading a lot of random-access data, the best sides of SSDs. For the main DB, a 16x RAID10 is better than any SSD.

Comment from compdude on 18 May 2012 at 03:36

@Gnonthgol, I didn’t know that info about SSD drives! Learning something new every day… :)

Comment from WorstFixer on 18 May 2012 at 07:59

Gnonthgol,

I request calculations. I do not believe in “that is just bad” arguments.

I see no problems with the application servers, and no problems with the current setup for the front end.

I observe database slowness. Some years ago, uploading 1000 objects took a minute. Now it takes 15.

The main database disk subsystem is slow. It cannot cope with the amount of non-sequential reads that users request.

I propose using SSDs in RAID. Then wear-out will not lead to loss of data. Also, wear-out is not really an issue with recently manufactured SSDs.

You say: “The only SSD in the data centers is the one that holds the standard layer render DB. It is mostly used for writing diffs and reading a lot of random-access data, the best sides of SSDs.”

That is exactly what I mean. The SSD contains the visible part of a database, for rendering. Why not do the same for all users?

Comment from scai on 18 May 2012 at 12:11

go away and stop.
