OpenStreetMap

WOF#3. Database bloat hoax

Posted by WorstFixer on 13 May 2012 in English (English)

I was told that every edit creates database bloat. Here are some drawings.

This simple image shows how data circulates in OSM:

How data circulates in OSM

Why is the revert grey? Storing a revert is cheap: version 1 is the same as version 3. You keep the current version in the current database and just a pointer in the archive.
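A minimal sketch of the pointer idea described above, assuming a hypothetical in-memory store (`DedupHistory`, `save` and `get` are illustrative names, not part of any OSM software). Each distinct tag set is stored once, and every version only records a hash that points to it, so a revert back to an earlier state adds a pointer but no new content. As woodpeck notes in the comments, OSM's real history tables do not work this way.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DedupHistory:
    """Hypothetical history store: each distinct tag set is stored once,
    and each version is only a pointer (hash) to that stored content."""
    blobs: dict = field(default_factory=dict)     # hash -> tag dict
    versions: dict = field(default_factory=dict)  # element id -> list of hashes

    @staticmethod
    def _key(tags):
        # Canonical JSON so that equal tag dicts always produce the same hash.
        data = json.dumps(tags, sort_keys=True).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def save(self, element_id, tags):
        key = self._key(tags)
        self.blobs.setdefault(key, dict(tags))  # store content only if unseen
        self.versions.setdefault(element_id, []).append(key)
        return len(self.versions[element_id])    # new version number

    def get(self, element_id, version):
        return self.blobs[self.versions[element_id][version - 1]]


history = DedupHistory()
history.save(1, {"highway": "residential"})  # version 1
history.save(1, {"highway": "road"})         # version 2 (the bad edit)
history.save(1, {"highway": "residential"})  # version 3 (the revert)
print(len(history.blobs))  # 2 distinct tag sets stored for 3 versions
```

Each version still costs one pointer entry, so under this scheme a revert is cheaper than a full copy but not free.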

Look more closely at the archive and the visible separation:

Sane server organisation

This is a sane way to organise servers for OSM. You need not tell contributors to stop contributing because the database is large. Real users do not need the archive part.

If disk space is low, buy more! Ask for donations! Here is a rough, simple list:

Some OSM donations

As a reminder, donating to OpenStreetMap is easy: http://donate.openstreetmap.org/

Comment from chriscf on 13 May 2012 at 17:03

Disk space is cheap-ish (not as cheap as it once was, but blame Seagate or WD for that). Don't really see why anyone would complain. (After all, if an edit is problematic, it can be reverted wholesale.)

For future reference, the correct procedure when you find bad data is to Just Fucking Fix It. If anyone gets in your way, remind them that while you admire their principles, they have a massive stick up their arse and they're getting in the way of you Just Fucking Fixing It. Corollary: writing long diary entries, creating lots of diagrams, and Rule 7 violations in general also detract from Just Fucking Fixing It. (As does writing lengthy responses in the comments.)

Comment from robert on 13 May 2012 at 17:50

Oh hold on - is this all about you trying to justify doing bulk edits? Good luck with that.

Comment from woodpeck on 13 May 2012 at 18:04

If one person believes they can, based solely on their own judgement that has clearly proven faulty in the past, and without any further discussion, "fix" 75,000 objects in one day - stuff that is not at all unlikely to require a later revert, therefore doubling the number of edits to 150,000 - then that increases the total editing activity on that one day by 5% to 10%. For that one person alone.
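The percentages can be checked with a little arithmetic. This sketch assumes one reading of the figures: the 5% is the 75,000-object edit alone and the 10% is the edit plus its revert, which implies a baseline of about 1.5 million edits per day. That baseline is derived from the comment's numbers, not a measured statistic, and `increase_percent` is a hypothetical helper name.

```python
# Figures from the comment: one bulk edit, and the same edit plus a revert.
BULK_EDITS = 75_000
WITH_REVERT = 2 * BULK_EDITS  # 150,000

# Assumption: the "5%" figure refers to the bulk edit alone,
# which implies this daily baseline (derived, not measured).
baseline_per_day = BULK_EDITS * 100 // 5


def increase_percent(extra_edits: int, baseline: int) -> float:
    """Extra edits as a percentage of normal daily editing activity."""
    return 100 * extra_edits / baseline


print(baseline_per_day,
      increase_percent(BULK_EDITS, baseline_per_day),
      increase_percent(WITH_REVERT, baseline_per_day))
# 1500000 5.0 10.0
```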

Not only does that unnecessarily increase the amount of data in our history tables (which is stored uncompressed in the same database and on the same disks as the current data, and where no such thing as a "simple pointer to an old version" exists, patches, as always, welcome); it also taxes our database server. Look at its disk utilization graph and ask yourself how many people you would like to have running bots on that while you wait for your bona fide upload to finish:

munin (database server disk utilization graph)

Ten guys like WorstFixer and we can fill a separate $15,000 database server with the likes of him alone.

This doesn't even begin to touch the question of whether there's any merit in what WorstFixer thinks needs fixing.

And because mass edits have this potential of negatively affecting our systems and upsetting people, we require that they be discussed beforehand and a community consensus reached.

ChrisCF's statement about just fixing things is mostly ok for ordinary manual mapping activity (but even there it may happen that something that looks "bad" to you is ok for others). It is, however, not ok for large-scale automated changes.

I won't be writing any more comments here. This is ridiculous. WorstFixer, get your act together and participate in what the grown-ups do. Continuing to piss over everybody else's work won't earn you respect.

Comment from WorstFixer on 13 May 2012 at 19:33

Dear woodpeck,

I am sorry for upsetting you so much. I wrote this post to state my mind, not to piss over your work.

I have stopped uploading changes for now, as you can see. I am preparing a letter to the talk@ list with the edits I want to make.

There are some rules I do not understand.

I understand my first block. You found bad tags in my edits. You removed my bad tags and replaced them with the previous bad tags. That is fine, or at least understandable.

You said "contact affected persons". I did, and I got "yes, please edit". I limited my change to edits by that person only, and I stated that clearly in the changeset comment. You banned me. That is less understandable. But you have the ban hammer and set the rules here.

I am afraid of that "find consensus" thing. I see no way of knowing for sure whether consensus has been reached.

I was trying to state my mind clearly on "database bloat" before posting to the mailing list. I want to answer this argument before someone else uses it.

Suppose everyone on the list says "yes, upload" and I start uploading. How can I be sure that nobody then says "no, don't upload, you bloat the database" and you ban me again in the middle of the upload?

Comment from compdude on 14 May 2012 at 03:03

OSM claims to be like Wikipedia, except with maps. Can we all adopt Wikipedia's user guidelines then? Especially consider "assume good faith", meaning assume that the person is trying to suggest how to improve OSM and is not simply complaining. Please stop giving people crap about complaining and whining about OSM when they are genuinely trying to suggest how to improve the project. Instead, listen to their suggestions and IMPLEMENT THEM!!!!!

To make sure this gets implemented I am going to email someone on OSMF and make sure they take a look at this.

Comment from Harry Wood on 14 May 2012 at 10:10

@compdude. Yes. Why don't we look to what Wikipedia does: "Operation of unapproved bots, or use of approved bots in unapproved ways outside their conditions of operation, is prohibited". Wikipedia has rules like this because... oh look. All the same reasons woodpeck was stating in relation to OpenStreetMap. "Assume good faith" does not mean Wikipedia lets everyone romp around screwing things up with automated edits.

And while we're talking about assuming things, how about assuming that Frederik and others know what they're talking about when it comes to the technologies and data structures behind OpenStreetMap. I don't think there's anything wrong with questioning things. And actually the diagram at the top here is rather good. Perhaps it could help to frame an interesting discussion about how OSM could more efficiently store historical data.

Unfortunately this feels like it is posted in the spirit of a know-it-all justification for poor behaviour. If you think all these problems are solved with more disk space and "disk space is cheap", well that all sounds wonderfully simple. Maybe you should go set up your own OpenStreetMap to show us all how it's done. You could allow yourself to run lots of bot edits on it. Good luck with that. In the meantime you would be welcome to raise constructive questions while showing some humility and politeness towards those who designed and built the system.

Comment from compdude on 14 May 2012 at 23:05

I'm sorry if I didn't show some respect to you, but please understand that 100% of human beings can get frustrated quite frequently. And such frustration can result in not respecting others. If we developed a more respectful and welcoming atmosphere in OSM, that would be a wonderful thing.

About your mention of Wikipedia's bot approval policy: it is very sensible (why wouldn't they have some sort of approval system for bots?), but the rules on OSM do not seem to be laid out very well (unlike Wikipedia's), and as a result rules can be unclear, and people may not understand why they got blocked (like this person). Also, where could WorstFixer have gotten approval for his bot/script? (If your answer is "uhhh.... I don't know" well then I don't blame him!)

Also, have a look at Wikipedia's guideline of assuming good faith. Don't take that out of context. It says to assume good faith, except when it's absolutely clear that a certain user is not acting in good faith (i.e. continually vandalizing pages despite warnings to stop). Another Wikipedia rule is "be bold," but to avoid taking that rule out of context, I should mention that being bold is not the same as being reckless.

Finally, not only should I, as a regular editor, show respect to the creators and admins of OSM, the creators and admins should show respect to people like me.
