xybot - just stop it

Posted by Richard on 7 May 2009 in English (English)

Unfortunately xybot isn't sentient, because otherwise, in the words of General Dreedle, we could take him out and shoot him.

xybot has not announced his presence on talk-gb, yet 'he' has taken it upon himself, without consultation, to change "denomination=Church of England" to "denomination=church_of_england" here (and doubtless elsewhere). This edit does the same. Nor has he announced himself on talk-ie, yet there is this.

And what, pray, is this all about? User xybot, but created_by=Potlatch 0.11b? The bot knows how to work Potlatch? Blimey.

It is so, so tempting to write a xyrevertbot. If you see an automated user called General Dreedle you'll know I've succumbed.

Comment from Circeus on 7 May 2009 at 21:16

I thought the CoE was denomination=anglican?

Comment from Richard on 7 May 2009 at 21:30

Well, exactly. It might be. There are lots of subtleties like that - after all, it's worth recording that it's Anglican, yes; it's also worth recording that it's CoE rather than (say) Church in Wales, or the Episcopalian church. I could happily sit here and discuss these with you, or someone, all day.

Unfortunately xybot has decided that it knows best and is going to stamp over others' tags entirely without any of this discussion. Completely unacceptable.

Comment from SK53 on 7 May 2009 at 21:55

It also adds religion=christian tags if, in xybot's opinion, a given denomination is Christian. That seems a rather dangerous activity in some areas of the world, where the only valid tag for any group other than one's own is religion=heretic.

A more productive use of programmer time would be to write something that could cope with capital letters and spaces in the render chain!

Comment from 42429 on 7 May 2009 at 22:01

At least xybot does a very good job of correcting German spelling mistakes.

Furthermore, it standardizes religious denominations, e.g. catolic > catholic. Of course it cannot handle tags where mappers don't know the difference between baptist and lutheran (or the difference between protestant and prostitute).

Concerning organisational affiliation, I would suggest adding one or two additional tags (e.g. church membership, diocese, language or liturgy) instead of reverting all of xybot's spelling corrections.


Comment from randomjunk on 7 May 2009 at 22:07

I'd suggest just reverting the buggery out of xybot and banning its controller from OSM forever more. I don't mind what it's doing so much as the way it is going about it (although some of the translations are very dodgy). It's completely unacceptable to make sweeping changes like this without carefully discussing them and getting the community on side first.

Comment from randomjunk on 7 May 2009 at 22:10

I mean.. what the hell is so wrong with the key "notes" that it needs "correcting" to "note"?!

Comment from 42429 on 7 May 2009 at 22:23

xybot has existed since at least October 2008:

Comment from xybot on 8 May 2009 at 07:22

I'm sorry, Richard, that you are annoyed by xybot's attempts to correct obvious spelling errors in keys and to standardise some values according to the rules given on the wiki (e.g. the one that was your stumbling block).
I'm very well aware of the collateral damage xybot could do, and I try to optimise it and repair its damage every time someone tells me that it has done something wrong.
Regarding your point about the "created_by" tag: I will delete these tags in future.

But just to play this ball back: I think with Potlatch you created a wonderful piece of software which lowers the barrier to editing OSM data for a lot of users. But I don't even want to think about how many weeks of work my alter ego xylome has spent correcting, by hand, errors created by your software (e.g. when a long way was split, the two new ways were created but the old way was not deleted; or users who were unaware of, and unable to revert, their accidental displacement of a single node, or of a forest several hundred square kilometres in size).

I know that xybot is constantly in the line of fire regarding its changes (so is Potlatch, and I've not seen you stopping, nor would I want you to, despite the calls to "disable potlatch forever" on all the mailing lists), but please take into account that xybot has made almost 1.5 million edits so far, and I hope the vast majority of them are benign alterations.

But there are also a few malign alterations, and I'd like to invite each of you to tell me when xybot has done something wrong, or how it could be improved, instead of starting a flame war.

xybot aka xylome

Comment from daveemtb on 8 May 2009 at 08:18

I think the point was that it would be good to discuss automated edits *before*, not *after*, doing them. It seems a fair argument to me.

Comment from Tom Chance on 8 May 2009 at 08:40

I have to say that in principle I 100% support the idea behind xybot -- it's ridiculous to prefer inaccurate tagging because of some individualistic desire to have "your" data stay intact in a community database -- but I do think it would be good to find a better way to make the wider community feel included in its operation.

Here's my suggested process:

Once a month, regular as clockwork, add a new tranche of suggested corrections to a wiki page, and allow a month's discussion. If there is more than a whiff of serious complaint, exclude it and move the suggested correction to a wiki page where people can see disputed corrections. If people seem generally happy with the correction, add it to the documented list of accepted corrections. Also allow

Leave xybot to run with the accepted corrections.

Comment from Richard on 8 May 2009 at 10:42

xylome/xybot - well, touché!

But the point is that discussions about Potlatch are carried out in public, on the mailing lists. The code is open source and everyone can see what it does. There is a trac component for it and if you don't like something, you are invited to make a suggestion or submit a patch. And as the summary Potlatch changelog shows, the code is regularly revised to take account of this.

None of this appears to apply to xybot. It turns up, makes some changes, disappears again. There is some information on the wiki, but it is clearly inaccurate - e.g. "was used once (2008-10-10) on the european data" when it is seemingly being applied to European data with great frequency now. I genuinely don't know whether the tag lists on the wiki, which also haven't been updated since October 2008, are accurate or not.

Now it appears (not least from FK270673's comments) that you have the buy-in of the German community, and that's really good; that gives you the right to work on the data in Germany.

But you do not have the buy-in of the UK community, at least not yet, and still you are extensively changing UK data, including even some UK-specific tags (particularly Church of England, Church of Scotland etc.). That is unacceptable, particularly as you don't necessarily seem to appreciate the context of the tags: for example, you changed "High Church of Scotland" to "Church of Scotland". You would be drummed out of town in the Hebrides for doing that! And there are several more.

As several people have said, you need to ask before making changes, not after.

Bear in mind too that the wiki does not give rules. It gives guidelines. Nothing more. Just because it isn't documented on the wiki, doesn't make it wrong (and conversely, just because it's documented, doesn't make it any good - smoothness=very_horrible anyone?).

On a separate issue: duplicate ways are not solely the "fault" of Potlatch, in so far as one can attribute anything that easily: they're usually a direct result of the server running slow. This is clearly an issue, so API 0.6 introduced version numbering to cope with instances like that and, as best as I can tell, problems have been significantly reduced. There are still some issues, but it's getting a lot better.

Comment from SK53 on 8 May 2009 at 10:43

Whereas robots can perform very useful functions, they, alongside large-scale automated imports, can also seriously alienate contributors. Pretending that a robot which makes alterations without consultation is somehow equivalent to errors made by contributors whilst editing data with an editor of their choice is casuistry. Nor, as daveemtb says, is retrospectively pointing to wiki pages an acceptable method of announcing one's presence.

Given that xybot has made 1.5 million edits, I presume most of these are harmless, some even beneficial, and fortunately few harmful (the following is seriously wrong, and I'm surprised it has not caused problems in German-speaking countries: "denomination|christ-catholic" => "denomination|catholic#religion|christian"; see Christkatholische Kirche). I do think it is up to the community, however, to decide what, when and how such robots run. Surely a robot can run a pre-pass and generate lists of proposed edits, and even mail them to those who last edited the object. I don't really like the idea, suggested by Tom, of having to consult a wiki page every month on the off chance that some global edit will change tags which are of importance to me. Tag spelling corrections are better done by adding the correctly spelt tag (this obviously does not apply in Richard's case).
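The pre-pass suggested above could look roughly like the following minimal Python sketch. It assumes objects are plain dicts with hypothetical "id", "user" (last editor) and "tags" fields, and the KEY_FIXES table is purely illustrative, not xybot's actual rule list. It only proposes edits, grouped by last editor so they could be mailed out for review, and it adds the correctly spelt key alongside the original rather than overwriting it.

```python
from collections import defaultdict

# Hypothetical table of misspelt keys -> preferred keys (illustrative only,
# not xybot's real rule list).
KEY_FIXES = {"religon": "religion", "denomiation": "denomination"}


def propose_edits(objects):
    """Dry-run pre-pass over OSM-like objects (assumed dict layout).

    Nothing in `objects` is modified; the function only returns proposals,
    grouped by last editor so each mapper could be asked before any change.
    """
    proposals = defaultdict(list)
    for obj in objects:
        tags = obj["tags"]
        for key, value in tags.items():
            fixed = KEY_FIXES.get(key)
            # Skip keys with no known fix, or where the correct key is already set.
            if fixed is None or fixed in tags:
                continue
            # Non-destructive: propose adding the correct key, keep the original.
            proposals[obj["user"]].append(
                {"id": obj["id"], "add": {fixed: value}, "keep": {key: value}}
            )
    return dict(proposals)


if __name__ == "__main__":
    sample = [
        {"id": 1, "user": "alice",
         "tags": {"amenity": "place_of_worship", "religon": "christian"}},
        {"id": 2, "user": "bob",
         "tags": {"denomiation": "anglican", "denomination": "anglican"}},
    ]
    print(propose_edits(sample))
    # {'alice': [{'id': 1, 'add': {'religion': 'christian'}, 'keep': {'religon': 'christian'}}]}
```

Object 2 produces no proposal because it already carries the correctly spelt key, so nobody is mailed about it.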

In summary: always consult widely before making changes, let folk know before changes are made, and make sure everything is fully documented in the wiki in a way that most users can understand (for instance, ISO 3166 country codes are documented with an e-mail discussion in German (de not DE), and there is no mention of runs over European data other than in Oct 2008, etc.). The medicine might be good for us, but don't shove it down our throats.

PS. operator=Church of England;denomination=anglican? Or are capitals going to be suppressed everywhere except in addr:country?

Comment from SK53 on 8 May 2009 at 10:46

Lost a link in the last message; hope this works: Christkatholische Kirche.

Comment from Pieren on 8 May 2009 at 12:02

Again, the problem is not the bot itself. The problem is the principle of making new change rules without any communication and waiting for complaints afterwards. People who discover such changes by accident are surprised, and become suspicious about other changes that nobody might notice without carefully checking the logs.
