OpenStreetMap

Preparing accurate history and caching changesets

Posted by geohacker on 12 April 2017 in English (English)

It's important to see what exactly happened to the features in a changeset. This means identifying the state of each feature and its history, including the geometry and tags that changed. The OSM changeset page doesn't give you a clear idea of what happened in a changeset: you see a list of features that changed, and the bounding box of the changeset.

The changeset XML from OpenStreetMap only has the current version of the features that changed in the changeset.

Overpass offers augmented diffs between two timestamps that contain the current and previous versions of each feature that changed in that period. We put together an infrastructure that queries Overpass minutely, prepares a changeset representation as JSON, and stashes it on S3. The augmented diffs are also cached on S3. This means the load on the Overpass instance drops drastically when many of us are looking at the same changeset.
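The core transformation in that minutely job is regrouping an augmented diff, which is ordered by element, into per-changeset buckets. A minimal sketch of that step in Python; the function name is mine and the real infrastructure differs:

```python
from collections import defaultdict

def group_by_changeset(elements):
    """Bucket augmented-diff elements by the changeset that touched them.

    Each bucket later becomes one cached changeset JSON on S3.
    """
    buckets = defaultdict(list)
    for element in elements:
        buckets[element["changeset"]].append(element)
    return dict(buckets)
```

Each resulting bucket is then serialised and uploaded under its changeset ID, so a later lookup needs no Overpass query at all.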

This is used directly in changeset-map, a utility to visualise OSM changesets.

JSON

The cached changeset JSONs are available at https://s3.amazonaws.com/mapbox/real-changesets/production/changeset-id.json, where changeset-id is the numeric ID of the changeset. The JSON looks like this for a changeset by user Rezhin Ali.

This is inspired by the work Development Seed did with Planet Stream. We use osm-adiff-parser to convert the augmented diff to changeset JSON.
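Fetching one of these cached JSONs is a single GET against S3. A small sketch, assuming only the bucket URL mentioned above; the helper names are mine:

```python
import json
import urllib.request

BASE = "https://s3.amazonaws.com/mapbox/real-changesets/production"

def changeset_url(changeset_id):
    """Build the S3 URL for a cached changeset JSON."""
    return "{}/{}.json".format(BASE, changeset_id)

def fetch_changeset(changeset_id):
    """Fetch and parse the cached changeset JSON.

    Raises urllib.error.HTTPError (404) if the changeset isn't cached.
    """
    with urllib.request.urlopen(changeset_url(changeset_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```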

// 20170411184718
// https://s3.amazonaws.com/mapbox/real-changesets/production/47656996.json

{
  "elements": [
    {
      "id": "4787752634",
      "lat": "36.1823442",
      "lon": "44.0158941",
      "version": "2",
      "timestamp": "2017-04-11T13:12:35Z",
      "changeset": "47656996",
      "uid": "5323129",
      "user": "Rezhin Ali",
      "old": {
        "id": "4787752634",
        "lat": "36.1823442",
        "lon": "44.0158941",
        "version": "1",
        "timestamp": "2017-04-11T08:02:21Z",
        "changeset": "47649032",
        "uid": "5323129",
        "user": "Rezhin Ali",
        "action": "modify",
        "type": "node",
        "tags": {
          "name": "ێەبد مەنان",
          "name:ar": "ێەبد مەنان",
          "shop": "car"
        }
      },
      "action": "modify",
      "type": "node",
      "tags": {
        "name": "Abd Manan",
        "name:ar": "Abd Manan",
        "shop": "car"
      }
    }
  ],
  "metadata": {
    "id": "47656996",
    "created_at": "2017-04-11T13:12:34Z",
    "open": "true",
    "user": "Rezhin Ali",
    "uid": "5323129",
    "min_lat": "36.1823442",
    "min_lon": "44.0158941",
    "max_lat": "36.1823442",
    "max_lon": "44.0158941",
    "comments_count": "0",
    "tag": [
      {
        "k": "created_by",
        "v": "MAPS.ME ios 7.2.3"
      },
      {
        "k": "comment",
        "v": "Updated a car shop"
      },
      {
        "k": "bundle_id",
        "v": "com.mapswithme.full"
      }
    ]
  }
}
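Because each element carries its previous state under "old", diffing the tags of the two versions is straightforward. A sketch of that comparison (the function name is mine, not part of the service):

```python
def tag_changes(element):
    """Compare an element's tags with its previous version under "old".

    Returns {key: (old_value, new_value)}; None marks a tag that is
    absent on one side (i.e. a tag that was added or removed).
    """
    old_tags = element.get("old", {}).get("tags", {})
    new_tags = element.get("tags", {})
    changes = {}
    for key in set(old_tags) | set(new_tags):
        before, after = old_tags.get(key), new_tags.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

Run against the element above, this reports the name and name:ar retranslation while leaving the unchanged shop=car tag out.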

Empty changesets

It's possible that certain changesets are empty: they could have been opened but failed to upload changes due to an unreliable network, and they eventually get closed after 60 minutes. Empty changesets are not cached.
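The check before caching is then simply whether any elements ever arrived for the changeset; a trivial sketch (the function name is mine):

```python
def should_cache(changeset_json):
    """Skip empty changesets: opened but never received any elements."""
    return bool(changeset_json.get("elements"))
```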

Long changesets

Changesets can also remain open for a long time. For example, this one from user Manuchehr was open for 36 minutes. Experienced users like to survey outdoors and upload data in bulk, and some editors don't close changesets automatically. Idle changesets eventually get closed after 60 minutes.

When features of a changeset come through in a later minutely diff, we update the cache on S3. This ensures the cached changeset remains complete.
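That update step has to be idempotent so that re-processing a minutely diff never duplicates elements. A sketch, assuming elements are deduplicated by type, ID, and version (helper names are mine):

```python
def merge_elements(cached, incoming):
    """Merge newly arrived elements into a cached changeset JSON.

    Keyed by (type, id, version): a later version of the same feature
    is added alongside, while a replayed element overwrites itself.
    """
    key = lambda e: (e["type"], e["id"], e["version"])
    merged = {key(e): e for e in cached["elements"]}
    for element in incoming:
        merged[key(element)] = element
    cached["elements"] = list(merged.values())
    return cached
```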

Database transactions and augmented diffs

A changeset being closed doesn't mean that all the features that changed have been committed to the OSM database and will appear in the minutely diff right after. Some features may take longer to commit to the database; we handle these by updating the augmented diff on S3 and then recreating the changeset JSON. You can read more about this case here.

Missing changesets

Changesets after March 1, 2017 are cached. We are considering a slow backfill, but this is entirely dependent on Overpass. If you see something missing or unclear, please open a ticket and let us know!

Location: Indiranagar 1st Stage, Indiranagar, Bengaluru, Bangalore Urban, Karnataka, 560001, India

Comment from tyr_asd on 12 April 2017 at 07:47

Hey. Great work and thanks for providing this as a service!

PS: are the cached (raw) augmented diffs also publicly available?

Comment from mmd on 12 April 2017 at 08:14

Does this imply that an augmented diff may be updated after it has been published? Is there some mechanism for a data consumer to find out that there has been an update? I think for changeset analysis this is fine; I see some issues keeping a local db up to date with this approach. Would you agree?

Comment from geohacker on 12 April 2017 at 08:27

mmd - Yes. An augmented diff may be updated after it has been published. There's currently no way for consumers to know when a file has been updated. We do this using S3 notifications through AWS SNS, but I'm not sure how best to expose this externally.

Comment from Stereo on 12 April 2017 at 10:13

How cool. SNS can’t be exposed externally directly, but it can be used to trigger actions, e.g. push notifications, email, calls to another trigger script...

Comment from umphrey1012 on 12 April 2017 at 14:58

Awesome geohacker. Just wanted to point out that SNS topics can be exposed externally. This is how we provide notifications for a number of the datasets on https://aws.amazon.com/earth/ (like Landsat 8, Sentinel-2, etc). Below is a sample topic policy that allows S3 to post an event and anyone to subscribe from the SQS and Lambda services. It can be used as a base to open up more access.

{
  "Version": "2008-10-17",
  "Id": "PublicSQSandLambdaSNS",
  "Statement": [
    {
      "Sid": "AllowLandsatPDSPublication",
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-west-2:xxxx:NewSceneHTML",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::landsat-pds" }
      }
    },
    {
      "Sid": "allowOnlySQSandLambdaSubscription",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [
        "SNS:Subscribe",
        "SNS:Receive"
      ],
      "Resource": "arn:aws:sns:us-west-2:xxxxxx:NewSceneHTML",
      "Condition": {
        "StringEquals": {
          "SNS:Protocol": [
            "lambda",
            "sqs"
          ]
        }
      }
    }
  ]
}

Comment from PierZen on 21 April 2017 at 21:18

Great geohacker.

To monitor an area, we need to query for the changesets in a given bbox and datetime period. We cannot rely on the OSM API to provide the list of changeset IDs since the API limits the number of changesets per query, and Overpass does not provide this facility.

Do you plan to provide such a service? It would let us develop tools without first installing a server and loading all the OSM data.

Comment from geohacker on 25 April 2017 at 03:33

Hey PierZen - have you tried https://osmcha.mapbox.com/? OSMCha lets you query between time periods, filter by a bbox, and then visualise each changeset.

Comment from PierZen on 25 April 2017 at 03:50

I'm talking about developing scripts to analyze the data. If we could obtain this data as GeoJSON outputs, it would be great.

Comment from Zverik on 27 April 2017 at 08:21

PierZen, you can use WhoDidIt for that. That won't be too precise (it stores tiles with 0.01 degree granularity), but it is quite fast and has data since 2012.

Comment from mmd on 21 August 2017 at 16:31

A new Endless Achavi demo is up, see this post for details.

It gives a different perspective on the question: is caching really worth it? Should we rather spend more time improving Overpass performance instead?
