
cbeddow's Diary


With the new release of more than 59 million points of interest (POIs) from Overture, consisting of Microsoft and Meta POI datasets combined, the natural question arises: how can this be useful for OpenStreetMap?

Challenges to consider

The most important challenge in getting this data into OSM is making sure the place labels in Overture have an equivalent in OSM. This is mostly doable with automation, but many cases require context.

Validation of these is a forthcoming challenge: street-level imagery from Mapillary will be especially helpful, but being there in person to validate is also a big advantage. That aside, even if the data can be added to OSM one-by-one (not imported) with validation, the tags need to have a proper format.

Loading up the data to analyze

I got started by referencing Feye Andal’s great and succinct guide on viewing the data in AWS Athena. One point in the instructions was slightly unclear: your Athena instance, and the S3 bucket where query results are saved, need to be in the us-west-2 region, the same as the Overture dataset, unless you first copy the dataset to a bucket in your own region. So make sure the regions are the same, and the instructions should work flawlessly!

Analyzing the data

Exploring the dataset, I found 1037 unique place labels. More than 86,000 POIs are labeled structure_and_geography, which can refer to a wide range of natural geography or built structures in OSM and is difficult to match with any specific tag without context. Others translate directly, such as a laundromat.

Some example tags include: "forest", "stadium_arena", "farm", "professional_services", "baptist_church", "park", "print_media", "spas", "passport_and_visa_services", "restaurant", "dentist"

To get most of the tags matched, I used Python to import the OpenAI module, and connect to my OpenAI account, which charges a few fractions of a penny per request.

I set a system message, which defines the role the AI should play or assume. My message was:

system_msg = 'You are a helpful assistant who understands data structures, place and map data labeling ontology, and OpenStreetMap tagging. I will give you single labels of a POI category, and you will give me back the single OSM equivalent tag that most makes sense in the format of list with a single string like ["key=value"] unless it has multiple tags such as a mexican restaurant, then give the list of multiple like ["amenity=restaurant","cuisine=mexican"] or if there is no good match you will write back in all caps, ["UNKNOWN"]. Only include a list of tags or the list with unknown value, do not include any dialogue.'

I made an empty dictionary:

overture_osm_dict = { }

Then I made a list of all the unique tags, and looped through it. My code looks like:

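import openai  # legacy (pre-1.0) package, which provides openai.ChatCompletion

for tag in overture_tags:
    # Skip labels already translated, so the loop can be re-run after a failure
    if tag not in overture_osm_dict:
        user_msg = tag
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "system", "content": system_msg},
                      {"role": "user", "content": user_msg}])
        # The reply text is the model's suggested list of OSM tags
        osm_tag = response["choices"][0]["message"]["content"]
        overture_osm_dict[tag] = osm_tag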

I recommend adding a sleep timer, or a handler for timeout responses, as processing the 1037 items came with probably 10 timeouts.
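A minimal sketch of such a handler, assuming a simple exponential backoff is acceptable. The name `call_with_retry` and its parameters are hypothetical and not part of the OpenAI library; the function just wraps any callable and retries it when it raises.

```python
import time


def call_with_retry(request_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(); if it raises, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception:
            # Assumption: every error is treated as retryable. In practice,
            # narrow this to the client's timeout and rate-limit errors.
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

In the loop above, the request could then be wrapped as `call_with_retry(lambda: openai.ChatCompletion.create(...))`, so that an occasional timeout does not stop the whole run.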

In the end I had a few tags that were unknown, and I made manual fixes as needed. Running the loop multiple times yielded different results, so it is good to be aware that the AI is not consistent.

I made various fixes to the JSON structure, including stray line breaks, quotation marks in the wrong place, and bad tag formats. Some tags also seemed to be simply invented, such as amenity=water_supplier for Overture’s water_supplier, which I changed to office=water_utility, though that could be quite wrong depending on the POI.
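Some of these fixes can be caught automatically. The sketch below is one possible approach, not the code I actually ran: it parses each raw reply as JSON, normalizes curly quotes and line breaks, and falls back to ["UNKNOWN"] whenever the reply is not a list of key=value strings, so that it lands in the manual review pile. The function name and the regular expression are assumptions, and the check only covers format; it cannot tell whether a tag was invented.

```python
import json
import re

# key=value with a lowercase key (letters, digits, "_" or ":") and a non-empty value
TAG_PATTERN = re.compile(r"^[a-z][a-z0-9_:]*=[^=\s][^=]*$")


def parse_tag_response(raw):
    """Turn the model's raw reply into a list of tags, or ["UNKNOWN"] if unusable."""
    # Normalize curly quotes and stray line breaks before parsing
    cleaned = raw.replace("“", '"').replace("”", '"').replace("\n", " ").strip()
    try:
        tags = json.loads(cleaned)
    except json.JSONDecodeError:
        return ["UNKNOWN"]
    # The reply must be a non-empty list of strings
    if not isinstance(tags, list) or not tags:
        return ["UNKNOWN"]
    if not all(isinstance(t, str) for t in tags):
        return ["UNKNOWN"]
    tags = [t.strip() for t in tags]
    # Reject the whole reply if any entry is not a well-formed key=value tag
    if not all(TAG_PATTERN.match(t) for t in tags):
        return ["UNKNOWN"]
    return tags
```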

Other debatable labels came out as unknown, so I added tags manually:

  1. "personal_assistant": ["office=administrative"]
  2. "kids_recreation_and_party": ["shop=party"]
  3. "sewing_and_alterations": ["shop=tailor"] instead of "craft=sewing"
  4. "sports_bar": ["amenity=bar", "sport="], but dropping "sport=" so that it is just a bar

There are many more up for review.

In my final version, the dictionary is something like:

{
  "forest": ["landuse=forest"],
  "stadium_arena": ["leisure=stadium"],
  "farm": ["landuse=farm"],
  "professional_services": ["office"],
  "baptist_church": ["amenity=place_of_worship", "religion=baptist"],
  "park": ["leisure=park"],
  "print_media": ["amenity=newspaper"],
  "spas": ["amenity=spa"],
  "passport_and_visa_services": ["office=government", "office=visa", "office=passport"],
  "restaurant": ["amenity=restaurant"],
  "dentist": ["amenity=dentist"],
  "sports_club_and_league": ["sport=club"],
  "thai_restaurant": ["amenity=restaurant", "cuisine=thai"],
  "clothing_store": ["shop=clothes"],
  "insurance_agency": ["office=insurance"],
  "barber": ["shop=hairdresser"],
  "bar": ["amenity=bar"],
  "agriculture": ["landuse=farmland"],
  "accommodation": ["amenity=hotel"],
  "event_planning": ["amenity=event_planning"],
  "non_governmental_association": ["amenity=community_centre"],
  "elementary_school": ["amenity=school", "education=primary"],
  "landmark_and_historical_building": ["historic=yes"],
  "gym": ["leisure=sports_centre"],
  "pilates_studio": ["amenity=gym", "sport=pilates"],
  "hotel": ["tourism=hotel"],
  "advertising_agency": ["office=advertising_agency"],
  "educational_research_institute": ["amenity=school", "research_institute=yes"],
  "furniture_store": ["shop=furniture"],
  ....

The full dictionary is available to download as a GitHub gist, and I hope to get feedback on it so we may arrive at a more officially agreed upon translation of the tags.
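When reviewing the dictionary, it helps to flag entries that cannot be applied to an OSM object as they stand. The sketch below (the function name is hypothetical) splits a list of "key=value" strings into a tag dictionary and sets aside anything with a missing value, such as "office", or a repeated key, such as the three office= values under passport_and_visa_services, since an OSM object holds only one value per key.

```python
def tags_to_dict(tag_list):
    """Split ["key=value", ...] into a dict, collecting entries that need review."""
    tags, needs_review = {}, []
    for entry in tag_list:
        key, sep, value = entry.partition("=")
        if not sep or not key or not value:
            # No "=" or an empty side, e.g. "office" or "sport="
            needs_review.append(entry)
        elif key in tags and tags[key] != value:
            # Same key with a different value: keep the first, flag the rest
            needs_review.append(entry)
        else:
            tags[key] = value
    return tags, needs_review


tags, review = tags_to_dict(["office=government", "office=visa", "office=passport"])
print(tags)    # {'office': 'government'}
print(review)  # ['office=visa', 'office=passport']
```

Keeping the first value is an arbitrary choice for the sketch; a reviewer still has to decide which of the conflicting values is right.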

Conclusion

These POIs offer a lot of opportunity to improve one of the categories that is often cited as lacking in OSM. The quality is not perfect, whether in location accuracy, proper tagging, etc., but it is at least professionally curated. Nothing is better than crowdsourcing, which is how many POIs sourced from Facebook business pages or Foursquare check-ins are generated, and OSM is the best spatial crowdsourcing platform in the world.

Some data needs special analysis. For example, I asked the AI to help me with a structure_and_geography POI that I could not classify without context. The AI noticed that its Turkish name contains the Turkish word for “harbor” and recommended the tag “natural=harbor”.

Before we can start finding ways to validate the data and ingest it into the map on a case by case basis, we need to have a good basis for the tagging. The user can always modify this to be more appropriate before confirming and sending an OSM changeset, but getting a good first guess to present to users helps reduce the friction and increase the success rate.

Location: Schönegg, Oberarth, Goldau, Arth, Schwyz, 6410, Switzerland

Hospitals in Iran - Reflections on Updating the Map for COVID-19

Posted by cbeddow on 27 March 2020 in English. Last updated on 28 March 2020.

Talking with the Community

I recently reached out to the OSM Iran community on Telegram, who normally write in Farsi and have made some incredible map improvements lately. I first came across their work in 2019 when I noticed the community on Twitter showcasing incredible detail on pedestrian paths and gardens in Iran. I’ve always wanted to visit Iran, but found myself instead getting secondhand stories about the landscape and culture, and frequenting Persian restaurants when living in places like California.

In recent weeks, I began reading about the extreme strain on the medical system in Iran in reaction to COVID-19, and I asked a fellow user who was relaying information from his family and friends there if he could tell me what areas appeared most affected. His tweet response set me off on a mission.

[Image: irantweet]

I immediately set out to find a way to help the Iranian OSM community improve the map in the top five most affected areas. I studied Farsi some years ago, but couldn’t communicate a detailed question like this–so I tried a basic greeting then later pasted a snippet of Farsi from Google translate. The community quickly turned very welcoming and encouraged me to write in English. Claudius Henrichs also noticed my question and offered to collaborate. The community said that mapping more detail around known hospitals would be useful:

[Image: osmirantelegram]

Prepping the Data

I found open data on hospitals in Iran, which was already on OpenStreetMap but prepackaged by HOT. I decided it would be best to look at the areas only mapped as points, so that we could check if building polygons existed. Claudius also suggested that we find out what parts of Iran are most undermapped, by getting all changesets from the past 6 months and seeing where they do and do not cover.

I pulled the changeset data from OSMCha.org, and shared an initial map on Twitter. Soon Mykola Kozyr caught on and took the data visualization to a new level. Mykola made a complete visualization but also shared a quick preview screenshot:

[Image: mykolakozyriran]

This visual should be helpful for mapping almost anything in Iran, not only hospitals–hopefully this can encourage more mapping in new places. For now though, I went forward with the hospital data. Using MapRoulette, I created a series of challenges to check each hospital POI and add detail. The dashboards and different options on MapRoulette are incredibly powerful, and make organizing such a task very easy. The Tehran challenge was the largest, with about 70 hospitals to check.

[Image: tehranchallenge]

Results and Reflections

Within a few days, Claudius, additional community members, and I had completed the checks for Mazandaran, Gilan, Qom, and Alborz, and there are just over a dozen remaining in Tehran province now. I would like to review some of the things I noticed when editing the map in this effort, and I hope it helps others look at similar tasks.

Buildings, parking, and other polygons

Many of the hospitals were indeed just points on the map, with only roads nearby. I ended up tracing many building polygons, including the building that the POI appeared to be placed on, and neighboring buildings for context. In some areas, I noticed that there were parking lots, so I mapped these as well–I assume it can be useful for people who need to know if they can park when arriving. Some hospitals were in small rural buildings, some in tall urban buildings, others on sprawling medical campuses. It was interesting to note that any full-size hospital always seemed to have a mosque nearby. The largest hospitals often also had large gardens, and some had helipads that needed to be mapped as well, easily visible from satellite.

Lines and ways

When adding parking lots, I also tried to trace the parking aisles, any service roads leading to the rear of the building, and if I could see it, any kind of gate. I checked to see if Mapillary imagery was available, but it was only there maybe ten percent of the time. This would have been helpful for seeing more context and detail. I also looked for walls around the facilities, which were sometimes extremely clear. It seems to me that most full-size hospitals in Iran have some kind of wall around the complete facility. I tried to map sidewalks near some hospitals and mark crosswalks on the road, and sometimes also what appeared to be pedestrian bridges over the road. It cannot be assumed that everyone arrives by car.

Amenity type and details

Many POIs marked as hospital were not really what we might think of as a hospital. Some appeared to be a doctor’s office, with the name indicating the name of the doctor and nothing else. Another I found was a Red Crescent “road base” in Farsi, which I translated as “Red Crescent Station”. Some had a specialty marked, including ophthalmology (eyes), gynecology, or even dentistry. In many of these cases I adjusted the tag from hospital to be a clinic, a dentist, or even a doctor’s office. The goal is to make sure people know where to get urgent care and maybe not be directed to a small doctor’s office, while also making sure it is as clear as possible what is actually there.

Language

Almost everything was written in Farsi, in its own alphabet, with some having an English transliteration and a few having translations. Some had English names but with poor grammar or missing capitalization, so I cleaned these up. I tried to translate what I could, whether that meant just writing the doctor’s name in English or writing the official name in English including the medical specialty.

Unknowns and local knowledge

A lot of detail was just not possible to add without local knowledge. The language issues and bird’s-eye view are already sometimes difficult, but possible to work around. Some of the most important details need local knowledge that can really take the map quality to a new level. For most hospitals, for example, it was unknown whether they had emergency rooms, while others were vaguely named, had no contact information, and maybe were not even placed on the correct building or the correct side of the street. Much new data is needed on these, and this is the case in every country in the world, for all hospitals and medical facilities.

This includes:

* Street Address - name, number, post code
* Phone, website, and email
* Clinic specialties - heart, skin, feet, etc.
* Emergency=yes/no
* Official names of the hospital (not just “clinic” with no identifier)
* Location of emergency room entrances
* Number of beds and capacity
* More routing information like sidewalks, parking, and gates
* Nearby facilities for families with loved ones in the hospital: banks, hotels, grocers
* Specialty types like a focus on women or children
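Several items on this list correspond to tags that can be checked mechanically. As a rough sketch, the function below reports which details are still missing from a facility's tags. The choice of keys in `WANTED_KEYS` is my assumption of commonly used OSM keys rather than an agreed checklist, and the function name is hypothetical; items such as entrance locations or nearby facilities are separate map features and are not covered.

```python
# Assumed set of keys for illustration; adjust to local tagging conventions
WANTED_KEYS = [
    "name", "addr:street", "addr:housenumber", "addr:postcode",
    "phone", "website", "emergency", "healthcare:speciality", "beds",
]


def missing_details(tags):
    """Return the wanted keys that are absent or empty in a facility's tags."""
    return [key for key in WANTED_KEYS if not tags.get(key, "").strip()]


# Hypothetical facility with only a name and an emergency tag filled in
example = {"amenity": "hospital", "name": "Example Hospital", "emergency": "yes"}
print(missing_details(example))
```

A list like this could feed a MapRoulette-style task that asks local mappers to fill in only what is missing.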

Mapping in more places

There is a wealth of information we can add to all of these. All types of data sources are useful: satellite imagery, government open data, street-level imagery, local surveys and knowledge, rideshare data. If you have ideas on how to improve medical facility data in your local area, I encourage you to take my own reflections and turn them into local tasks and challenges, as well as add new perspectives that I may have missed.

Thank you to everyone who helped map these tasks, to the OSM Iran community for being encouraging and communicative about what an outsider can do to help, and to Claudius and Mykola for helping make the ideas become something useful.

Location: Ghalamestan - Baradaran Javadian, District 11, Tehran, بخش مرکزی شهرستان تهران, Tehran County, Tehran Province, 13356-63393, Iran