Recent diary entries
I have been working on some code to detect if a changeset is an import, SPAM, or if it has a tagging error.
Detecting SPAM and tagging errors is pretty straightforward. However, detecting imports is much more challenging. Before I started, I thought I knew what an import was. I was looking for large changesets that only added one or two kinds of data. However, these criteria perform poorly in practice. In OSM many import changesets are not large, and it is not uncommon for the imported data to have some hand editing mixed in.
My new definition of an import
An import is any addition to OSM that directly derives from other digital map sources.
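To make the original "large changeset, one or two kinds of data" rule concrete, here is a minimal sketch of it. The function name, the input shape (one primary tag key per added object), and both thresholds are assumptions made for illustration; they are not taken from my actual detection code.

```python
from collections import Counter

def looks_like_import_naive(added_tag_keys, min_size=1000, max_kinds=2):
    """Naive rule: a large changeset that adds only one or two kinds of data.

    added_tag_keys: one primary tag key per added object,
    e.g. ["building", "building", "highway"].
    min_size and max_kinds are illustrative thresholds (assumptions).
    """
    kinds = Counter(added_tag_keys)
    return len(added_tag_keys) >= min_size and len(kinds) <= max_kinds

# Hypothetical changesets showing where the naive rule breaks down.
big_building_upload = ["building"] * 5000
small_import = ["building"] * 40
import_with_hand_edits = ["building"] * 5000 + ["highway", "amenity", "natural"]

print(looks_like_import_naive(big_building_upload))     # True
print(looks_like_import_naive(small_import))            # False: too small
print(looks_like_import_naive(import_with_hand_edits))  # False: too many kinds
```

The last two cases are exactly the failure modes described above: a small import and an import with some hand editing mixed in both slip past the rule.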
This was posted this week.
I think it is fair to say that it has upset people, myself included. OSM seems like it is doing fine. Contributors are up, the servers are running well, iD is amazing, the State of the Map conferences are cranking along, OSM is widely used, perhaps even essential, everything is all roses, full steam ahead....
When the World Needs a Map, Give them a Database
Ouch, that one hurts.
Then at the bottom ...
The OpenStreetMap Foundation Culture
We need to think about our relationship with our commercial/non profit partner organizations. At a high level, we need each other and strong relationships should be a priority. However, inevitably there are situations where they are in conflict. For example, OSM's refusal to offer paid services to support its considerable infrastructure needs seems like a clearcut example of our partner organizations wanting OSM to keep off of their turf, weak, and dependent on them and their donations. Please, don't blame Mapbox, this isn't Mapbox's fault, they are doing a great job, and I am delighted that they are so successful. But it is hard for me to understand how Mapbox can raise 164 million USD last year and the OSMF doesn't have enough money for any full time system administrators or software engineers. The US OSM chapter is in the same boat. They are going to be hiring a director next year (yes!!), but the plan is to have the role focused on SOTM, outreach, mapping parties, blogging, Twitter, and other safe/non-core activities (bad!!).
If you care about this, please join the OSMF and vote. Also, the US OSM chapter is voting now.
The AI/neural network/deep learning/machine learning wave is ready to touch OpenStreetMap. Modern machine learning algorithms require a lot of data, and we have lots of data. OSM is going to be a natural place for open source machine learning algorithm development to happen.
Exciting and useful machine learning projects are possible today, using only the OSM database. For example, statistically based editor presets, changeset anomaly detection, import changeset detection, and smart auto tag/value suggestions are possible with just the OSM database.
To open things up, we need additional data sources like overhead satellite images. However, satellites and airplanes are quite expensive. Microsoft/Bing and Mapbox purchase a license to use other companies' images, such as DigitalGlobe's. They are restricted by the upstream license on what they can offer to the OSM community. For example, when Facebook wanted to use machine learning to map all the roads in Thailand, they had to purchase a license for the images. The license did not allow them to share the images with anybody. The DeepOSM project handled the image issue by using public NAIP images. However, the NAIP images are not ideal for OSM: the resolution is only 1 meter, and coverage is just the US.
Over the past two months, both Microsoft/Bing and Mapbox have completed reviews of their satellite image licensing terms and determined that they are capable of offering their image layers for nonprofit machine learning projects whose goal is to improve OSM. The great news is that if they want to support machine learning for OSM, they can.
Microsoft/Bing has gone ahead and made it official with an email to the talk-us list.
Through one-on-one communications with Mapbox, they have asserted that their standard terms of service allow this use case as well. However, they are worried about the load on the servers, so for now, they would like to grant permission on a case by case basis. This is a reasonable request given how data-intensive the algorithms are.
Today, there is no technical reason preventing the volunteer OSM community from utilizing machine learning to accelerate the project. Basically, anything that is visible in a satellite image will now be identifiable by software at the same level of accuracy as an “armchair” mapper: baseball fields, tennis courts, basketball courts, soccer fields, football fields, bridges, solar panel farms, roads, driveways, parking lots, buildings, lakes, rivers, wetlands, railroads, water tanks, gas stations, running tracks, vineyards, fields, forests, sand, jetties, lighthouses, airports, playgrounds, fences, wind turbines, pools, ski lifts, road lanes, traffic lights, graveyards, power lines, etc.
Machine learning algorithms will obviously be used more often in future imports and automated edits. However, there are other high-value places that machine learning algorithms could be utilized by the project.
- Maproulette tasks could be generated that highlight where older OSM data doesn’t match newer satellite images.
- A changeset monitor could be written that compares real-time edits to satellite images and adds changeset comments for edits that look unusual.
- OSM editors could suggest tags based on satellite images.
- OSM editors could suggest/snap geometries based on the satellite images.
- OSM editor QA tools could integrate satellite images into the validation checks.
- Satellite offsets could be determined automatically by using GPS traces.
- Using previous DWG and community reverts and redactions, wildly bad changesets could be quickly noticed and reviewed by the community.
- Overpass queries could include features extracted from satellite images.
It will take some time for all of this to get implemented, but I am sure that it will happen eventually. If you are a developer and this kind of thing interests you, the field is wide open!
Over the past week there has been a very long and detailed conversation about tagging parks and conservation land on the talk-us email list. This is a topic of great interest to me, but the emails have reminded me of the fact that attempting to model everything on earth as an XML file is ultimately futile. Happily, this contradiction built into OSM seems to ensure that the project stays alive and vital over the long term. There is always room for improvement; the map can't ever be completed. The XML file is never going to be 100% right.
I have been working with baseball fields in OSM for my deep learning/OSM project (https://github.com/jremillard/images-to-osm). My partial OSM dataset has around 13,000 baseball fields. I have discovered that around 7% of them are traced just around the infield!
If you map baseball fields, please map the entire field not just the infield.
All of OSM has over 100,000 baseball fields mapped, so at 7% that is around 7,000 fields that are too small. I updated the wiki for sport=baseball to recommend that the entire field be mapped rather than just the infield. In the future, I might make a MapRoulette challenge for this. All of the baseball fields that have an area less than 100x100 feet are suspect.
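The area check is easy to sketch. The code below is only an illustration, not what I run: it assumes each field is a simple closed polygon given as (lat, lon) pairs, projects it onto a flat plane (good enough at this size), and applies the shoelace formula. The helper names, the `square` demo function, and the sample coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0      # mean Earth radius
FEET_PER_METER = 3.28084
SUSPECT_AREA_SQFT = 100 * 100   # the 100x100 feet threshold

def polygon_area_sqft(coords):
    """Approximate area of a small polygon given as [(lat, lon), ...] in degrees."""
    lat_ref, lon_ref = coords[0]
    mean_lat = math.radians(sum(lat for lat, _ in coords) / len(coords))
    # Local flat projection in meters, relative to the first vertex.
    pts = [(EARTH_RADIUS_M * math.radians(lon - lon_ref) * math.cos(mean_lat),
            EARTH_RADIUS_M * math.radians(lat - lat_ref)) for lat, lon in coords]
    # Shoelace formula over consecutive vertex pairs (wrapping around).
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2.0 * FEET_PER_METER ** 2

def is_suspect(coords):
    """True if the polygon is smaller than 100x100 feet."""
    return polygon_area_sqft(coords) < SUSPECT_AREA_SQFT

def square(lat, lon, side_m):
    """Hypothetical demo helper: corners of a square with the given side length."""
    dlat = math.degrees(side_m / EARTH_RADIUS_M)
    dlon = math.degrees(side_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return [(lat, lon), (lat, lon + dlon), (lat + dlat, lon + dlon), (lat + dlat, lon)]

infield_only = square(42.0, -71.0, 27.0)   # roughly a 90 ft infield
full_field = square(42.0, -71.0, 100.0)    # a field traced out to the fence

print(is_suspect(infield_only))  # True
print(is_suspect(full_field))    # False
```

Applied to every way tagged sport=baseball, a check like this would produce the task list for the MapRoulette challenge.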
The Massachusetts Schools MapRoulette challenge hit a milestone today. The number of open tasks fell below 900, over 600 schools cleaned up so far.
Two weeks ago the rendering of landuse=conservation was dropped from the default rendering on openstreetmap.org. While landuse=conservation has problems, it is used quite a lot near my home. I was originally worried about how the map would look with this tag missing. However, when landuse=conservation was dropped, other tags such as leisure=nature_reserve and landuse=forest were used for rendering. To my surprise, the new rendering exposed tagging problems that were not apparent with the older rendering style. I have been fixing them over the past week.
Everybody knows that the default map is a force on how the data looks, and it is, in theory, designed for mappers. Perhaps we should mix up the rendering style every once in a while to emphasize different kinds of data. For example, November/December can be addresses, January outdoor sports, February public transportation, March railroads, April hiking, May political boundaries, June land use, July coastline, August rivers and lakes, September highway classifications and routes, etc.
There is a proposal to change the rendering of footway and path so that they basically look the same in the default OSM rendering.
I fully support this.
Perhaps this is the best way that widely used, but confusing and redundant tags such as path/footway are rationalized. Basically, give up, declare them synonyms, and render based on dependent/sub tags. Our limited energy can be used on better tags such as surface, designated, access, foot, pet, etc. to describe this thing you can walk on in more detail.
This winter I have been doing some mapping of ski areas in Massachusetts and New Hampshire.
For many situations landuse=winter_sports works when mapping the larger ski area. However, some ski areas are run by third party organizations that are not the land owners, they basically just have permission to maintain the trails during the winter. Also, the land use during the summer/fall/spring is often better tagged as something other than winter_sports. Lastly, sometimes they are spread out.
In these situations, the site relation is a better fit. The parking lots, ticket counters, etc can be included. So far so good, there is even a site=piste tag/value, but it was voted on and rejected last year.
I am using it anyway. Hopefully, site=piste will be used enough that a re-vote can happen in the future.
This week, I created a new MapRoulette challenge, called "Massachusetts Schools".
Thank you Martijn van Exel and Serge Wroclawski for making it possible (even easy) to add your own challenges to MapRoulette. I used the loader Python script, which requires a PostGIS database. It was just a couple of hours of fiddling with the SQL statement to make it work. I still hope to make small adjustments to the help and instructions text.
In Massachusetts, most of the schools were imported twice. Once from the national GNIS database import, and a second time from a statewide MassGIS data set. Both imported data sets are old and are getting stale. A surprising number of schools have moved, closed, or changed names since the data was imported. The schools need some attention.
Hopefully over the next 2 years, we will get them cleaned up.
We also plan on using this challenge at our local OpenStreetMap-Boston meetup.
What is OSM?
A crowd sourced, free map of the world, containing information that can be verified by ground surveying. What about administrative borders? Probably don't want them removed ....
A crowd sourced, free map of the world. What about imports? We are up to our ears in imported data. That ship has sailed.
Let's not make things more complicated than they need to be.
The best free map of the world.
This is a bridleway (horse race track) that I traced out last year. The Bing map is the state/MassGIS data. I remember the same road line in OSM.
You can see that Google has traced out the same path I used through the parking lot, and incorrectly traced out the bridleway as a two lane road. Google Map Maker says it was made by "Google Automated Data Quality Improver 1" on July 24th. The nodes are not in the exact same places. I bet they are monitoring OSM and setting up tasks for the 7,000 strong mapping army they employ to trace new segments by hand when something new pops into OSM.
Hey Google, you should probably not copy highway=bridleway from OSM as roads.
Some software to pick out road signs from video.
The company is out of Russia. Do any Russian OSMers have a contact at Itseez? It would be cool to use this software for OSM.
There has been a lively thread on the talk mailing list about adding "consumer" features to the main openstreetmap.org page. This is really about adding routing, clickable points of interest, and geolocating support to the top level home page.
Some people, including our Chairman, Simon Poole, worry about several issues:
- Diluting our energy on a non-core activity
- Diluting our web design, making the main page less functional for core mapping activities.
- Competing with our partners and downstream data users, making them less likely to work with us.
- Trying and failing, hurting our brand. It is a hard task.
- Trying and succeeding, driving zillions of people to our front page. The increase in scope will require a professional organization to manage. This is also known as “let’s not look like the Wikimedia Foundation, with 176 employees.”
Start with first principles, our mission statement: provide free geographic data, such as street maps, to anyone, for the entire world. This is a big goal; just a tiny fraction of humanity lives in places with a good OSM map. We need more people, a lot more people. If we want the entire map to look like our gold standard, Germany or London, we need around 35 million mappers. By that metric, we are only about 2.5% complete. Another metric is Wikimedia, which currently has 20 million named user accounts. The mission statement is not going to be fulfilled if we don’t have enough people editing the map, period. The best way of getting those mappers registered and mapping is for us to provide a service that allows people to use our data for their day to day needs on our site, with the big fat “edit” button on top. Our third party data consumers will not put that “edit” button on their interfaces. There is no benefit to them to dilute their interface to help us. Don’t get me wrong, I am very happy that people are using our data on other services such as Apple, motion gps, Craigslist, Foursquare, etc. However, they are not going to carry our water for us and hand over our next 34 million mappers. We have to go get them ourselves. I don’t see any way of getting 34 million more registered mappers unless our site can be used by everybody for normal mapping activities. So sorry, the current mappers are just the first 2.5%; getting the next 97.5% onboard is more important than serving the first 2.5%. To say otherwise is giving up on our mission statement.
Our second principle is that we are a “do-ocracy”. When somebody shows up on the lists asking for routing, clickable POI, location service, mobile support, etc the only acceptable answer should be to ask them to help do it!
- If they can write code, do that,
- If they are system administrators help with that.
- Volunteer to test out new code.
- If they are good at raising money, get some money.
- If they can get some big servers donated, do that, hosting, donated bandwidth, do it, do it, do it!
- Or the simplest way to help is to go work on the map and be as supportive as possible of the people that are working on these features.
The reality is that getting routing, or good POI support, mobile, etc are all very tough jobs. When they launch, we should expect them to not be competitive with the commercial providers. To be blunt, we are going to suck at it for years after we launch. The probability of everybody crushing our website the day, week, or the year after we turn on routing is exactly zero. However, this is how software is developed; without shipping something crude first, it is impossible to make something great later.
Please no more “no’s” on this topic.
There was an import of lakes/ponds over half of Massachusetts in 2010 that had no conflation logic. It ignored the lakes/ponds that were already mapped. Since then, OSM has had several hundred overlapping bodies of water all over Massachusetts. Last night, the last one was cleaned out.
I am very happy that the job is completed now!
I have been playing with osmosis/PostGIS this week. I was finally able to import the MA OSM extract today using osmosis.
It took a lot longer than using osm2pgsql and I was very surprised at how large it got when it was imported.
massachusetts-latest.osm.pbf - 205 Megabytes
massachusetts-latest.osm - 5 Gigabytes - 25 times (it is an XML file)
PostGIS snapshot db - 10 Gigabytes - ???
It looks like PostGIS/osmosis is not noticing that we repeat the same tags, over and over, and over again in the data.
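For reference, the growth factors relative to the PBF file can be worked out from the sizes above. This is just arithmetic on my rounded numbers, assuming decimal units (1 Gigabyte = 1000 Megabytes), so treat the results as rough.

```python
# Sizes as reported above, in megabytes (decimal units assumed).
pbf_mb = 205
xml_mb = 5 * 1000
postgis_mb = 10 * 1000

xml_ratio = round(xml_mb / pbf_mb, 1)
postgis_ratio = round(postgis_mb / pbf_mb, 1)

print(xml_ratio)      # 24.4, i.e. roughly 25 times the PBF
print(postgis_ratio)  # 48.8, i.e. roughly twice the XML file
```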
The Massachusetts building import was completed this week.
Massachusetts has virtually all of its structures mapped in OSM now. 2.1 million structures were imported.
There is an address data layer being released this fall from MassGIS.
Hopefully we can get addresses on all of those buildings this year.
When you log onto the OSM web site, you get this really nice map of all of the users' hometowns. Click on a bubble and you get the user name. This is great! Disappointingly, I have discovered that at least 80% of the "users" nearby are in fact people that made an account but never managed to make a single change on the map. Also, many of the accounts are several years old.
I propose that OSM should delete any account that is over a year old and does not have any actual map edits. The account is dead.
This will make the "find nearby users" feature signal to noise go way up and will keep everybody honest about the actual size of the OSM community, which is probably 20% of the registered user count.
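The rule I am proposing is simple enough to write down. This is only a sketch of the idea: the function name, the account tuples, and the dates are made up for illustration and have nothing to do with how the OSM website actually stores users.

```python
from datetime import date

def is_dead_account(created, edit_count, today, max_age_days=365):
    """True for an account older than max_age_days that has no map edits."""
    age_days = (today - created).days
    return edit_count == 0 and age_days > max_age_days

# Hypothetical accounts: (name, creation date, number of map edits)
accounts = [
    ("never_edited_old", date(2010, 5, 1), 0),
    ("never_edited_new", date(2013, 1, 15), 0),
    ("active_mapper", date(2009, 3, 9), 412),
]
today = date(2013, 6, 1)

live = [name for name, created, edits in accounts
        if not is_dead_account(created, edits, today)]
print(live)  # ['never_edited_new', 'active_mapper']
```

New accounts with no edits yet are kept, so nobody gets removed before they have had a fair chance to start mapping.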
Last week the Massachusetts mapping department (MassGIS) released data for all of the buildings in the state.
Since it is likely that next year we will be importing this data into OSM to support addresses, I figured it would be useful to convert the files to OSM format and share them.
The data is here.
Each town in MA has its own ZIP file. In each ZIP file are two OSM files. One OSM file has all of the buildings in the MassGIS data set. The second file has just the buildings that are missing from OSM.
If you are mapping in MA, please take a look.