Recent diary entries
I've been having a great time recently using Osmium to write my own analysis code in C++ to look for anomalies in the PBF extracts. Today it found this very strange coastline in South Africa:
Perhaps, I thought, this is some rare geological formation that makes an amazing wavy line? So let's look at the data over aerial imagery:
Uh.... what? I've seen a lot of weird and bad map data, like the mechanical grit of PGS and all the horrors of TIGER, but this was new. It's as if some cartographer said... "yeah, it's a coastline! Uh, what kind? Uh.... a wavy coastline? Yeah, wavy! Lots of waves.... I LOOOVE to draw WAVES, wheeeee!"
I should mention that this appears to go on for hundreds of kilometers.
The importer of this way is an "Adrian Frith", but it's most certainly not his fault. The source tag says "Municipal Demarcation Board", so it was probably made by some government department, or maybe a contractor that was getting paid by the node?
I'm sorry to say I'll be quickly tidying up this coast, so perhaps by the time you read this, you won't be able to see the waves at, for example, here. On the other hand, coastline changes are special and take a while to process, so the blue ocean wobbles will probably stay for quite a while.
The Mexican statistical institute INEGI opened their data last November, and the terms are now compatible with OpenStreetMap. My colleague Rub21 has rendered out a first layer for mapping - take a look at our blog.
Mapping with Mexico's new open data
In November, Mexico's Institute of Statistics and Geography (INEGI) opened its data. The new terms are compatible with OpenStreetMap.
INEGI's National Road Network (Red Nacional de Carreteras)
My first entry. Made road corrections to an area I know very well. Looking forward to participating in this effort.
Since State of the Map in Buenos Aires, I've been able to try out some possible indicators on a dataset for my home region, Flanders. Here are some examples of things to measure.
The nodes table contains all POIs defined as nodes, but also all the nodes that make up the lines and closed lines (polygons) of OpenStreetMap. We can reasonably assume that almost all untagged nodes will be part of lines or polygons. Some tagged nodes are also part of lines. For example, a mini-roundabout, a ford, a barrier, etc., should always be part of a line.
The total number of nodes is made up almost completely of nodes that belong to something else. That's to be expected, of course.
Over time the number of tagged nodes increases, but the number of tags on these nodes increases faster. In 2009, there were on average only 1.24 tags per tagged node; now it's over twice as many.
What gets tagged? Here's a quick breakdown into some very wide categories. Road info covers all the kinds of tagged nodes you'd expect on highways, the kind that contribute to better routing and safer driving. POIs are things like banks, schools, fuel stations, etc. These two take the top spots, but in 2014 there was a big jump in the first group.
Infrastructure nodes like those belonging to railways and high-tension electricity lines are only recently being overtaken by address nodes. The release of open data about addresses in Flanders is probably the cause of the big jump. However, most addresses are tagged on buildings, so they do not show up here. For POI statistics, it would be best to just take the sum of nodes and polygons for the same tag combinations. Two problems arise. One is practical: there seems to be something wrong with the way the history importer handles polygons. It might have to do with the lack of support for relations, but I don't know yet. One more thing for the to-investigate list. The second problem is that sometimes the same POI has both a polygon and a node tagged with the same information. This is not good practice, but it happens. You could remove nodes that geographically fall within polygons if the tags are the same, but I wouldn't know how to do that in my setup. It would take a lot of processing as well, and my available processing power at the moment is way too small as it is.
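The node-inside-polygon clean-up described above could look roughly like the sketch below. It is only an illustration and not something from the setup used for these graphs: the dictionary layout (`tags`, `lon`, `lat`, `ring`) and the function names are invented for the example, polygons are treated as simple rings without holes, and "same information" is read as "exactly the same tags".

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices (hypothetical layout)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray starting at the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def drop_duplicate_nodes(nodes, polygons):
    """Keep a node unless it lies inside a polygon carrying identical tags."""
    kept = []
    for node in nodes:
        duplicate = any(
            node["tags"] == poly["tags"]
            and point_in_polygon(node["lon"], node["lat"], poly["ring"])
            for poly in polygons
        )
        if not duplicate:
            kept.append(node)
    return kept
```

This compares every node with every polygon, which is exactly the kind of processing cost mentioned above; a real run over a dense region would probably want a spatial index in front of it.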
On to lines. In most cases, the thing to measure is their length. The absolute number of lines is mostly unimportant. A river is a river, whether it consists of 10 or 100 bits and pieces. A nice example of how crowdsourcing works in practice is the evolution of the waterway network. First we see a quick growth of the river network (length in km). As the growth of the rivers winds down and stops, we see the streams taking off. So the crowd has finished mapping all the rivers, and only when that is finished do the smaller streams get more attention. Rivers are sometimes mapped as polygons too. Normally the lines are not deleted when this happens, so this has no impact on network completion. Of course the level of detail does increase. One way to measure the level of detail of the river network could be to count the nodes of all lines and polygons making up this network.
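To make the "length, not count" point concrete, here is a minimal sketch of measuring a line in km from its node coordinates. It uses the haversine great-circle formula on a spherical Earth, which is an assumption chosen for the example and not necessarily how the lengths in these graphs were computed; the function names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def way_length_km(coords):
    """Length of one line given as a list of (lat, lon) pairs."""
    return sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))


def network_length_km(ways):
    """Total length of a set of lines, however many pieces they are cut into."""
    return sum(way_length_km(w) for w in ways)
```

Cutting one river into several ways that share their end nodes leaves `network_length_km` unchanged, while the number of lines goes up, which is why length is the more useful measure here.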
A similar picture for roads. Main roads (tertiary to motorway) start off as the largest category. Minor roads (residential, unknown, unclassified) follow but overtake them quickly. Full network completion seems to be achieved by 2013-2014. Other roads (mostly service roads) grow more slowly and steadily. Just as with "slow roads" (mostly footways etc.), the steady growth seems to indicate that it is either more work or lower-priority work to complete this network. So these might keep growing for many years to come.
Network completion isn't everything, of course. A lot of extra information is needed to have a good, routable map. This kind of info is often mapped as tagged nodes on the map. The history importer does not load relations unfortunately, so the number of turn restrictions can't be counted with my method. In the graph we compare the growth of road info nodes with the evolution of the road network. Again, the basics get mapped first; only as the first priority nears completion is real progress made on the extras.
So why do we need global statistics like this? To learn if these are general patterns. To see if imports disrupt these patterns. Or if they only occur when population density and wealth are high enough. To see how complete maps are - just looking at the graphs, you can often see which features are mapped completely and which aspects of the map need more work. Based on the files generated in the process, it's not very hard to classify mappers: are they local, do they have local knowledge or are they probably remote mappers? The distribution of these is good to know, but beyond that it might give important insights. What happens when remote mappers reach road network completion? Does this increase the chance that a good number of local mappers pick up the mapping that needs local knowledge? That might inform if and when remote mapping should be encouraged - or avoided. A lot of these issues give rise to heated arguments. Wouldn't it be nice to have some data to corroborate opinions?
As I said before, there is a lot left to be done. At State of the Map in Buenos Aires I got many tips on how to move ahead, and that has been quite helpful. I could for example never have imagined how incredibly simple it was to add length and area to lines and polygons. As old problems get solved, new ones show up. I just found out that the number of addresses in my polygon analysis is way smaller than other people's results. So there goes another day finding out what went wrong.
So even though my set-up is still not really finished for a more complete analysis, it would be nice to make some basic worldwide analysis (see the links at the start of my previous post on the subject) available soon. For those who don't know my little project, the idea is to provide this kind of statistics in an interactive platform, making them available for every region, every country, every continent and the whole world. There's also a video available (which I daren't watch yet) of me mumbling through the idea at State of the Map.
One little detail: my computer can't really handle the denser regions. Flanders was on the limit of what I can do. And there are much larger areas which are just as dense. So if you can spare a little server, I'd be happy to use it :)
Checking and editing of the roads we passed during our trip to South Africa has been completed.
We began today the serious mapping of Sukadana, KalBar, Indonesia by staff at ASRI. Thanks to jrpepper for the original digitization from MapBox. Now Agus has started driving all the roads with his GPS and is remapping the roads based on his traces, and naming them all. This work will help our in-house GIS system, and hopefully be useful for others.
I made a proposal/summary for the mapping of campsites on the OSM wiki page for campsites.
Review of roads travelled in Oman during our trip from the Netherlands to South Africa has been completed. See De einder voorbij.
Made small changes to NIU's campus and improved the shoreline for Magician Lake.
I'm learning this along with the students in my lab. Hello students!
Very funny this conversation on twitter:
OrdnanceSurvey, like the IGN in France, despises OSM amateur cartographers but is calling for amateur (and free) photographers to illustrate their new printed maps ... "Do what we say, not what we do"
This is my first OSM Diary entry!
overpass turbo has been around for a little over two years now. In this time, it arguably changed how developers and mappers interact with OSM data. Let us take this opportunity to look back and take a glimpse at some statistics:
The user-base has more than quintupled from the initial group of early adopters as can be seen in the following Piwik graph:
Note that the actual absolute number of visitors is likely significantly higher than what is reported here, because surely many of you have the do-not-track flag activated or are using tracker blocking software in your browsers. Speaking of it – as of today you can opt-out from any tracking on overpass-turbo.eu also by simply switching it off in the settings dialog under the privacy tab.
Shortly after its release, overpass turbo got the ability to share queries in the form of short URLs (e.g. http://overpass-turbo.eu/s/4). Here is some insight into what queries people have been sharing since then:
This map shows the locations associated with each shared query:
Of course, central Europe is quite the center of activity, but in general the tool seems to be used all over the planet, which is nice.
The next thing we're looking at are the two query languages. In the beginning overpass turbo preferred the Overpass XML variant (in code examples and queries generated by the wizard). Later, this default was switched over to the QL query language. This can be seen in the following graph: red is XML, blue stands for QL, brighter colours stand for queries that are taken or derived from output produced by the query wizard. Each column represents a set of 512 consecutively shared queries. Note that this means that the x-axis isn't a linear function of time [timestamps are not stored in the short-url database as they aren't needed to provide the service].
One can immediately identify two main events: first, the introduction of the query wizard in December 2013, and second, the above-mentioned switch from XML to QL as the default query language in October 2014.
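The binning behind the graph above, columns of 512 consecutively shared queries split by query language, can be sketched as follows. This is not the script that produced the graph: the `classify` heuristic (a query starting with `<` counts as Overpass XML, anything else as QL) and the function names are assumptions made for illustration, and the wizard-derived shading is left out.

```python
from collections import Counter

BIN_SIZE = 512  # number of consecutively shared queries per column


def classify(query):
    """Crude language guess (assumption): XML queries start with '<'."""
    return "xml" if query.lstrip().startswith("<") else "ql"


def bin_counts(queries, bin_size=BIN_SIZE):
    """Return one Counter per column, in the order the queries were shared."""
    bins = []
    for start in range(0, len(queries), bin_size):
        chunk = queries[start:start + bin_size]
        bins.append(Counter(classify(q) for q in chunk))
    return bins
```

Because the columns are defined by query count and not by date, a busy month simply spreads over more columns, which is why the x-axis is not linear in time.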
Another interesting fact: about 10% of all shared queries use some amount of MapCSS styling.
The question now is what will the next few years bring? Let's find out! ;)
In Finland they seem to think it's a good idea to also add the road signs themselves. I tend to agree with them, but trying to add them all may not make sense, of course, as there are a gazillion of them.
There are no applications making use of this information, but for us it would enable double checking why some ways have certain tags.
In Finland they are also able to find where zones are 'leaking' and they report this back to the administrations, so it gets fixed.
So I'm not saying we should aim to map all of them, but I still want it to be possible and convenient to add those that have our interest.
So I've been working on the RoadSigns plugin to make sure it has data about the Belgian road signs. The work is not done yet, but I think I have now been able to add all the accompanying signs and all signs related to parking, of which there are surprisingly many! The way it works now, you'll have to remove the tags it adds from those objects they don't apply to. I've made a few suggestions for improving the workflow, but it's unlikely those will be implemented anytime soon, unless I get my hands 'dirty' and do it myself...
So the effect of the sign remains on the ways, and the (Belgian) code for the sign itself remains on the new node you created before using the plugin.
If you don't check the tick box Traffic sign, that code won't be added and you don't have to remove any tags. The plugin then does what it was designed for, add the effect of the sign to the ways it applies to.
What I'm not sure of, since it was an enormous task that I gravely underestimated, is whether all the tags that are applied as an effect are actually correct. So the plugin needs testing.
Or you can have a look at this wiki page, there may be obvious errors in it that jump out to you :-)
Using the plugin is a bit more convenient though, as you can actually see the signs, instead of those codes.
Oh, and if you see that additional signs are missing from the signs they can appear next to, please report that as well.
Last year the USGS started releasing 1 arc-second DEM files. I've been using their 3 arc-second files for a couple of years, so I decided to take a look at their quality. I ended up discussing something else:
So the current JOSM release allows you to load and save *.osn files. There is a daily notes dump available at http://planet.osm.org/notes/
But you can't filter them to only display those that are open. I wanted to find a solution. First, I thought about writing a Python or C# app. It turns out there is a much simpler solution: XPath is a syntax for XML element selection. With xml_grep from the xml-twig-tools package I can apply it to an XML file like this (the expression is quoted so the shell doesn't try to expand it):
xml_grep --exclude '/*/note[@closed_at]' planet-notes-latest.osn > open_notes.osn
You can of course extend this expression to filter by geographic area or date.
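For comparison, the Python route mentioned above might look like the sketch below, extended with a bounding-box filter. It is only a sketch: it assumes each `<note>` element carries `lat` and `lon` attributes next to the `closed_at` attribute used in the XPath expression, and the embedded sample is made up to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Tiny made-up sample in the assumed shape of the notes dump.
SAMPLE = """<osm-notes>
  <note id="1" lat="50.85" lon="4.35" created_at="2015-01-01T10:00:00Z"/>
  <note id="2" lat="50.86" lon="4.36" created_at="2015-01-02T10:00:00Z" closed_at="2015-01-03T10:00:00Z"/>
  <note id="3" lat="48.85" lon="2.35" created_at="2015-01-04T10:00:00Z"/>
</osm-notes>"""


def open_notes(root, bbox=None):
    """Yield notes without closed_at; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    for note in root.iter("note"):
        if "closed_at" in note.attrib:
            continue  # same condition the XPath expression excludes
        if bbox is not None:
            lon, lat = float(note.get("lon")), float(note.get("lat"))
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                continue
        yield note


root = ET.fromstring(SAMPLE)
brussels_area = (4.2, 50.7, 4.5, 51.0)  # rough box, for illustration only
print([n.get("id") for n in open_notes(root)])
print([n.get("id") for n in open_notes(root, brussels_area)])
```

For the real dump, `ET.parse("planet-notes-latest.osn").getroot()` would replace the sample string; the file is large, so streaming it with `ET.iterparse` is probably the better choice.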
I thought I would record what I do for traces for any future reference. This applies to walking or driving traces.
For hardware I use a Holux M-1000C Wireless GPS Logger. This is an external bluetooth GPS receiver I use with my phone. Prior to June 2013, the phone I used was a Samsung Galaxy S. After that date I have used a Samsung Galaxy S3.
The GPS on my original phone was particularly bad. I believe when the Galaxy S was originally released the GPS was basically broken. I was primarily using the GPS in my phone for geocaching so the poor accuracy of the GPS was frustrating. I also drained the battery quite quickly between the GPS and downloading maps. Improving accuracy and battery life were the main reasons for getting a separate receiver.
For software I use Locus Map Pro. Locus Map has good geocaching capability alongside its extensive mapping features. The reason I was looking at a mapping application for geocaching was that I wanted good off-line map capability. Locus Map can store tiles for off-line use or use off-line .map files rendered by the Mapsforge library. I have a limited data plan, so all the maps being downloaded for geocaching were using up all my data.
I started mapping on OpenStreetMap to improve areas I wanted to do geocaching in; for example, parks. There are phone applications focused on collecting OSM data. However, the ones I have tried all use on-line maps so that is not desirable for me. So I continue to use Locus Map to record waypoints of amenities or features I wish to map. I create my own off-line maps using a Mapwriter plugin custom tagging file with Osmosis to create the file and a custom Mapsforge theme to render the map. That way I see everything I add on my phone.
I created my own recording profile in Locus Map to save a point every 1 second AND 1 meter. The default accuracy is unchanged at 100 meters, meaning it will exclude all points whose accuracy is worse than 100 meters. This could be lowered but I am not sure it matters. Bad data would tend to be obvious visually and I have not noticed a situation like this. The profile will record points while standing still. I leave it that way for trail mapping as I will come to deliberate halts to better draw the trail, particularly at trail intersections or sharp corners. I would not want to lose that visual information. It will not record Cell ID information which is an alternative to track recording if GPS is not available. I have not experienced that situation.
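One plausible reading of those three settings, written out as a sketch: a fix is kept only if its reported accuracy is 100 meters or better and it is at least 1 second AND at least 1 meter away from the last kept fix. This is my interpretation for illustration, not Locus Map's actual implementation; the `Fix` class, the function names, and the flat-Earth distance approximation are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    t: float           # seconds since the recording started
    lat: float         # degrees
    lon: float         # degrees
    accuracy_m: float  # reported accuracy; larger means worse


def distance_m(a, b):
    """Equirectangular approximation, adequate for meter-scale steps."""
    mean_lat = math.radians((a.lat + b.lat) / 2)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat)
    dy = math.radians(b.lat - a.lat)
    return 6371008.8 * math.hypot(dx, dy)


def filter_fixes(fixes, min_seconds=1.0, min_meters=1.0, max_accuracy_m=100.0):
    """Keep fixes that pass the accuracy limit and the time AND distance steps."""
    kept = []
    for fix in fixes:
        if fix.accuracy_m > max_accuracy_m:
            continue  # accuracy worse than the limit
        if kept:
            last = kept[-1]
            too_soon = fix.t - last.t < min_seconds
            too_close = distance_m(last, fix) < min_meters
            if too_soon or too_close:
                continue  # both conditions must hold to record a point
        kept.append(fix)
    return kept
```

Lowering `max_accuracy_m` is the change considered above; in this reading it would only ever remove points, never add any.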
I do clean up some of the traces I upload. This is typically in the case where I did not shut off the recording soon enough and I went somewhere that might be misleading. There are also sometimes scribbles (I read that description somewhere) where I was going back and forth in apparently random directions that make for a mess in the track visually and created data of no real value.
There are other things or behaviors I do as part of tracing. I have excluded them from here as I was focused more on recording the technical details.
There has been an address import in the Brussels region, and the Flemish Agiv has also opened a database with address positions. It is not good enough to import directly, so we agreed to draw the buildings from aerial imagery while adding addresses; that way we at least check the consistency of the addresses. In Flanders, it's a slow import that started around the end of 2014.
So I wondered what evolution was visible lately, whether it was possible to see where people were editing, and stuff like that.
First I investigated the total number of addresses per province. Brussels capital region doesn't have any provinces, nor does it belong to a province (although it's completely enclosed by one). But Brussels is included next to the provinces, just to cover Belgium completely.
Here you clearly see that Brussels has many addresses mapped. Most likely due to the import. But between other provinces, there are also major differences. Oost-Vlaanderen and Vlaams-Brabant are both part of Flanders, they have the same resources, but there's a major difference. The population also doesn't seem to matter, as Oost-Vlaanderen has almost 1.5 million inhabitants, and Vlaams-Brabant has just 1.1 million.
Then I wanted to compare addresses by data type, to check that nobody had just imported nodes from Agiv without drawing the buildings. Nodes as addresses are not forbidden (e.g. on an entrance), but the clear majority should be on buildings.
Ok, this looks good. Most addresses in Flanders are on buildings. Ratios seem rather constant. There are a few addresses on relations too (normally multipolygons), however, too few to be visible.
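The ratio check above boils down to counting addressed elements per province and element type. A small sketch of that calculation, assuming the addresses have already been extracted as (province, element type) pairs; the input layout and the function name are hypothetical and the graphs here were not necessarily produced this way.

```python
from collections import defaultdict


def address_shares(elements):
    """elements: iterable of (province, element_type) pairs, one per address."""
    counts = defaultdict(lambda: defaultdict(int))
    for province, element_type in elements:
        counts[province][element_type] += 1
    shares = {}
    for province, by_type in counts.items():
        total = sum(by_type.values())
        # Fraction of this province's addresses on each element type.
        shares[province] = {t: n / total for t, n in by_type.items()}
    return shares
```

A province where the `node` share is unusually high compared with the others would be the first place to look for addresses added without drawing the buildings.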
After that, the evolution of addresses through time is also interesting.
Here you clearly see the import of Brussels, which skyrockets past all the provinces. But address mapping in some other provinces has also been gaining traction lately, however small.
Overall, I'm very surprised by the differences between the provinces, even if you ignore Brussels. And I hope address mapping in other provinces will gain some more traction.
As a final treat, I give you some graphs per province.
While the inaugural Missing Maps mapathon in Edinburgh was a positive event, there are a few changes I'd make that I hope would improve the outcomes.
First, I'd make more effort to lead new mappers into the local community. I would emphasise that contributing to OpenStreetMap is not primarily about "armchair mapping", but about local surveying, adding to and maintaining the map of one's local area. There are regular meetups and mapping parties in many areas, and where there are not yet, a missing maps event presents a great chance to bootstrap a more active surveyor community.
I appreciate that in areas where the Humanitarian OpenStreetMap Team and The Missing Maps are focusing, there may not be local mappers with the leisure, infrastructure and bandwidth to contribute surveyed data to OSM, but it is almost always more work to correct poor armchair mapping than to build up from the ground.
Notes in progress:
NLS historical imagery, JOSM benefits of multi layers, iD and tag prescription, this was a technical audience and yes it was well balanced also
I had the fun experience last night of dropping in to an event organised by The Missing Maps. Unfortunately I couldn't make the whole thing, nor could most of the local mapper community members in Edinburgh, as we only found out that the event was happening the evening before :/
This seemed to be an accident of occupying parallel, not quite overlapping, social media universes. The main channel for OpenStreetMap Scotland news and events tends to be the OSM Alba Twitter feed and we organise mapping parties via the wiki, with the Scotland mailing list a fairly new addition to the communications tools.
The Missing Maps, on the other hand, organise mostly via Facebook and eventbrite. I don't know about you, but I tend to treat eventbrite as read-only, and I've historically refused to actively participate in Facebook. So we failed to overlap, and this wasn't helped by the University of Edinburgh-hosted venue moving at the last minute. So garycmartin attempted to drop by but went to the wrong venue and couldn't find a redirect. But the indefatigable stevefaeembra went along and represented well for the local mapper community.
I could only make the central 45 minutes, and got stuck on one question about recommended tags for roads, to which my answer was suffixed with "and this is my personal opinion, and there are many personal opinions in the OSM community".
I have a few thoughts to share about the experience of seeing newbie mappers get a copious and well-thought-out introduction to the iD editor, but I'll save those up for another diary entry. Despite the teething trouble with the venue and the community links, it was a worthwhile event, very well attended by a diverse-looking group of people, many of whom turned out to be students on the University of Edinburgh's MSc GIS course, and I saw an old colleague from the EDINA datacentre there; always glad to see OSM bringing people together in unexpected ways.