OpenStreetMap

iboates's Diary

I am a developer at an energy research institute, so the topic of electric vehicles and charging stations comes up often in my discussions with colleagues. One colleague mentioned that they were struggling to do a study in a specific city and lamented how hard it was to find data for it. Naturally, I suggested using OSM.

While not shot down outright, the idea was politely dismissed on the grounds that OSM data is simply too unreliable and not detailed enough for the kind of analysis they wanted to perform. My initial reaction was to jump to the defense of OSM, but I realized that I didn’t really know the data quality of this specific corner of the database.

I was already planning to attend the Karlsruhe OSM hackathon at Geofabrik on February 26th and 27th, so I decided to make it my mission to analyze the quality of the charging station data as deeply as I could in those two days (and some time thereafter). Obviously, as with any analysis of OSM quality, seemingly simple questions balloon into exponentially difficult answers, fraught with tedious subtleties.

Despite this, I have come to a few conclusions that I thought I should share that are specifically targeted at assessing the quality of OSM charging station data for use in electrical engineering research. First, here are some definitions I will use so as not to repeat tedious, specific technical definitions:

  • Charging station: An OSM feature tagged with “amenity=charging_station”
  • Charging station point: Such a feature with a “node” geometry type
  • Charging station polygon: Such a feature with a “way” geometry type

I have also attached a dump of a PostGIS database which I created using osm2pgsql (specifically the flex output). I used this database in my analysis, and the queries in this post illustrate my points when executed on an instance of this database. Special thanks to Jochen Topf and Sarah Hoffmann, both for developing these utilities and for helping me directly in using them.

Confusion about capacity

It seems that there is a fair bit of confusion regarding the “capacity” tag on charging stations. The confusion is not unfounded, however: in electrical engineering, “capacity” refers to the maximum power of an output device. As a result, there are many (not a majority, but still many) charging stations on OSM that store this “electrical capacity” rather than the “people capacity” that is standard elsewhere on OSM and is defined quite clearly on the wiki.

To dig into this a bit, we can observe that the vast majority of charging stations are points. At the time of writing, there are 87 098 points and 930 ways (as well as 38 relations, but I did not investigate this any further since the wiki indicates that it should not be used on those) (source).

It is quite difficult to determine if a charging station has its power rating mistagged as its capacity, since there is no reliable way to determine the actual capacity without going to the charging station itself and counting the number of sockets.

Despite this, I have come up with a few criteria that I think identify charging stations with capacity-related tagging issues.

Capacity is not completely numeric

The wiki is (at least at the time of writing) explicit on this:

“The number of vehicles that can be charged at the same time at a amenity=charging_station” (source)

In all cases, even outside of charging stations, it seems that this tag should be completely numeric. Any non-numeric character indicates that something is definitely wrong, even if it is just a typo. All of the following cases, except the last one, can be seen when executing this query:

select
    cs.node_id,
    cs.capacity as capacity
from
    charging_station cs
where
    cs.capacity !~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$';

Specifying the power unit

This criterion catches a very common instance of this misconception: a mapper has tagged the capacity with a number followed by a power rating in kilowatts. Here is an example of this happening, although it takes many shapes and forms: with and without spaces, capitalizing either the “k” or the “w”, etc. To me, this is immediately obvious as a mistake, and these values should be moved to the “socket:<type>:output” subtag.

Unnecessarily specifying the number of cars

Don’t be too hasty, because sometimes the mistake isn’t simply that the power rating has been assigned to the capacity. Some mappers have input the correct value, but have polluted the input with unnecessary extra content. For example, here is an instance of a mapper having indicated that the capacity is “2 Cars”, which is technically a correct usage of the tag, but there shouldn’t be any need to specify that it refers to cars, because it should be implicit.

Specifying both power and number of cars

There are also instances of the capacity being specified correctly, but with the power rating of the charging station shoved into the capacity tag anyway. A really common occurrence of this is in Germany, where there are several charging stations with capacity tagged as “2 x 22 kW”. Variants of this exist all over the world; it just happens more often in Germany. An example of this is here.
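The three non-numeric cases above can be told apart with a few regular expressions. The following is only a minimal sketch: the numeric pattern is copied from the query above, while the other patterns, the category names and the function `classify_capacity` are my own assumptions for illustration and will certainly miss variants that exist in the real data.

```python
import re

# Same pattern as in the SQL query above.
NUMERIC = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$")

# Hypothetical patterns, for illustration only; real data has more variants.
# e.g. "2 x 22 kW", "2x22kw"
COUNT_TIMES_POWER = re.compile(
    r"^\s*\d+\s*[x×*]\s*\d+(?:[.,]\d+)?\s*kw\s*$", re.IGNORECASE
)
# e.g. "2 Cars", "4 vehicles"
VEHICLES = re.compile(r"^\s*\d+\s*(?:cars?|vehicles?)\s*$", re.IGNORECASE)
# e.g. "22 kW", "22KW", "11,5 kw"
POWER = re.compile(r"\d+(?:[.,]\d+)?\s*kw", re.IGNORECASE)


def classify_capacity(value: str) -> str:
    """Assign a raw capacity tag value to one of the problem groups."""
    if NUMERIC.match(value):
        return "numeric"
    if COUNT_TIMES_POWER.match(value):
        return "count_and_power"
    if VEHICLES.match(value):
        return "count_with_vehicle_word"
    if POWER.search(value):
        return "power_only"
    return "other"
```

The order of the checks matters: “2 x 22 kW” also contains a power rating, so the more specific pattern is tested before the generic one. Anything that lands in “other” still needs a human to look at it.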

Feature is a polygon (and is small)

According to the wiki, a polygonal charging station is perfectly valid, and it makes sense where there is a high concentration of charging stations, perhaps because a privately owned company provides many charging slots in a single location as its business model. In these cases, the polygon is likely to have a large area. An example of this is here, and we can be reasonably certain that this is a company due to the shape of the building around it (it looks like it is designed to fit many vehicles to maximize space efficiency). Unsurprisingly, the capacity is 28, a reasonable value for such a large area.

However, the polygon does not need to be large. Here is an example of the same company operating only six charging stations, packed into 6 adjacent parking spaces, and occupying a much smaller area as a result.

Problems arise, however, when we sort the list of polygonal charging stations by area (yes, calculated in EPSG:3857; I am well aware of the distortion issues and only wanted an initial look. Proper analysis should certainly use a local, equal-area CRS). For example, this one is the smallest one that I found in the database, and it is so small that it barely even appears on the map. Thankfully, the parking spaces for it appear to actually be mapped, and the rest of the station seems to be properly tagged. So in this case, the feature could easily be re-mapped as a point instead.
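To give a feeling for the size of the distortion mentioned above: in Web Mercator, lengths are stretched by roughly 1/cos(latitude), so areas are inflated by roughly 1/cos²(latitude). The sketch below applies that correction to a single area value. It assumes a spherical approximation and a polygon small enough that one latitude describes it, and the function name is hypothetical. It is a sanity check, not a replacement for a proper equal-area CRS.

```python
import math


def approx_true_area_m2(area_3857_m2: float, latitude_deg: float) -> float:
    """Roughly undo the Web Mercator area inflation at a given latitude.

    Assumption: spherical Mercator, where the linear scale factor is
    1 / cos(latitude), and a polygon small enough for one latitude to apply.
    """
    linear_scale = 1.0 / math.cos(math.radians(latitude_deg))
    return area_3857_m2 / (linear_scale * linear_scale)
```

At 60° latitude the linear scale factor is about 2, so a polygon reported as 100 m² in EPSG:3857 covers only about 25 m² on the ground. This is why sorting by raw 3857 area ranks high-latitude stations as larger than they really are.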

Another problem arises when a feature that is not primarily a charging station is tagged as one, probably because the mapper mistakenly took the tag value to mean “this place offers vehicle charging in addition to its regular function”. An example is here, where a campground in the USA is tagged with “amenity=charging_station”. This particular instance doesn’t actually define capacity, but if it did, it would logically follow that the capacity refers to the number of campsites, not to the number of charging stations. It’s unclear how to fix this. On the one hand, I wouldn’t want to scrub real information from this place, but on the other hand, it causes a disconnect between the charging station’s capacity and the campground’s capacity. This could be especially problematic if the number of charging stations is quite high, as a GIS analysis of charging stations could report a wildly incorrect capacity for that specific area, despite everything being mapped technically correctly.

To see the polygonal charging stations, you can use this query:

select
    cs.node_id,
    c.name as country,
    -- Make sure to use a proper, local equal-area CRS when doing your own detailed analysis!
    ST_Area(ST_Transform(cs.geom, 3857)) as area
from
    charging_station cs
    left join country c on ST_Intersects(c.geom, cs.geom)
where
    ST_Area(cs.geom) > 0

Point charging station capacity tag value is suspiciously high

I also tried to isolate features that may be technically correct but are still “suspicious”: I cannot say with a reasonable degree of certainty that they are wrong, but I think it is reasonable to treat any capacity value greater than four as suspicious if the feature is a point. My reasoning is this:

Given that OSM happily supports detail up to (standard web map) zoom level 20, I posit that a charging station point is almost certainly meant to represent an individual installation about the size of a vending machine or ATM, features that are themselves widely mapped as points. Unless a charging station has extremely long cables, four is likely to be the maximum number of vehicles that it can service simultaneously, which happens when it is positioned at the intersection of the parking lines separating four parked cars in the typical double-columned parking layout that is popular in most of the world.

Maximum “reasonable” density for a charging station point (servicing four vehicles simultaneously)

While it is physically possible for this number to be higher (perhaps there is some facility somewhere in the world in which cars park in rings around a central charging station), I consider it unlikely. As such, I consider any “capacity” tag value greater than four on a point to be suspicious, and I think it should be verified by a local mapper.

Again, I am not saying that it is significantly unlikely to have more than 4 sockets on a single device, only that it would be worth a local mapper taking a look to confirm that it is the case. Additionally, maybe it is worth re-mapping such cases as polygons to encompass the parking slots themselves, as was the case in some examples in the previous section.

You can get these “suspicious” charging stations with the following query:

select
    cs.node_id,
    c.name as country,
    cs.capacity::real as capacity
from
    charging_station cs
    left join country c on ST_Intersects(c.geom, cs.geom)
where
    cs.capacity ~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$'
    and
    cs.capacity::real > 4
    and
    ST_Area(cs.geom) = 0

There are a lot more of these (>5000). Sadly I don’t think there is any way to realistically and reliably verify these. That is why I would propose a StreetComplete mapping campaign to attempt to clean them up.
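For anyone working with an export of the data rather than the database itself, the same filter can be sketched in plain Python. The `Station` record and its field names are hypothetical stand-ins for whatever your export contains. The threshold of four is the heuristic argued for above, not an official rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Same numeric pattern as in the SQL queries above.
NUMERIC = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$")


@dataclass
class Station:
    """Hypothetical record for one charging station feature."""
    osm_id: int
    capacity: Optional[str]  # raw tag value, None if the tag is absent
    is_point: bool           # True for nodes, False for ways


def parse_capacity(raw: Optional[str]) -> Optional[float]:
    """Return the capacity as a number, or None if it is missing or non-numeric."""
    if raw is None or not NUMERIC.match(raw):
        return None
    return float(raw)


def is_suspicious(station: Station, threshold: float = 4.0) -> bool:
    """Flag point features whose numeric capacity exceeds the threshold."""
    value = parse_capacity(station.capacity)
    return station.is_point and value is not None and value > threshold
```

The value is only converted to a number after it has passed the pattern check, so entries such as “2 x 22 kW” are skipped here rather than raising an error. They belong to the non-numeric group instead.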

Wrap-up

I think we can divide the findings into two problem groups.

The first group are the easy fixes. They consist of the first few cases:

  • Capacity tag value is not completely numeric
  • Power unit is specified in capacity tag value
  • Capacity tag value specifies “cars” or equivalent
  • Both power and number of sockets are specified in capacity tag value

I have added a link to a .csv file containing all the OSM ids of features that have one (or more) of these problems, as I believe that at least most of them can be fixed by a dedicated mapper (or mappers), even without going to the charging station directly. Here is the .csv file.

The second group are the hard fixes. They consist of the last two cases:

  • Feature is a polygon (and is small)
  • Point charging station capacity is suspiciously high

There are definitely some totally correctly-tagged features in here, but I believe that there are some systematic problems mixed up in them that could stand to be verified by an on-site mapper, perhaps via a StreetComplete campaign. I have also added a list of the OSM ids of features with these potential problems, but keep in mind that fixing them will almost certainly require visiting and verifying the charging station. Here is the link

Finally, I have also added a link to a PostgreSQL database dump (~40MB compressed) of all charging stations worldwide, if anyone wants to dive in deeper.

(This article was edited on 15.03.2024 to fix dead github links)