Some of you might be interested in a small prototype that I coded over the past few days. It’s an audio interface for selecting JOSM presets, and here it is in action:
The idea behind speech2josm is that armchair mapping itself is pretty intuitive: you trace outlines from aerial imagery and apply tags. But finding the right preset and filling in the details to get the right OSM tags is annoying. Despite all the tricks in JOSM (F3 search, autocomplete text boxes, toolbar shortcuts, cloning existing objects, …), it seems to be the most interrupting and (in UX terms) expensive action while mapping: add a way, switch edit mode, add tags, switch to select mode, …
To me it’s obvious that an alternative input channel could remove that bottleneck when you need to classify objects while editing. As I have been following speech-focused open technology for a few years, I picked the CMU Sphinx offline speech-to-text engine to build an audio-controlled interface. I just wrote some glue code that parses a list of control words and, when one is triggered, tells JOSM to add the corresponding tags. Currently the accuracy is not that good, but it’s a good starting point to get some help from folks who have more experience with STT. It’s great to see the evolution of the CMUSphinx tools and the efforts to make voice recognition work on your own machine. While Amazon, Google, … tell you that this only works in the cloud, this use case works completely offline.
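The glue code amounts to little more than a lookup table. Here is a minimal sketch in Python, assuming the STT engine (e.g. pocketsphinx) delivers plain-text transcripts; the control words, function names, and the `key=value|key=value` encoding for JOSM’s remote-control `addtags` parameter are illustrative assumptions, not the actual speech2josm source:

```python
# Hypothetical sketch: map spoken control words to OSM tags.
# Assumes a speech-to-text engine hands us lowercase-able transcripts.

# control word -> OSM tags it should trigger (illustrative table)
CONTROL_WORDS = {
    "house":  {"building": "house"},
    "garage": {"building": "garage"},
    "meadow": {"landuse": "meadow"},
    "forest": {"landuse": "forest"},
}

def tags_for_transcript(transcript: str) -> dict:
    """Collect the tags for every control word found in a transcript."""
    tags = {}
    for word in transcript.lower().split():
        tags.update(CONTROL_WORDS.get(word, {}))
    return tags

def josm_addtags_query(tags: dict) -> str:
    """Encode tags as key=value pairs joined by '|' (the format JOSM's
    remote control accepts in its addtags parameter, as an assumption
    based on its documentation)."""
    return "|".join(f"{k}={v}" for k, v in tags.items())

print(tags_for_transcript("tag this as a house"))  # {'building': 'house'}
```

The resulting query string could then be handed to a running JOSM instance, for example via its remote-control HTTP interface or a small plugin.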
If you would like to improve the accuracy of a free (audio) language model, please speak a few sentences over at Mozilla Voice. It crowdsources audio recordings in different languages, so it’s a bit similar to OSM. And if you are interested in having more intelligent conversations with your PC, you might want to check out mycroft.ai and its skills, which are coded in Python as well.