Simple demo app that analyses OSM changeset data (and why modern file formats are cool)
Posted by tomczk on 31 January 2023 in English.
Recently I found Streamlit, which is a pretty cool Python library that makes it easy to create web apps for visualising data.
I converted the changeset dump from planet.osm.org to the Parquet file format and uploaded it to AWS S3 storage. Then I created this Streamlit app, which displays some basic statistics, in their free cloud: https://ttomasz-tt-osm-changeset-analyzer-main-apdkpy.streamlit.app/
The app leverages the power of DuckDB, a database engine that can query these files over the internet on demand. Parquet files, which are a popular format in modern cloud data lakes, have several advantages over traditional file formats. They are column-oriented, compressed, and work well with HTTP range requests, which means that you can download only the portions of the file you need instead of reading the entire file. This makes processing larger datasets much faster.
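To make the range request idea concrete, here is a minimal sketch that fetches only the last 8 bytes of a remote Parquet file. A Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a reader can start from the tail and then request just the metadata and columns it needs. The function names are made up for this illustration, and the sketch assumes the server honours the `Range` header (S3 does).

```python
import struct
import urllib.request

PARQUET_MAGIC = b"PAR1"


def parse_parquet_tail(tail: bytes) -> int:
    """Return the footer metadata length stored in the last 8 bytes of a Parquet file."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    # First 4 bytes: little-endian uint32 with the length of the footer metadata.
    (footer_length,) = struct.unpack("<I", tail[:4])
    return footer_length


def fetch_footer_length(url: str) -> int:
    """Download only the last 8 bytes of a remote file using an HTTP range request."""
    request = urllib.request.Request(url, headers={"Range": "bytes=-8"})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:  # 206 Partial Content means the range was honoured
            raise RuntimeError("server ignored the Range header")
        return parse_parquet_tail(response.read())
```

Calling `fetch_footer_length(url)` with the URL of any publicly readable Parquet file transfers 8 bytes of payload no matter how large the file is. Engines like DuckDB do this kind of thing for you; the sketch only shows why it is cheap.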
DuckDB works similarly to SQLite in that it doesn’t have a dedicated server. You run the queries locally [0]. This makes the setup super simple: you either install the binary or configure a connection in an IDE like DBeaver, and you can run SQL queries.
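As a rough idea of what such a query looks like from Python, here is a sketch using the `duckdb` package. The URL is a placeholder and the `created_at` column name is an assumption about how the Parquet file is laid out (it mirrors the attribute name in the changeset XML), so adjust both to the real file.

```python
PARQUET_URL = "https://example.com/changesets.parquet"  # hypothetical placeholder URL


def build_query(parquet_url: str) -> str:
    """Build SQL that counts changesets per year directly from a Parquet file."""
    return (
        "SELECT year(created_at) AS changeset_year, count(*) AS changesets "  # assumed column
        f"FROM read_parquet('{parquet_url}') "
        "GROUP BY 1 ORDER BY 1"
    )


def run_query(parquet_url: str = PARQUET_URL):
    """Run the query in an in-process DuckDB database; there is no server to set up."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect()  # in-memory database
    con.execute("INSTALL httpfs")  # extension that reads files over HTTP(S) and S3
    con.execute("LOAD httpfs")
    return con.execute(build_query(parquet_url)).fetchall()
```

`run_query()` returns a list of `(changeset_year, changesets)` tuples. Because the query touches a single column, DuckDB only has to download that column's chunks rather than the whole file.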
Running these simple SQL queries over remote Parquet files takes about a minute or two. Trying to do the same with a custom script on the raw changesets.xml.bz2 file would take longer, not to mention that the effort to prepare the code would be much, much larger.
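For comparison, this is roughly what the custom script route involves even for a single statistic. It is a minimal sketch using only the Python standard library, and it assumes the dump is a stream of `<changeset>` elements with a `created_at` attribute. The sample data is made up so the snippet runs without downloading anything.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def changesets_per_year(stream) -> Counter:
    """Count <changeset> elements per year of their created_at attribute."""
    counts = Counter()
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "changeset":
            created_at = element.get("created_at")
            if created_at:
                counts[created_at[:4]] += 1  # ISO 8601 timestamps start with the year
            element.clear()  # drop children and attributes of the processed element
    return counts


# Made-up sample standing in for the real dump.
SAMPLE = b"""<osm>
 <changeset id="1" created_at="2005-04-09T19:54:13Z" num_changes="2">
  <tag k="comment" v="first edit"/>
 </changeset>
 <changeset id="2" created_at="2005-05-01T10:00:00Z" num_changes="5"/>
 <changeset id="3" created_at="2023-01-31T08:30:00Z" num_changes="1"/>
</osm>"""

with bz2.open(io.BytesIO(bz2.compress(SAMPLE)), "rb") as f:
    print(changesets_per_year(f))
```

For the real file you would pass `bz2.open("changesets.xml.bz2", "rb")` instead. The script has to decompress and parse every byte of the dump to answer one question, and each new statistic means more code, whereas with Parquet it is one more SQL query.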
It would be great if OSM hosted more “consumption ready” data instead of relying on users to do their own coding and parsing.
Let me know if you have some ideas for charts/tables that could be added to the demo.
[0] - well, in this case they are running on Streamlit Cloud’s server, but you can easily run the queries locally on the same Parquet files