OpenStreetMap logo OpenStreetMap

In the beginning of the year, I’ve been refactoring the Divide and map. Now.. I’ve already wrote about the clients and about the improvements. In this diary, I would like to share some details about how the server looks like now.

The damn project distributed architecture is depicted below. I’ll write about the api and db bubbles, where db is the PostgreSQL database with the PostGIS extension and the api is the python FastAPI application.

damn project architecture

Damn server API

Automatically generated documentation of the API is useful for the developers of clients. There are some not so much interesting endpoints dealing with authentication and authorization, users updates, or squares GPXs. I will elaborate on /areas, /geometries, /area/AID, /area/AID/commits, and /area/AID/squares endpoints here.

The list of areas and adding new areas is done through /areas endpoint. However, returned area is different from the area to be created. Returned area contains statistics, area to be created contains FeatureCollection with the area’s geometry and the information about how to divide the area.

The /geometries endpoint is used in python client, particularly in the script detecting intersecting areas.

Getting information about an area and updating the area information is done through /area/AID endpoint, where AID is the area identifier. Currently, four digits are used for the area identifier. Yes, the number of areas of the damn project is limited to 8 999. I’m going to increase the limit to five digits – 89 999 – when needed.

There is /area/AID/squares endpoint providing the list of squares’ geometries for the area. Nothing more.

The endpoint where the real work is being done is /area/AID/commits. The idea is to get the area’s commits once, store the commits, and use ?since= query parameter in the next query to get just new commits. (JavaScript damn API library may help here.) Getting the commits helps to decide the square’s current state (e.g. to map or done,) compute statistics, or show the history of the area’s information.

It is not possible to change the commit. It’s only possible to add new one. And creating the commit is as simple as sending {"type": "map random"} to the server. The example says to the server: “Let me map some random square”. The server replies: “201: Created. Here is the area’s ID, square’s ID, and the square’s border.” Commit message is optional. It’s currently possible to map recent, random, or nearest square, to review recent, random, nearest, or newbie square, and manually lock some square specified by the square’s ID. When unlocking the square, send the type of needs mapping, needs review, is done, or split along with the square’s ID – it’s compulsory when unlocking. However, there is one exception. When multiple squares are locked, commit message with the merge type will create the new square from all the locked squares. There is no need for square’s ID in that case. (Similar to areas, returned commit has different set of types than the commit to be created.)

Source code structure

I’ve changed the files structure, too. However, it’s probably not interesting much. conf and db are unchanged. I’ve added Pydantic models and docstrings to user file. square just gets the border from the database and returns it as GeoJSON or GPX.

api is the FastAPI file. This file is used as the MODULE_NAME of the uvicorn docker. As soon as the FastAPI loses to be the number one of the Python JSON API frameworks, just change this file.

I’ve only added Pydantic models and some docstrings to the area file. save, load, and update functions are still here.

I’ve moved all the lists of any kind to the list_of file. Database queries for the list of areas, geometries, area commits and squares, and user commits are stored here.

The last file is the new file. This file deals with the database queries for creating the commits. The original idea was to move the queries that creates something new in the database. However, when you recall the API, only new area or commit is created. And it’s just cleaner to area.save, in my opinion.

SQL queries

The database structure is not changed. Just to recall:

current_areas -- stores area information
- aid -- primary key
- tags -- the default changeset comment
- priority -- how to order the areas
- description -- JSON with {"lang code": "description", ...}
- instructions -- JSON with {"what": "link to wiki", ...}
- featurecollection -- GeoJSON
- created -- automatically for new areas, UTC now()

curret_squares -- stores area's squares
- sid -- along with aid primary key
- aid -- foreign key to current_commits
- border -- geometry

current_commits -- stores area's commits
- cid -- primary key
- sid -- along with AID foreign key to current_squares
- aid -- foreign key to current_commits
- date -- automatically for new commits, UTC now()
- author -- foreign key for users.display_name
- type -- to map, to review, done, locked, ...
- message -- author's note

I’m going to share the interesting parts (in my opinion) of the queries. The complete code of the server and the database is available, of course.

Let’s start with guessing. What does the following query do?

WITH last AS (
    SELECT DISTINCT ON (sid) cid, sid, type
    FROM current_commits
    WHERE aid=$1
    ORDER BY sid, cid DESC
),
toreview AS (
    SELECT *
    FROM last
    WHERE type='to review'
    ORDER BY cid DESC
    LIMIT 1
)
INSERT INTO current_commits (sid, aid, author, type, message)
SELECT sid, $1, $2, 'locked', 'Working on review'
FROM toreview
RETURNING sid

This is non-refactored code, in fact. It just locks the recent square for review. Locking of the random square is RANDOM() insted of cid DESC, you probably know.

To use some PostGIS, let’s lock the neares square, hey?

WITH last AS (
    SELECT DISTINCT ON (cc.sid)
        cid, cc.sid, cc.aid, type, author, border
    FROM current_commits AS cc
    INNER JOIN current_squares AS cs
    ON (cc.aid=cs.aid AND cc.sid=cs.sid)
    WHERE cc.aid=$1
    ORDER BY cc.sid, cid DESC
),
mine AS (
    SELECT border
    FROM current_commits AS cc
    INNER JOIN current_squares AS cs
    ON (cc.aid=cs.aid AND cc.sid=cs.sid)
    WHERE cc.aid=$1
    AND author=$2
    ORDER BY cid DESC
    LIMIT 1
),
toreview AS (
    SELECT sid, cid, type
    FROM last, mine
    WHERE type='to review'
    ORDER BY ST_Distance(mine.border, last.border)
    LIMIT 1
)
INSERT INTO current_commits (sid, aid, author, type, message)
SELECT sid, $1, $2, 'locked', 'Working on review'
FROM toreview
RETURNING sid

Border is stored in current_squares table so it has to be joined with the current_commits. The interesting part is:

ORDER BY ST_Distance(mine.border, last.border)

And the interesting part for the newbie’s square review request is:

WHERE type='to review'
AND (info::json->>'newbie')::timestamp > now()
ORDER BY RANDOM()

The mappers mark newbies themselves as I’ve explained in older diary. In the same diary, I’ve mentioned the merging of the squares. However, I didn’t say details. It’s not the rocket science. The main part is done by PostGIS again – the ST_Union function in this case.

...
to_be_merged AS (
    SELECT border
    FROM current_squares cs, locked lo
    WHERE cs.sid=lo.sid
    AND cs.aid=lo.aid
),
merged AS (
    SELECT ST_UNION(border) as mb
    FROM to_be_merged
)
INSERT INTO current_squares (sid, aid, border)
SELECT f.msid + row_number() OVER () as sid, f.maid as aid, mb
FROM
    (
        SELECT MAX(cs.sid) as msid, MAX(cs.aid) as maid
        FROM current_squares cs, locked lo
        WHERE cs.aid=lo.aid
    ) as f,
    merged
RETURNING sid, aid

Here, the locked is query that returns all the squares locked by the author. (Note that the locked information is stored in current_commits, but the border in current_squares. Also, some parts of the code are pointless.)

The best for the last. I’ve refactored the pgsql function for dividing up an area or square – the st_divide function. No code snippet here. Rather look at the whole source code. The function gets area geometry, number of squares in x and y axis, and returns a set of the divided area’s geometries. (The number of squares in x and y is overloaded by square’s dx and dy dimensions.) I’ve also tuned the squares a little bit, so it’s possible to divide up the area to rectangles, diamonds, flat hexagons, pointy hexagons, or the wall bricks.

Conclusion

The thing I like the most about the server from the API point of view is that the whole mapping workflow – map -> review -> done – is available through the one endpoint with the trivial messages of {"sid": 1, "type": "is done"}.

The thing I like the most about the server from the database point of view is that everything is done by the SQL queries – the python application “just” translates JSON to SQL, in fact.

Discussion

Log in to leave a comment