In the beginning of the year, I’ve been refactoring the Divide and map. Now.. I’ve already wrote about the clients and about the improvements. In this diary, I would like to share some details about how the server looks like now.
The damn project distributed architecture is depicted below. I’ll write about
the api
and db
bubbles, where db
is the
PostgreSQL database with the
PostGIS extension and the api
is the python
FastAPI application.
Damn server API
Automatically generated documentation
of the API is useful for the developers of clients. There are some not so much
interesting endpoints dealing with authentication and authorization, users
updates, or squares GPXs. I will elaborate on /areas
, /geometries
,
/area/AID
, /area/AID/commits
, and /area/AID/squares
endpoints here.
The list of areas and adding new areas is done through /areas
endpoint.
However, returned area is different from the area to be created. Returned area
contains statistics, area to be created contains FeatureCollection
with the
area’s geometry and the information about how to divide the area.
The /geometries
endpoint is used in python
client, particularly in the script
detecting intersecting areas.
Getting information about an area and updating the area information is done
through /area/AID
endpoint, where AID
is the area identifier. Currently,
four digits are used for the area identifier. Yes, the number of areas of the
damn project is limited to 8 999. I’m going to increase the limit to five
digits – 89 999 – when needed.
There is /area/AID/squares
endpoint providing the list of squares’ geometries
for the area. Nothing more.
The endpoint where the real work is being done is /area/AID/commits
. The idea
is to get the area’s commits once, store the commits, and use ?since=
query
parameter in the next query to get just new commits. (JavaScript damn API
library
may help here.) Getting the commits helps to decide the square’s current state
(e.g. to map or done,) compute statistics, or show the history of the
area’s information.
It is not possible to change the commit. It’s only possible to add new one. And
creating the commit is as simple as sending {"type": "map random"}
to the
server. The example says to the server: “Let me map some random square”. The
server replies: “201: Created. Here is the area’s ID, square’s ID, and the
square’s border.” Commit message is
optional. It’s currently possible to map recent, random, or nearest
square, to review recent, random, nearest, or newbie square, and
manually lock some square specified by the square’s ID. When unlocking the
square, send the type of needs mapping, needs review, is done, or split
along with the square’s ID – it’s compulsory when unlocking. However, there is
one exception. When multiple squares are locked, commit message with the
merge type will create the new square from all the locked squares. There is
no need for square’s ID in that case. (Similar to areas, returned commit has
different set of types than the commit to be created.)
Source code structure
I’ve changed the files structure, too. However, it’s probably not interesting
much. conf
and db
are unchanged. I’ve added Pydantic models and docstrings
to user
file. square
just gets the border from the database and returns it
as GeoJSON or GPX.
api
is the FastAPI file. This file is used as the MODULE_NAME
of the
uvicorn docker.
As soon as the FastAPI loses to be the number one of the Python JSON API
frameworks, just change this file.
I’ve only added Pydantic models and some docstrings to the area
file.
save
, load
, and update
functions are still here.
I’ve moved all the lists of any kind to the list_of
file. Database queries
for the list of areas, geometries, area commits and squares, and user commits
are stored here.
The last file is the new
file. This file deals with the database queries for
creating the commits. The original idea was to move the queries that creates
something new in the database. However, when you recall the API, only new area
or commit is created. And it’s just cleaner to area.save
, in my opinion.
SQL queries
The database structure is not changed. Just to recall:
current_areas -- stores area information
- aid -- primary key
- tags -- the default changeset comment
- priority -- how to order the areas
- description -- JSON with {"lang code": "description", ...}
- instructions -- JSON with {"what": "link to wiki", ...}
- featurecollection -- GeoJSON
- created -- automatically for new areas, UTC now()
curret_squares -- stores area's squares
- sid -- along with aid primary key
- aid -- foreign key to current_commits
- border -- geometry
current_commits -- stores area's commits
- cid -- primary key
- sid -- along with AID foreign key to current_squares
- aid -- foreign key to current_commits
- date -- automatically for new commits, UTC now()
- author -- foreign key for users.display_name
- type -- to map, to review, done, locked, ...
- message -- author's note
I’m going to share the interesting parts (in my opinion) of the queries. The complete code of the server and the database is available, of course.
Let’s start with guessing. What does the following query do?
WITH last AS (
SELECT DISTINCT ON (sid) cid, sid, type
FROM current_commits
WHERE aid=$1
ORDER BY sid, cid DESC
),
toreview AS (
SELECT *
FROM last
WHERE type='to review'
ORDER BY cid DESC
LIMIT 1
)
INSERT INTO current_commits (sid, aid, author, type, message)
SELECT sid, $1, $2, 'locked', 'Working on review'
FROM toreview
RETURNING sid
This is non-refactored code, in fact. It just locks the recent square for
review. Locking of the random square is RANDOM()
insted of cid DESC
, you
probably know.
To use some PostGIS, let’s lock the neares square, hey?
WITH last AS (
SELECT DISTINCT ON (cc.sid)
cid, cc.sid, cc.aid, type, author, border
FROM current_commits AS cc
INNER JOIN current_squares AS cs
ON (cc.aid=cs.aid AND cc.sid=cs.sid)
WHERE cc.aid=$1
ORDER BY cc.sid, cid DESC
),
mine AS (
SELECT border
FROM current_commits AS cc
INNER JOIN current_squares AS cs
ON (cc.aid=cs.aid AND cc.sid=cs.sid)
WHERE cc.aid=$1
AND author=$2
ORDER BY cid DESC
LIMIT 1
),
toreview AS (
SELECT sid, cid, type
FROM last, mine
WHERE type='to review'
ORDER BY ST_Distance(mine.border, last.border)
LIMIT 1
)
INSERT INTO current_commits (sid, aid, author, type, message)
SELECT sid, $1, $2, 'locked', 'Working on review'
FROM toreview
RETURNING sid
Border is stored in current_squares
table so it has to be joined with the
current_commits
. The interesting part is:
ORDER BY ST_Distance(mine.border, last.border)
And the interesting part for the newbie’s square review request is:
WHERE type='to review'
AND (info::json->>'newbie')::timestamp > now()
ORDER BY RANDOM()
The mappers mark newbies themselves as I’ve explained in older
diary. In the same
diary, I’ve mentioned the merging of the squares. However, I didn’t say
details. It’s not the rocket science. The main part is done by PostGIS again –
the ST_Union
function in this case.
...
to_be_merged AS (
SELECT border
FROM current_squares cs, locked lo
WHERE cs.sid=lo.sid
AND cs.aid=lo.aid
),
merged AS (
SELECT ST_UNION(border) as mb
FROM to_be_merged
)
INSERT INTO current_squares (sid, aid, border)
SELECT f.msid + row_number() OVER () as sid, f.maid as aid, mb
FROM
(
SELECT MAX(cs.sid) as msid, MAX(cs.aid) as maid
FROM current_squares cs, locked lo
WHERE cs.aid=lo.aid
) as f,
merged
RETURNING sid, aid
Here, the locked
is query that returns all the squares locked by the
author
. (Note that the locked information is stored in current_commits
, but
the border in current_squares
. Also, some parts of the code are pointless.)
The best for the last. I’ve refactored the pgsql function for dividing up an
area or square – the st_divide
function. No code snippet here. Rather look
at the whole source
code.
The function gets area geometry, number of squares in x
and y
axis, and
returns a set of the divided area’s geometries. (The number of squares in x
and y
is overloaded by square’s dx
and dy
dimensions.) I’ve also tuned
the squares a little bit, so it’s possible to divide up the area to
rectangles, diamonds, flat hexagons, pointy hexagons, or the wall
bricks.
Conclusion
The thing I like the most about the server from the API point of view is that
the whole mapping workflow – map -> review -> done – is available through
the one endpoint with the trivial messages of {"sid": 1, "type": "is done"}
.
The thing I like the most about the server from the database point of view is that everything is done by the SQL queries – the python application “just” translates JSON to SQL, in fact.
Discussion