sliding-sync/README.md
Kegan Dougal 233d21ad2e Type switch payload types; add Prometheus instructions
The type names should make it self-explanatory what kinds of
payloads are being processed.
2022-12-16 10:52:08 +00:00


sync-v3

Run an experimental sync v3 server using an existing Matrix account. This is possible because, for the most part, v3 sync is a strict subset of v2 sync.

An implementation of MSC3575.

Proxy version to MSC API specification:

  • Version 0.1.x: 2022/04/01
    • First release
  • Version 0.2.x: 2022/06/09
    • Reworked where lists and ops are situated in the response JSON. Added new filters like room_name_like. Added slow_get_all_rooms. Standardised on env vars for configuring the proxy. Persist access tokens, encrypted with SYNCV3_SECRET.
  • Version 0.3.x: 2022/08/05
    • Spaces support, txn_id support.
  • Version 0.4.x: 2022/08/23
    • Support for tags and not_tags.
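As an illustration of the features listed above, here is a minimal sketch of a request body that uses txn_id, room_name_like and not_tags. The field names follow MSC3575 as understood at the time of these versions; the struct types, the buildRequest helper and the array shape of lists are assumptions made for this example, so check them against the MSC before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Filters mirrors a few of the list filters named in the changelog.
// The JSON field names are taken from MSC3575 and are assumptions here.
type Filters struct {
	RoomNameLike string   `json:"room_name_like,omitempty"`
	Spaces       []string `json:"spaces,omitempty"`
	Tags         []string `json:"tags,omitempty"`
	NotTags      []string `json:"not_tags,omitempty"`
}

// List describes one sliding window over the room list.
type List struct {
	Ranges        [][2]int `json:"ranges"`
	TimelineLimit int      `json:"timeline_limit"`
	Filters       *Filters `json:"filters,omitempty"`
}

// Request is a hypothetical, trimmed-down request body.
type Request struct {
	TxnID string `json:"txn_id,omitempty"`
	Lists []List `json:"lists"`
}

// buildRequest asks for the first ten rooms whose name matches nameLike,
// excluding rooms that carry any of the given tags.
func buildRequest(txnID, nameLike string, notTags []string) ([]byte, error) {
	req := Request{
		TxnID: txnID,
		Lists: []List{{
			Ranges:        [][2]int{{0, 9}},
			TimelineLimit: 1,
			Filters:       &Filters{RoomNameLike: nameLike, NotTags: notTags},
		}},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildRequest("txn-1", "matrix", []string{"m.lowpriority"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The printed JSON can be pasted into any HTTP client pointed at the proxy; the sketch deliberately stops short of sending it, since the endpoint path and authentication are defined by the MSC rather than by this README.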

Usage

Requires Postgres 13+.

$ createdb syncv3
$ echo -n "$(openssl rand -hex 32)" > .secret # this MUST remain the same throughout the lifetime of the database created above.
$ go build ./cmd/syncv3
$ SYNCV3_SECRET=$(cat .secret) SYNCV3_SERVER="https://matrix-client.matrix.org" SYNCV3_DB="user=$(whoami) dbname=syncv3 sslmode=disable" SYNCV3_BINDADDR=0.0.0.0:8008 ./syncv3

Then visit http://localhost:8008/client/ (with trailing slash) and paste in the access_token for any account on the homeserver given in SYNCV3_SERVER.

When you hit the Sync button, nothing will happen initially, but the proxy should log:

INF Poller: v2 poll loop started ip=::1 since= user_id=@kegan:matrix.org

Wait for the initial v2 sync to be processed (this can take minutes!); the v3 APIs will then be responsive.

Prometheus

To enable metrics, pass SYNCV3_PROM=:2112 to listen on that port and expose a scraping endpoint at GET /metrics. To hook this up to a Prometheus instance, define prometheus.yml:

global:
    scrape_interval: 30s
    scrape_timeout: 10s
scrape_configs:
    - job_name: ss
      static_configs:
       - targets: ["host.docker.internal:2112"]

Then run Prometheus in a Docker container:

docker run -p 9090:9090 -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

To explore the data, use PromLens and point it at http://localhost:9090:

docker run -p 8080:8080 prom/promlens

Useful queries include:

  • rate(sliding_sync_poller_num_payloads{job="ss"}[5m]) : This shows the payload rate from pollers to API processes, broken down by type. A stacked graph display is especially useful, as the height then represents the total payload rate. This can be used to highlight abnormal incoming data, such as excessive payload rates. It can also be used to gauge how costly certain payload types are. In general, receipts and device data tend to be the most frequent background noise. The full list of payload types is defined in the pubsub directory.
  • sliding_sync_poller_num_pollers : Absolute count of the number of /sync v2 pollers. Useful either as a single value or displayed over time. The higher this value, the more pressure is put on the upstream homeserver.
  • sliding_sync_api_num_active_conns : Absolute count of the number of active sliding sync connections. Useful either as a single value or displayed over time. The higher this value, the more pressure is put on the proxy API processes.
  • sum(increase(sliding_sync_poller_process_duration_secs_bucket[1m])) by (le) : Useful heatmap to show how long /sync v2 responses take to process. This can highlight database pressure as processing responses involves database writes and notifications over pubsub.
  • sum(increase(sliding_sync_api_process_duration_secs_bucket[1m])) by (le) : Useful heatmap to show how long sliding sync responses take to calculate, which excludes all long-polling requests. This can highlight slow sorting/database performance, as these requests should always be fast.
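Before pointing Prometheus at the proxy, it can help to confirm that the scraping endpoint actually serves the metrics named in the queries above. The following is a minimal sketch, assuming the proxy runs on the same machine with SYNCV3_PROM=:2112; filterMetrics is a helper made up for this example that keeps only sample lines with a given name prefix and drops the "# HELP" and "# TYPE" comment lines of the text exposition format.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// filterMetrics returns the non-comment lines of a Prometheus text
// exposition whose metric name starts with prefix.
func filterMetrics(text, prefix string) []string {
	var out []string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, prefix) {
			out = append(out, line)
		}
	}
	return out
}

// scrape fetches the metrics page and prints the matching samples.
func scrape(url, prefix string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	for _, line := range filterMetrics(string(body), prefix) {
		fmt.Println(line)
	}
	return nil
}

func main() {
	// The address is an assumption: it matches SYNCV3_PROM=:2112 on localhost.
	if err := scrape("http://localhost:2112/metrics", "sliding_sync_"); err != nil {
		fmt.Fprintln(os.Stderr, "scrape failed:", err)
		os.Exit(1)
	}
}
```

If nothing is printed, the proxy is either not exporting metrics on that port or has not yet recorded any samples.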

How can I help?

At present, the best way to help would be to run a local v3 server pointed at a busy account and just leave it and a client running in the background. Look at it occasionally and submit any issues you notice. You can save console logs by right-clicking -> Save As.

Please run the server with SYNCV3_DEBUG=1 set. This will force the server to panic when assertions fail rather than just log them.
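To illustrate the difference this flag makes, here is a minimal sketch of an assertion helper that panics in debug mode and only logs otherwise. It is a hypothetical illustration of the behaviour described above, not the proxy's actual code; the assert function and its signature are invented for this example.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// assert is a hypothetical helper: a failed condition panics when debug
// is true and is merely logged when debug is false.
func assert(debug bool, cond bool, msg string) {
	if cond {
		return
	}
	if debug {
		panic("assertion failed: " + msg)
	}
	log.Printf("assertion failed: %s", msg)
}

func main() {
	// Mirrors the README's flag: debug mode is on when SYNCV3_DEBUG=1.
	debug := os.Getenv("SYNCV3_DEBUG") == "1"

	assert(debug, 1+1 == 2, "arithmetic still works")
	assert(debug, len("abc") == 2, "length mismatch") // deliberately false

	// Only reached when debug mode is off.
	fmt.Println("still running: the failed assertion was only logged")
}
```

Panicking makes a failed assertion impossible to miss and produces a stack trace to attach to an issue, which is why it is the preferred mode for testing.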