16 Commits

Author SHA1 Message Date
Kegan Dougal
c47665f1e8 Actually honour max conns globally, not per storage struct 2023-07-12 17:36:59 +01:00
Kegan Dougal
4c661fbdd1 Add db conns test; uncomment DBMaxConns to break the world 2023-06-19 15:56:22 +01:00
Kegan Dougal
cb15252967 Set sensible DB conn limits 2023-06-14 10:24:25 +01:00
David Robertson
fb833d74b2 Remove extra log var in sync2 2023-05-02 12:09:20 +01:00
David Robertson
db32a58428 Cleanup devices table 2023-04-28 18:50:43 +01:00
David Robertson
e71d954030 Introduce tokens table 2023-04-28 01:37:45 +01:00
David Robertson
b91dbae895 Introduce a sync2.Storage struct 2023-04-27 20:43:06 +01:00
David Robertson
c4a2984cba Rename Storage to DevicesTable 2023-04-27 19:07:10 +01:00
David Robertson
601e3fce49 More sentry logging 2023-04-13 15:02:46 +01:00
Kegan Dougal
6bdef5feba bugfix: expire connections when the access token gets invalidated
With regression test. The behaviour is:
 - Delete the connection, such that incoming requests will end up with M_UNKNOWN_POS
 - The next request will then return HTTP 401.

This has knock-on effects:
 - We no longer send HTTP 502 if /whoami returns 401; instead we return 401.
 - When the token is expired (pollers get 401), the device is deleted from the DB.
2023-03-01 16:40:15 +00:00
Kegan Dougal
aa28df161c Rename package -> github.com/matrix-org/sliding-sync 2022-12-15 11:08:50 +00:00
Kegan Dougal
be8543a21a add extensions for typing and receipts; bugfixes and additional perf improvements
Features:
 - Add `typing` extension.
 - Add `receipts` extension.
 - Add comprehensive prometheus `/metrics` activated via `SYNCV3_PROM`.
 - Add `SYNCV3_PPROF` support.
 - Add `by_notification_level` sort order.
 - Add `include_old_rooms` support.
 - Add support for `$ME` and `$LAZY`.
 - Add correct filtering when `*,*` is used as `required_state`.
 - Add `num_live` to each room response to indicate how many timeline entries are live.

Bug fixes:
 - Use a stricter comparison function on ranges: fixes an issue whereby UTs fail on go1.19 due to a change in the sorting algorithm.
 - Send back an `errcode` on HTTP errors (e.g. expired sessions).
 - Remove `unsigned.txn_id` on insertion into the DB. Otherwise other users would see other users' txn IDs :(
 - Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
 - Send HTTP 400 for invalid range requests.
 - Don't publish no-op unread counts, which just add extra noise.
 - Fix leaking DB connections which could eventually consume all available connections.
 - Ensure we always unblock WaitUntilInitialSync even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point, e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever.

Improvements:
 - Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler, now it takes 4s.
 - Massively improve initial v3 sync times, by refactoring `JoinedRoomsTracker`, from ~47s to <1s.
 - Add `SlidingSyncUntil...` in tests to reduce races.
 - Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
 - Add trace task for initial syncs.
 - Include the proxy version in UA strings.
 - HTTP errors now wait 1s before returning to stop clients tight-looping on error.
 - Pending event buffer is now 2000.
 - Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
 - Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderate sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
2022-12-14 18:53:55 +00:00
Kegan Dougal
976875ba7a Skip unreadable access tokens 2022-07-20 11:37:26 +01:00
Kegan Dougal
47b74a6be6 Automatically start v2 pollers on startup
We can do this now because we store the access token for each device.

Throttled at 16 concurrent sync requests to avoid causing
thundering herds on startup.
2022-07-14 10:48:45 +01:00
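
The throttling described in the commit above can be illustrated with a buffered channel used as a semaphore. This is a minimal sketch, not the proxy's actual code: `startPollers`, `gauge`, and `maxConcurrentPolls` are hypothetical names, and only the limit of 16 comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// maxConcurrentPolls mirrors the limit of 16 named in the commit message.
const maxConcurrentPolls = 16

// startPollers calls poll once per token, never running more than limit
// calls at the same time, and returns once every call has finished.
// Hypothetical helper for illustration only.
func startPollers(tokens []string, limit int, poll func(token string)) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, tok := range tokens {
		sem <- struct{}{} // blocks while limit polls are already in flight
		wg.Add(1)
		go func(tok string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next poller
			poll(tok)
		}(tok)
	}
	wg.Wait()
}

// gauge records the highest number of simultaneous callers. It exists only
// so the example can check that the limit was respected.
type gauge struct {
	mu        sync.Mutex
	cur, peak int
}

func (g *gauge) enter() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.cur++
	if g.cur > g.peak {
		g.peak = g.cur
	}
}

func (g *gauge) exit() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.cur--
}

func main() {
	tokens := make([]string, 100)
	for i := range tokens {
		tokens[i] = fmt.Sprintf("token-%d", i)
	}
	var g gauge
	startPollers(tokens, maxConcurrentPolls, func(string) {
		g.enter()
		defer g.exit()
		// A real poller would issue its first v2 sync request here.
	})
	fmt.Println("peak within limit:", g.peak <= maxConcurrentPolls)
}
```

Acquiring the slot before spawning the goroutine keeps the number of live goroutines bounded as well, which matters when there are many stored devices at startup.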
Kegan Dougal
ed9e9ed48c Persist v2 access tokens in the database, encrypted
- Add `SYNCV3_SECRET` env var which is SHA256'd and used as an AES
  key to encrypt/decrypt tokens.
- Add column `v2_token_encrypted` to `syncv3_sync2_devices`
- Update unit tests to check encryption/decryption work.

This provides an extra layer of security in case the database is
compromised and real user access tokens are leaked. This forces
an attacker to obtain both the database table _and_ the secret
env var (which will typically be stored in secure storage e.g.
k8s secrets). Unfortunately, we need to have the access_token
in plaintext so we cannot rely on password-style storage algorithms
like bcrypt/scrypt, which would be safer.
2022-07-13 17:03:40 +01:00
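
The scheme in the commit above (SHA-256 the secret, use the digest as an AES key) can be sketched as follows. The commit message does not say which AES mode or encoding is used, so AES-GCM with a random nonce and hex encoding are assumptions made here for illustration; `keyFromSecret`, `encryptToken`, and `decryptToken` are hypothetical names, not the proxy's API.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// keyFromSecret hashes the secret with SHA-256, giving a 32-byte AES-256 key.
func keyFromSecret(secret string) [32]byte {
	return sha256.Sum256([]byte(secret))
}

// newGCM builds an AES-GCM cipher from the secret.
// ASSUMPTION: GCM is chosen for this sketch; the source does not name a mode.
func newGCM(secret string) (cipher.AEAD, error) {
	key := keyFromSecret(secret)
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptToken returns hex(nonce || ciphertext) for the given token.
func encryptToken(secret, token string) (string, error) {
	gcm, err := newGCM(secret)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Seal appends the ciphertext to nonce, so the nonce is stored with it.
	sealed := gcm.Seal(nonce, nonce, []byte(token), nil)
	return hex.EncodeToString(sealed), nil
}

// decryptToken reverses encryptToken. It fails if the secret is wrong or
// the stored value has been tampered with.
func decryptToken(secret, encoded string) (string, error) {
	raw, err := hex.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	gcm, err := newGCM(secret)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	enc, err := encryptToken("example secret", "example_access_token")
	if err != nil {
		panic(err)
	}
	dec, err := decryptToken("example secret", enc)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", dec == "example_access_token")
}
```

Because the token must be recoverable, this is reversible encryption rather than hashing: anyone holding both the stored value and the secret can decrypt it, which is the trade-off the commit message acknowledges.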
Kegan Dougal
c893efae14 Factor out sync2 since token storage 2021-09-20 18:09:28 +01:00