80 Commits

Author SHA1 Message Date
David Robertson
587c7fc2cf
Use the right name for "context" 2023-04-05 18:48:12 +01:00
David Robertson
2da72f9c44
Make assertions report to sentry 2023-04-05 18:24:01 +01:00
David Robertson
32d482edd3
Maybe this fixes the segfault? 2023-04-05 17:14:47 +01:00
Kegan Dougal
5de7fa72f6 Read TraceContext headers to get full client/server spans 2023-02-21 10:50:39 +00:00
Kegan Dougal
1d64febf49 metrics: add Jaeger spans by setting SYNCV3_JAEGER_URL
This is a WIP but is mostly there. Jaeger debug logging currently goes
to the wrong logger (e.g. if you enter an invalid URL).
2023-02-20 17:55:34 +00:00
Kegan Dougal
6ee1b5244f Add missing file 2023-02-20 14:58:38 +00:00
Kegan Dougal
fbc49564c9 If no required_state is sent, don't pull out all room state 2023-01-13 19:12:36 +00:00
Kegan Dougal
6c4f7d3722 improvement: completely refactor device data updates
- `Conn`s now expose a direct `OnUpdate(caches.Update)` function
  for updates which concern a specific device ID.
- Add a bitset in `DeviceData` to indicate if the OTK or fallback keys were changed.
- Pass through the affected `DeviceID` in `pubsub.V2DeviceData` updates.
- Remove `DeviceDataTable.SelectFrom` as it was unused.
- Refactor how the poller invokes `OnE2EEData`: it now only does this if
  there are changes to OTK counts and/or fallback key types and/or device lists,
  and _only_ sends those fields, setting the rest to the zero value.
- Remove noisy logging.
- Add `caches.DeviceDataUpdate` which has no data but serves to wake-up the long poller.
- Only send OTK counts / fallback key types when they have changed, not constantly. This
  matches the behaviour described in MSC3884

The entire flow now looks like:
- Poller notices a diff against in-memory version of otk count and invokes `OnE2EEData`
- Handler updates device data table, bumps the changed bit for otk count.
- Other handler gets the pubsub update, directly finds the `Conn` based on the `DeviceID`.
  Invokes `OnUpdate(caches.DeviceDataUpdate)`
- This update is handled by the E2EE extension which then pulls the data out from the database
  and returns it.
- On initial connections, all OTK / fallback data is returned.
2022-12-22 15:08:42 +00:00
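The "changed bit" idea in the commit above can be sketched as follows. This is a minimal illustration only: the type layout, the bit constants and the method names are hypothetical and are not taken from the repository.

```go
package main

import "fmt"

// Hypothetical bit positions; the real constants may differ.
const (
	bitOTKCount = iota
	bitFallbackKeyTypes
)

// DeviceData is a simplified stand-in for the real struct.
type DeviceData struct {
	OTKCounts        map[string]int
	FallbackKeyTypes []string
	ChangedBits      int // bitset recording which fields changed since the last read
}

// SetOTKCountChanged marks the OTK counts as changed.
func (d *DeviceData) SetOTKCountChanged() { d.ChangedBits |= 1 << bitOTKCount }

// SetFallbackKeysChanged marks the fallback key types as changed.
func (d *DeviceData) SetFallbackKeysChanged() { d.ChangedBits |= 1 << bitFallbackKeyTypes }

// OTKCountChanged reports whether the OTK count bit is set.
func (d *DeviceData) OTKCountChanged() bool { return d.ChangedBits&(1<<bitOTKCount) != 0 }

// FallbackKeysChanged reports whether the fallback key bit is set.
func (d *DeviceData) FallbackKeysChanged() bool {
	return d.ChangedBits&(1<<bitFallbackKeyTypes) != 0
}

func main() {
	d := &DeviceData{OTKCounts: map[string]int{"signed_curve25519": 50}}
	d.SetOTKCountChanged()
	// Only fields whose bit is set would be sent to the client.
	fmt.Println(d.OTKCountChanged(), d.FallbackKeysChanged())
}
```

With this shape, the extension can check each bit and leave unchanged fields at their zero value.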
Kegan Dougal
be8543a21a add extensions for typing and receipts; bugfixes and additional perf improvements
Features:
 - Add `typing` extension.
 - Add `receipts` extension.
 - Add comprehensive prometheus `/metrics` activated via `SYNCV3_PROM`.
 - Add `SYNCV3_PPROF` support.
 - Add `by_notification_level` sort order.
 - Add `include_old_rooms` support.
 - Add support for `$ME` and `$LAZY`.
 - Add correct filtering when `*,*` is used as `required_state`.
 - Add `num_live` to each room response to indicate how many timeline entries are live.

Bug fixes:
 - Use a stricter comparison function on ranges: fixes an issue whereby unit tests fail on go1.19 due to a change in the sorting algorithm.
 - Send back an `errcode` on HTTP errors (e.g. expired sessions).
 - Remove `unsigned.txn_id` on insertion into the DB. Otherwise other users would see other users' txn IDs :(
 - Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
 - Send HTTP 400 for invalid range requests.
 - Don't publish no-op unread counts, which just add extra noise.
 - Fix leaking DB connections which could eventually consume all available connections.
 - Ensure we always unblock WaitUntilInitialSync even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point, e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever.

Improvements:
 - Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler, now it takes 4s.
 - Massively improve initial v3 sync times, by refactoring `JoinedRoomsTracker`, from ~47s to <1s.
 - Add `SlidingSyncUntil...` in tests to reduce races.
 - Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
 - Add trace task for initial syncs.
 - Include the proxy version in UA strings.
 - HTTP errors now wait 1s before returning to stop clients tight-looping on error.
 - Pending event buffer is now 2000.
 - Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
 - Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderate sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
2022-12-14 18:53:55 +00:00
Kegan Dougal
2b4f3a8bc2 Log more information in responses 2022-09-08 09:42:16 +01:00
Kegan Dougal
bfbccb045a Make the rest of the proxy aware of the upgraded room ID rather than just is_tombstoned 2022-09-07 17:44:04 +01:00
Kegan Dougal
7e7a8a98ce feat/bugfix: Add invited|joined_count to room response JSON
This is so clients can accurately calculate the push rule:
```
{"kind":"room_member_count","is":"2"}
```
Also fixed a bug in the global room metadata for the joined/invited
counts where it could be wrong because of Synapse sending duplicate
join events as we were tracking +-1 deltas. We now calculate these
counts based on the set of user IDs in a specific membership state.
2022-08-30 17:27:58 +01:00
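The difference between tracking +-1 deltas and counting a set of user IDs can be sketched like this. The `roomMembers` type and its methods are hypothetical names used only for illustration; the proxy's real metadata structures are not shown in this log.

```go
package main

import "fmt"

// roomMembers keeps one set of user IDs per membership state (hypothetical type).
type roomMembers struct {
	joined  map[string]struct{}
	invited map[string]struct{}
}

func newRoomMembers() *roomMembers {
	return &roomMembers{
		joined:  make(map[string]struct{}),
		invited: make(map[string]struct{}),
	}
}

// apply records a membership event. Applying the same event twice is a no-op,
// so duplicate join events cannot inflate the counts.
func (r *roomMembers) apply(userID, membership string) {
	delete(r.joined, userID)
	delete(r.invited, userID)
	switch membership {
	case "join":
		r.joined[userID] = struct{}{}
	case "invite":
		r.invited[userID] = struct{}{}
	}
}

// JoinedCount and InvitedCount are derived from the sets, never incremented.
func (r *roomMembers) JoinedCount() int  { return len(r.joined) }
func (r *roomMembers) InvitedCount() int { return len(r.invited) }

func main() {
	r := newRoomMembers()
	r.apply("@alice:example.org", "join")
	r.apply("@alice:example.org", "join") // duplicate join event
	r.apply("@bob:example.org", "invite")
	fmt.Println(r.JoinedCount(), r.InvitedCount())
}
```

A delta-based counter would have reported two joined users after the duplicate event; the set-based count stays at one.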
Kegan Dougal
a37aee4c2b Improve logging; remove useless fields 2022-08-16 14:23:05 +01:00
Kegan Dougal
59cddd08c7 bugfix: update the 'name' field on rooms when relevant actions occur
Relevant actions include:
 - People joining/leaving a room
 - An m.room.name or m.room.canonical_alias event is sent
 - etc.

Prior to this, we just set the room name field for initial=true
rooms only. This meant that if a room name was updated whilst it was
in the visible range (or currently subscribed to), we wouldn't set
this field resulting in stale names for clients. This was particularly
prominent when you created a room, as the initial member event would
cause the room to appear in the list as "Empty room" which then would
never be updated even if there was a subsequent `m.room.name` event
sent.

Fixed with regression tests.
2022-08-11 15:07:36 +01:00
Kegan Dougal
5ca156afe9 spaces: synchronise space updates between global/user caches
Add request filter for spaces(!)
2022-07-29 15:19:20 +01:00
Kegan Dougal
ebe9767f13 BREAKING[sql]: add type column to rooms table 2022-07-27 11:25:40 +01:00
Kegan Dougal
ed9e9ed48c Persist v2 access tokens in the database, encrypted
- Add `SYNCV3_SECRET` env var which is SHA256'd and used as an AES
  key to encrypt/decrypt tokens.
- Add column `v2_token_encrypted` to `syncv3_sync2_devices`
- Update unit tests to check encryption/decryption work.

This provides an extra layer of security in case the database is
compromised and real user access tokens are leaked. This forces
an attacker to obtain both the database table _and_ the secret
env var (which will typically be stored in secure storage, e.g.
k8s secrets). Unfortunately, we need to have the access_token
in plaintext so we cannot rely on password-style storage algorithms
like bcrypt/scrypt, which would be safer.
2022-07-13 17:03:40 +01:00
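A rough sketch of the scheme described above: the secret is hashed with SHA-256 and the 32-byte digest is used as an AES-256 key. The commit message does not state the cipher mode or the storage encoding, so AES-GCM and hex encoding are assumptions here, and the function names are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// newGCM derives an AES-256 key from the secret via SHA-256 and wraps it in GCM.
// GCM is an assumed mode; the source only says "AES".
func newGCM(secret string) (cipher.AEAD, error) {
	key := sha256.Sum256([]byte(secret))
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptToken returns hex(nonce || ciphertext) for the given access token.
func encryptToken(secret, token string) (string, error) {
	gcm, err := newGCM(secret)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Seal appends the ciphertext to the nonce so both are stored together.
	return hex.EncodeToString(gcm.Seal(nonce, nonce, []byte(token), nil)), nil
}

// decryptToken reverses encryptToken and fails if the secret is wrong.
func decryptToken(secret, encoded string) (string, error) {
	raw, err := hex.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	gcm, err := newGCM(secret)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	enc, err := encryptToken("my-secret", "example_access_token")
	if err != nil {
		panic(err)
	}
	dec, err := decryptToken("my-secret", enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(dec == "example_access_token")
}
```

Because the token must be recoverable, this is reversible encryption rather than a one-way hash, which matches the trade-off the commit message describes.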
Kegan Dougal
59f5956e0e Use new RequiredStateMap when pulling out room state 2022-05-26 10:30:53 +01:00
Kegan Dougal
bbfaf10d10 Add room filter for is_tombstoned 2022-03-22 14:56:57 +00:00
Kegan Dougal
0dff964705 Batch together updates
Previously when live streaming you could only get 1 update per request.
We now batch them up until there are no more incoming events.
2021-12-15 09:51:25 +00:00
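One common way to batch updates "until there are no more incoming events" is a non-blocking drain of a channel. The sketch below assumes updates arrive on a channel, which this log does not confirm; `Update` and `drainUpdates` are hypothetical names.

```go
package main

import "fmt"

// Update is a placeholder for whatever a live update carries.
type Update struct {
	RoomID string
}

// drainUpdates starts with the update that woke the request and then keeps
// taking updates from ch until none are immediately available.
func drainUpdates(first Update, ch <-chan Update) []Update {
	batch := []Update{first}
	for {
		select {
		case u := <-ch:
			batch = append(batch, u)
		default:
			// Nothing else is pending: return what we have in one response.
			return batch
		}
	}
}

func main() {
	ch := make(chan Update, 8)
	ch <- Update{RoomID: "!b:example.org"}
	ch <- Update{RoomID: "!c:example.org"}
	batch := drainUpdates(Update{RoomID: "!a:example.org"}, ch)
	fmt.Println(len(batch))
}
```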
Kegan Dougal
33cf1542aa Add Assert() function which works like C assert()
Particularly as the server expands into multiple lists and
filters, having a way to quickly detect off-by-one index
errors is important, so add an assert() function which
will panic() if SYNCV3_DEBUG=1 else log an angry message.
2021-11-08 13:04:03 +00:00
Kegan Dougal
47a658b289 Track encrypted status per-room in RoomMetadata 2021-11-04 16:23:44 +00:00
Kegan Dougal
31a51d0ed5 room names: fix a bug which produced incorrect names due to display name changes
Add regression tests for this.
2021-11-02 17:55:31 +00:00
Kegan Dougal
b1cc39644f room names: use the user ID as the name if there is no displayname 2021-11-02 16:16:42 +00:00
Kegan Dougal
9a515c5b84 sorting: implement by_name
Mostly works, a few edge cases remain. Tests outstanding.
2021-10-29 15:00:20 +01:00
Kegan Dougal
26ed9b9a40 Merge SortableRoom and HeroInfo into RoomMetadata
RoomMetadata stores the current invite/join count, heroes for the
room, most recent timestamp, name event content, canonical alias, etc.

This information is consistent across all users so can be globally
cached for future use. Make ConnState call CalculateRoomName with
RoomMetadata to run the name algorithm.

This is *almost* complete but as there are no Heroes yet in the
metadata, things don't quite render correctly yet.
2021-10-27 18:16:43 +01:00
Kegan Dougal
594723e0c6 Add HeroInfo and stub storage methods 2021-10-26 18:22:27 +01:00
Kegan Dougal
5ee1e422a1 Implement the room name calculation algorithm 2021-10-08 17:24:06 +01:00
Kegan Dougal
2b2e4493e2 Add synclive handler
This does nothing but pass v2 responses to the storage layer for v2 at present
2021-09-20 18:20:07 +01:00
Kegan Dougal
c893efae14 Factor out sync2 since token storage 2021-09-20 18:09:28 +01:00