- Connections are unique for the 3-tuple of (user, device, connection) IDs.
The code was only checking (user, device). This means we would delete
ALL connections for a device if ANY connection expired.
- ...except we wouldn't, because of the 2nd bug, which is in the deletion
code itself: it is missing an `i--`, so we would skip the ID check on
the element that shifts into a deleted index (see the sketch below).
Both of these issues have now been fixed.
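To illustrate the fix (type and function names here are hypothetical, not the proxy's actual code), the cleanup loop now matches on the full tuple and decrements the index after removing an element:

```go
package conncleanup

// connKey identifies a connection by the full (user, device, connection)
// tuple; hypothetical names, not the proxy's actual types.
type connKey struct {
	UserID, DeviceID, ConnID string
}

// removeExpired deletes entries matching the full tuple and re-checks the
// element that shifts into the freed index.
func removeExpired(conns []connKey, expired connKey) []connKey {
	for i := 0; i < len(conns); i++ {
		c := conns[i]
		// Match on (user, device, connection), not just (user, device).
		if c.UserID == expired.UserID && c.DeviceID == expired.DeviceID && c.ConnID == expired.ConnID {
			conns = append(conns[:i], conns[i+1:]...)
			i-- // without this, the element after the deleted index is never checked
		}
	}
	return conns
}
```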
The server will wait 1s if clients:
- repeat the same request (same `?pos=`)
- repeatedly hit `/sync` without a `?pos=`.
Both of these failure modes have been seen in the wild.
Fixes #93.
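A minimal sketch of that throttling, assuming hypothetical per-connection bookkeeping of the previous `?pos=` (the real implementation differs):

```go
package syncthrottle

import (
	"net/http"
	"time"
)

// maybeDelay sleeps for 1s when a client repeats the same ?pos= value, or
// keeps hitting /sync with no ?pos= at all. prevPos and seenRequest are
// hypothetical per-connection bookkeeping, not the proxy's actual fields.
func maybeDelay(req *http.Request, prevPos *string, seenRequest *bool) {
	pos := req.URL.Query().Get("pos")
	if *seenRequest && pos == *prevPos {
		time.Sleep(time.Second)
	}
	*prevPos = pos
	*seenRequest = true
}
```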
With regression test. The behaviour is:
- Delete the connection, such that incoming requests will end up with `M_UNKNOWN_POS`.
- The next request will then return HTTP 401.
This has knock-on effects:
- We no longer send HTTP 502 if `/whoami` returns 401; instead we return 401.
- When the token is expired (pollers get 401), the device is deleted from the DB.
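A rough sketch of the new error propagation, assuming a hypothetical helper that is handed the `/whoami` status code (the errcode and wording here are illustrative, not the proxy's actual API):

```go
package syncauth

import (
	"encoding/json"
	"net/http"
)

// writeAuthError mirrors the behaviour described above: when the homeserver's
// /whoami rejects the access token with 401, forward a 401 with a Matrix
// errcode rather than a generic 502; the expired device can then be deleted
// from the DB.
func writeAuthError(w http.ResponseWriter, whoamiStatus int) {
	if whoamiStatus == http.StatusUnauthorized {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusUnauthorized)
		json.NewEncoder(w).Encode(map[string]string{
			"errcode": "M_UNKNOWN_TOKEN",
			"error":   "access token expired or unknown",
		})
		return
	}
	w.WriteHeader(http.StatusBadGateway)
}
```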
Features:
- Add `typing` extension.
- Add `receipts` extension.
- Add comprehensive Prometheus `/metrics`, activated via `SYNCV3_PROM`.
- Add `SYNCV3_PPROF` support.
- Add `by_notification_level` sort order.
- Add `include_old_rooms` support.
- Add support for `$ME` and `$LAZY`.
- Add correct filtering when `*,*` is used as `required_state`.
- Add `num_live` to each room response to indicate how many timeline entries are live.
Bug fixes:
- Use a stricter comparison function on ranges: fixes an issue whereby unit tests fail on go1.19 due to a change in the sorting algorithm.
- Send back an `errcode` on HTTP errors (e.g. expired sessions).
- Remove `unsigned.txn_id` on insertion into the DB. Otherwise users would see other users' txn IDs :(
- Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
- Send HTTP 400 for invalid range requests.
- Don't publish no-op unread counts, which just add extra noise.
- Fix leaking DB connections which could eventually consume all available connections (see the sketch after this list).
- Ensure we always unblock WaitUntilInitialSync, even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point: e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever.
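One common cause of leaked connections in Go is never closing the `rows` iterator, which keeps the pooled connection checked out. A generic sketch of the safe pattern (not necessarily the proxy's actual bug or query; the table and column names are made up):

```go
package syncdb

import "database/sql"

// latestEventIDs illustrates the safe pattern: always close the rows iterator
// so the underlying connection is returned to the pool, and check rows.Err.
func latestEventIDs(db *sql.DB, roomID string) ([]string, error) {
	rows, err := db.Query(`SELECT event_id FROM events WHERE room_id = $1`, roomID)
	if err != nil {
		return nil, err
	}
	defer rows.Close() // without this, each call can leak a pooled connection
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```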
Improvements:
- Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: with a modest amount of data, creating the handler used to take ~28s; it now takes 4s.
- Massively improve initial v3 sync times, by refactoring `JoinedRoomsTracker`: from ~47s to <1s.
- Add `SlidingSyncUntil...` in tests to reduce races.
- Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
- Add trace task for initial syncs.
- Include the proxy version in UA strings.
- Wait 1s before returning HTTP errors, to stop clients tight-looping on error.
- Increase the pending event buffer size to 2000.
- Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
- Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the amount of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderate sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
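For illustration, the filtering roughly pairs `m.room_key_request` requests with their cancellations by `request_id` and drops both. A standalone sketch of that idea (the proxy does this bookkeeping against the database, and the types below are invented):

```go
package todevice

import "encoding/json"

// toDeviceEvent is a minimal illustrative shape for a to-device event.
type toDeviceEvent struct {
	Type    string          `json:"type"`
	Content json.RawMessage `json:"content"`
}

// dropCancelledKeyRequests removes m.room_key_request events whose request has
// since been cancelled, along with the cancellation itself.
func dropCancelledKeyRequests(events []toDeviceEvent) []toDeviceEvent {
	type keyReqContent struct {
		Action    string `json:"action"`
		RequestID string `json:"request_id"`
	}
	cancelled := make(map[string]bool)
	// First pass: record which request IDs have been cancelled.
	for _, ev := range events {
		if ev.Type != "m.room_key_request" {
			continue
		}
		var c keyReqContent
		if json.Unmarshal(ev.Content, &c) == nil && c.Action == "request_cancellation" {
			cancelled[c.RequestID] = true
		}
	}
	// Second pass: keep everything except cancelled requests and their cancellations.
	out := events[:0]
	for _, ev := range events {
		if ev.Type == "m.room_key_request" {
			var c keyReqContent
			if json.Unmarshal(ev.Content, &c) == nil && cancelled[c.RequestID] {
				continue
			}
		}
		out = append(out, ev)
	}
	return out
}
```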
`sync3` contains data structures and logic which is very isolated and
testable (think ConnMap, Room, Request, SortableRooms, etc) whereas
`sync3/handler` contains control flow which calls into `sync3` data
structures.
This has numerous benefits:
- Gnarly complicated structs like `ConnState` are now more isolated
from the codebase, forcing better API design on `sync3` structs.
- The inability to do import cycles forces structs in `sync3` to remain
simple: they cannot pull in control flow logic from `sync3/handler`
without causing a compile error.
- It's significantly easier for new developers to figure out where to start
looking for the code that executes when a new request is received.
- It reduces the number of things that `ConnState` can touch. Previously
we were gut-wrenching out of convenience, but now we're forced to move
more logic from `ConnState` into `sync3` (depending on the API design):
for example, adding `SortableRooms.RoomIDs()` (see the sketch below).
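A toy sketch of the dependency direction (the real structs are richer; this is not the actual code). `sync3` holds pure, testable data structures; `sync3/handler` imports `sync3` for control flow, and the reverse import would be a cycle and fail to compile:

```go
package sync3

// SortableRooms is illustrative only; the real struct is richer.
type SortableRooms struct {
	roomIDs []string
}

// RoomIDs is the kind of accessor the split forces us to add, instead of
// letting handler code gut-wrench into the slice directly.
func (s *SortableRooms) RoomIDs() []string {
	out := make([]string, len(s.roomIDs))
	copy(out, s.roomIDs)
	return out
}
```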
Let ConnState directly subscribe to GlobalCache rather than
the awful indirection of ConnMap -> Conn -> ConnState we had before.
That indirection existed because ConnMap is responsible for destroying old
connections (based on the TTL cache), so we could just subscribe once
and then look through the map to see who to notify. In the interests
of decoupling logic, we now just call ConnState.Destroy() when the
connection is removed from ConnMap which allows ConnState to subscribe
to GlobalCache on creation and remove its subscription on Destroy().
This makes it significantly clearer where callbacks are firing from and to,
and means ConnMap is now simply in charge of maintaining the map of
user IDs -> Conn, as well as terminating connections when they expire
via TTL.
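A simplified sketch of the new lifecycle; the method names approximate the description above rather than the exact code:

```go
package sync3

// Subscriber is whatever GlobalCache needs to notify; illustrative only.
type Subscriber interface {
	OnNewEvent(eventJSON []byte)
}

// GlobalCache sketch: subscriptions keyed by user ID.
type GlobalCache struct {
	subs map[string]Subscriber
}

func (c *GlobalCache) Subscribe(userID string, s Subscriber) { c.subs[userID] = s }
func (c *GlobalCache) Unsubscribe(userID string)             { delete(c.subs, userID) }

// ConnState subscribes on creation and removes its subscription on Destroy,
// so ConnMap no longer routes cache callbacks through Conn.
type ConnState struct {
	userID string
	cache  *GlobalCache
}

func NewConnState(userID string, cache *GlobalCache) *ConnState {
	cs := &ConnState{userID: userID, cache: cache}
	cache.Subscribe(userID, cs)
	return cs
}

// OnNewEvent would update sorted room lists etc; elided here.
func (cs *ConnState) OnNewEvent(eventJSON []byte) {}

// Destroy is called by ConnMap when the connection expires from its TTL cache.
func (cs *ConnState) Destroy() { cs.cache.Unsubscribe(cs.userID) }
```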
Add a `LoadJoinedRoomsOverride` to allow tests to override
and bypass DB checks. We need the joined rooms in the cache in order to
synchronise loading connection state with live updates, so that we
process events exactly once.
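For illustration, this is the usual swappable-function-field pattern; the signature below is invented, not the real one:

```go
package caches

// GlobalCache sketch: LoadJoinedRoomsOverride lets tests bypass the database
// entirely when populating joined rooms.
type GlobalCache struct {
	LoadJoinedRoomsOverride func(userID string) (joinedRoomIDs []string, err error)
}

func (c *GlobalCache) LoadJoinedRooms(userID string) ([]string, error) {
	if c.LoadJoinedRoomsOverride != nil {
		return c.LoadJoinedRoomsOverride(userID)
	}
	// ... the real implementation loads joined rooms from the database ...
	return nil, nil
}
```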
Keep it pure (not dependent on `state.Storage`) to make testing
easier. The responsibility for fanning out user cache updates
lies with the Handler, as it generally deals with glue code.
Adding this filter fundamentally changes the query, optimising it so that it
does not pull out the entire room state. This will be used when calculating
the `required_state` response.
Also add tests for `RoomStateAfterEventPosition` and `RoomStateBeforeEventPosition`.
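A hedged sketch of what such a filter can look like when pushed down into SQL; the column names and filter type below are hypothetical, not the proxy's actual schema:

```go
package state

import (
	"fmt"
	"strings"
)

// requiredStateFilter holds (event type, state key) pairs; "*" is a wildcard,
// mirroring the shape of the required_state API.
type requiredStateFilter [][2]string

// whereClause builds a SQL fragment restricting a state query to the requested
// types/state keys instead of pulling out the entire room state.
func (f requiredStateFilter) whereClause() (string, []interface{}) {
	var clauses []string
	var args []interface{}
	for _, pair := range f {
		evType, stateKey := pair[0], pair[1]
		switch {
		case evType == "*" && stateKey == "*":
			return "", nil // wildcard on both: the caller wants all state
		case stateKey == "*":
			clauses = append(clauses, fmt.Sprintf("(event_type = $%d)", len(args)+1))
			args = append(args, evType)
		default:
			clauses = append(clauses, fmt.Sprintf("(event_type = $%d AND state_key = $%d)", len(args)+1, len(args)+2))
			args = append(args, evType, stateKey)
		}
	}
	if len(clauses) == 0 {
		return "", nil
	}
	return "WHERE " + strings.Join(clauses, " OR "), args
}
```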