Specifically, this targets invite rejections, where the leave
event is inside the leave block of the sync v2 response.
Previously, we would make a snapshot from this leave event. If the
proxy wasn't in the room, the room state would then consist of just
the leave event, which is wrong. If the proxy was in the room, the
state would correctly be rolled forward.
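A minimal sketch of the fixed behaviour follows; the store and method names are illustrative rather than the proxy's actual API. The key point is that the leave block only rolls a room forward when a snapshot for it already exists:

```go
package main

import "fmt"

// stateStore is a hypothetical stand-in for the proxy's snapshot storage.
type stateStore struct {
	// room ID -> state events in the current snapshot (heavily simplified).
	snapshots map[string][]string
}

// onLeaveBlock sketches the corrected handling of a leave event found in the
// `leave` block of a sync v2 response.
func (s *stateStore) onLeaveBlock(roomID, leaveEvent string) {
	snapshot, known := s.snapshots[roomID]
	if !known {
		// Rejected invite: the proxy holds no state for this room, so a
		// snapshot made from the leave event alone would be wrong. Skip it.
		return
	}
	// The proxy was in the room: roll the existing snapshot forward as usual.
	s.snapshots[roomID] = append(snapshot, leaveEvent)
}

func main() {
	s := &stateStore{snapshots: map[string][]string{
		"!joined:example.org": {"m.room.create", "m.room.member (join)"},
	}}
	s.onLeaveBlock("!joined:example.org", "m.room.member (leave)")
	s.onLeaveBlock("!rejected-invite:example.org", "m.room.member (leave)")
	fmt.Println(s.snapshots) // only the already-known room rolls forward
}
```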
The previous query would:
- Map room IDs to snapshot NIDs
- UNNEST(events) on all those state snapshots
- Check whether the type/state_key match the filter
This was very slow under the following circumstances:
- The rooms have lots of members (e.g. Matrix HQ)
- The `required_state` has no filter on `m.room.member`, which is what Element X does.
To improve this, we now have _two_ columns per state snapshot:
- membership_events : only the m.room.member events
- events : everything else
Now if a query comes in which doesn't need m.room.member events, we only need
to look in the everything-else bucket, which is significantly smaller.
This reduces these queries from ~500ms to about 50ms.
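A rough sketch of how the lookup can now skip the membership bucket; the table and column names here are illustrative, not the exact schema:

```go
// snapshotQuery sketches the split-bucket lookup. If the required_state
// filter never asks for m.room.member, only the much smaller `events`
// column needs unnesting; otherwise both buckets are concatenated.
func snapshotQuery(wantsMemberEvents bool) string {
	if wantsMemberEvents {
		return `
		SELECT UNNEST(events || membership_events) AS event_nid
		FROM state_snapshots WHERE snapshot_id = ANY($1)`
	}
	// Fast path for Element X style requests with no m.room.member filter.
	return `
	SELECT UNNEST(events) AS event_nid
	FROM state_snapshots WHERE snapshot_id = ANY($1)`
}
```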
Features:
- Add `typing` extension.
- Add `receipts` extension.
- Add comprehensive Prometheus `/metrics` activated via `SYNCV3_PROM`.
- Add `SYNCV3_PPROF` support.
- Add `by_notification_level` sort order.
- Add `include_old_rooms` support.
- Add support for `$ME` and `$LAZY`.
- Add correct filtering when `*,*` is used as `required_state`.
- Add `num_live` to each room response to indicate how many timeline entries are live.
Bug fixes:
- Use a stricter comparison function on ranges: fixes an issue whereby unit tests fail on go1.19 due to a change in the sorting algorithm.
- Send back an `errcode` on HTTP errors (e.g. expired sessions).
- Remove `unsigned.txn_id` on insertion into the DB. Otherwise one user's txn IDs would be visible to other users :(
- Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
- Send HTTP 400 for invalid range requests.
- Don't publish no-op unread counts, which just add extra noise.
- Fix leaking DB connections which could eventually consume all available connections.
- Ensure we always unblock WaitUntilInitialSync even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point: for example, on startup we have N workers which bound the number of concurrent pollers made at any one time, and we must not hog a worker forever (see the sketch after this list).
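A minimal sketch of why this matters, with illustrative names and pool size: the poller startup path holds one of the N worker slots while it waits, so if WaitUntilInitialSync never returned on a bad token, that slot would be lost forever.

```go
// poller is a stand-in for the proxy's per-device poller.
type poller struct{ initialSyncDone chan struct{} }

// WaitUntilInitialSync must return at _some_ point, even for invalid access
// tokens, otherwise the worker slot acquired below is never released.
func (p *poller) WaitUntilInitialSync() { <-p.initialSyncDone }

// workers bounds how many pollers run their initial sync at any one time.
var workers = make(chan struct{}, 16)

// startPoller holds a worker slot for the duration of the initial sync.
func startPoller(p *poller) {
	workers <- struct{}{}        // acquire a slot
	defer func() { <-workers }() // always release it, even on error paths
	p.WaitUntilInitialSync()
}
```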
Improvements:
- Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data used to take ~28s to create the handler; now it takes ~4s.
- Massively improve initial v3 sync times by refactoring `JoinedRoomsTracker`: from ~47s to <1s.
- Add `SlidingSyncUntil...` in tests to reduce races.
- Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
- Add trace task for initial syncs.
- Include the proxy version in UA strings.
- HTTP errors now wait 1s before returning to stop clients tight-looping on error.
- Increase the pending event buffer size to 2000 events.
- Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
- Remove cancelled `m.room_key_request` events from the to-device inbox. This cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes and ~50% for moderately sized (200 events) inboxes, and adds book-keeping to remember the unacked to-device position for each client (see the sketch after this list).
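A sketch of the key-request clean-up; the struct and the matching key are illustrative (the real code may also match on the requesting device ID). A `request_cancellation` removes the earlier `request` with the same sender and request ID, and the cancellation itself is dropped too:

```go
// toDeviceEvent is a simplified view of a to-device event.
type toDeviceEvent struct {
	Type      string
	Sender    string
	Action    string // "request" or "request_cancellation"
	RequestID string
}

// removeCancelledKeyRequests drops m.room_key_request pairs which cancel
// each other out before the inbox is sent to the client.
func removeCancelledKeyRequests(inbox []toDeviceEvent) []toDeviceEvent {
	cancelled := make(map[string]bool) // keyed on sender + request ID
	for _, ev := range inbox {
		if ev.Type == "m.room_key_request" && ev.Action == "request_cancellation" {
			cancelled[ev.Sender+"/"+ev.RequestID] = true
		}
	}
	kept := make([]toDeviceEvent, 0, len(inbox))
	for _, ev := range inbox {
		if ev.Type == "m.room_key_request" && cancelled[ev.Sender+"/"+ev.RequestID] {
			continue // drop both the original request and its cancellation
		}
		kept = append(kept, ev)
	}
	return kept
}
```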
A fundamental assumption in the proxy has been that the order of events
in `timeline` in v2 will be the same all the time. There's some evidence
to suggest this isn't true in the wild. This commit refactors the proxy
to not assume this. It does this by:
- Not relying on the number of newly inserted rows and slicing the events
to figure out _which_ events are new. Now the INSERT has `RETURNING event_id, event_nid`
and we return a map from event ID to event NID to explicitly say which
events are new (see the sketch after this list).
- Adding more paranoia when calculating new state snapshots: if we see the
same (type, state key) tuple more than once in a snapshot, we error out.
- Adding regression tests which try to insert events out of order to trip the
proxy up.
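A rough sketch of the RETURNING approach, assuming a Postgres-style events table; the table, column and type names are illustrative, and the real code presumably batches rather than inserting row by row. The insert itself reports which rows were actually written, so callers no longer infer newness from row counts and slice offsets:

```go
package state

import (
	"database/sql"

	"github.com/jmoiron/sqlx"
)

// Event is a simplified view of an event to be inserted.
type Event struct {
	ID     string
	RoomID string
	JSON   []byte
}

// insertEvents returns a map of event ID -> event NID for the events which
// were genuinely new. Already-known events hit the ON CONFLICT clause,
// return no row, and so never appear in the map.
func insertEvents(txn *sqlx.Tx, events []Event) (map[string]int64, error) {
	newEvents := make(map[string]int64, len(events))
	for _, ev := range events {
		var nid int64
		err := txn.QueryRow(`
			INSERT INTO events(event_id, room_id, event)
			VALUES ($1, $2, $3)
			ON CONFLICT (event_id) DO NOTHING
			RETURNING event_nid`,
			ev.ID, ev.RoomID, ev.JSON,
		).Scan(&nid)
		if err == sql.ErrNoRows {
			continue // the event already existed: not new
		}
		if err != nil {
			return nil, err
		}
		newEvents[ev.ID] = nid
	}
	return newEvents, nil
}
```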
Specifically, we can double-process events if we don't take into account
the event NID. This happens because we can receive live events before
we have loaded the initial connection state (list of joined rooms). We
base the initial load around an event NID, so we need to make sure to
ignore any live streamed events which are <= the initial load NID.
We could alternatively have loaded the initial connection state and
/then/ registered to receive live events, but then we could drop
events in the gap between loading and making the register call,
which is arguably worse. We could slap a mutex around it all to atomically
do this, but this means that getting pushed new events is tied to
loading (potentially a lot of) state for a single Conn, increasing
lock contention.
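A minimal sketch of the guard, with illustrative names: the connection state records the event NID it was loaded at, and any live event at or below that position is ignored because the initial load already covered it.

```go
import "encoding/json"

// connState is a stand-in for the proxy's per-connection state.
type connState struct {
	loadPosition int64 // the event NID the initial load was based on
}

// onNewEvent is called for every live-streamed event.
func (c *connState) onNewEvent(eventNID int64, ev json.RawMessage) {
	if eventNID <= c.loadPosition {
		// Already covered by the initial load: processing it again would
		// double-process the event.
		return
	}
	c.apply(eventNID, ev)
}

// apply would update the connection's lists and room data; elided here.
func (c *connState) apply(eventNID int64, ev json.RawMessage) {}
```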
It's easier to roll forward than to roll backwards. Add a 'replaces_nid' field
on the events table which records which NID in the snapshot gets replaced, if any.
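A sketch of rolling a snapshot forward using `replaces_nid`, assuming snapshots are stored as arrays of event NIDs; the function name and the zero sentinel for "nothing replaced" are illustrative:

```go
// rollForward builds the next snapshot from the previous one: the NID named
// by replaces_nid (0 meaning "nothing replaced") is dropped and the new
// event's NID is appended.
func rollForward(snapshot []int64, newEventNID, replacesNID int64) []int64 {
	next := make([]int64, 0, len(snapshot)+1)
	for _, nid := range snapshot {
		if replacesNID != 0 && nid == replacesNID {
			continue // this (type, state_key) entry is superseded
		}
		next = append(next, nid)
	}
	return append(next, newEventNID)
}
```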