sliding-sync

mirror of https://github.com/matrix-org/sliding-sync.git synced 2025-03-10 13:37:11 +00:00

Author	SHA1	Message	Date
Kegan Dougal	05a82a43dc	Same race pattern as timeSince for timeSleep	2024-03-11 12:06:13 +00:00
David Robertson	c239cacc83	Initialise: handle gappy polls and ditch prependStateEvents	2023-11-03 15:42:25 +00:00
Kegan Dougal	32c2f6b93d	Actually use the provided value	2023-10-11 13:21:52 +01:00
Kegan Dougal	97d53448d7	Fix poller race condition	2023-10-11 12:58:05 +01:00
Kegan Dougal	0856a8d53d	bugfix: give up polling if the /sync response keeps erroring for >50min	2023-10-03 13:02:17 +01:00
David Robertson	a28e419d5d	Update mockClient to match new interface	2023-09-26 13:35:24 +01:00
David Robertson	e75a462d4c	Merge pull request #300 from matrix-org/dmr/invalidate-timelines	2023-09-20 14:29:55 +01:00
David Robertson	d3ba1f1c30	Move TimelineResponse back to sync2	2023-09-19 12:41:25 +01:00
David Robertson	957bdee9d2	Merge branch 'main' into dmr/invalidate-timelines	2023-09-19 12:40:13 +01:00
Kegan Dougal	e4cedaabcd	Merge branch 'main' into kegan/poll-retry-loop-bad-create-event	2023-09-14 09:29:44 +01:00
David Robertson	df01e50438	Pass TimelineResponse struct around	2023-09-13 19:17:53 +01:00
Quentin Gliech	af5e8579b2	Better propagate request context This properly propagates the go Context on down to all HTTP calls, which means that outgoing requests have the OTLP trace context. This also adds the Jaeger propagator to the list of OTEL propagators, so that Synapse properly gets the incoming trace context. It also upgrades all the OTEL libraries.	2023-09-13 19:41:52 +02:00
Kegan Dougal	7c80b5424a	Prioritise retriable errors over unretriable errors Bump to Go 1.20 for errors.Join and added introspection to errors.As to inspect []error.	2023-09-12 14:57:40 +01:00
David Robertson	d34a053927	Brief unit test	2023-09-06 15:49:19 +01:00
David Robertson	fca1318095	Let PollerMap.EnsurePolling return an error	2023-09-06 11:28:20 +01:00
Kegan Dougal	9c5ebb2f2b	Guard for when the test has finished	2023-08-16 15:08:50 +01:00
Kegan Dougal	980d6423a5	Fix concurrent map writes	2023-08-16 14:00:40 +01:00
David Robertson	ff7120245a	Merge pull request #242 from matrix-org/dmr/purge-inactive-pollers	2023-08-16 13:43:46 +01:00
Kegan Dougal	066327d407	Add internal.DataError to skip over bad responses - Move processing of to-device msgs to the last thing, so we don't double process. - Use internal.DataError when we fail to load a snapshot correctly i.e missing events in the snapshot.	2023-08-16 10:52:35 +01:00
Kegan Dougal	9c7c7b7be2	Unbreak UTs	2023-08-15 19:11:21 +01:00
Kegan Dougal	d63864f494	Modify V2DataReceiver to allow error returns On receipt of errors, do not advance the since token. Only added to functions where losing data is bad (events, to-device msgs, etc). With unit tests, which actually caught some interesting failure modes.	2023-08-15 18:51:11 +01:00
David Robertson	d659824edf	Expire pollers method	2023-08-09 11:46:12 +01:00
Till Faelligen	5846873d43	Merge branch 'main' of github.com:matrix-org/sliding-sync into s7evink/typing	2023-08-02 14:02:44 +02:00
Kegan Dougal	6623ddb9e3	Do not make snapshots for lone leave events Specifically this is targeting invite rejections, where the leave event is inside the leave block of the sync v2 response. Previously, we would make a snapshot with this leave event. If the proxy wasn't in this room, it would mean the room state would just be the leave event, which is wrong. If the proxy was in the room, then state would correctly be rolled forward.	2023-07-31 17:53:15 +01:00
Till Faelligen	3a2001f07d	Use PollerID instead of device ID	2023-07-27 12:33:10 +02:00
Till Faelligen	8dc8d4897f	Let only one device handle typing notifications	2023-07-24 08:40:23 +02:00
Till Faelligen	22f640a352	Check that calls to /sync use the expected since token	2023-07-19 14:56:44 +02:00
Till Faelligen	46d56b8433	Add test to check that the since token is only stored in the database periodically	2023-07-19 12:17:47 +02:00
Till Faelligen	f6f1106fc4	Update test to include ToDevice messages	2023-07-18 14:37:33 +02:00
David Robertson	e5eb4f12ba	Plumb a ctx through to sync2 Thank God for Goland's refactoring tools. This will (untested) associate sentry events from the sync2 part of the code with User IDs and Device IDs, without having to constantly invoke sentry.WithScope(). (Not all of the handler methods currently have that information.) It also leaves the door open for us to include more data on poller sentry reports (e.g. access token hash, time of last token activity on the sync3 side, ...)	2023-05-25 22:22:15 +01:00
David Robertson	b428ede1ca	Update txns table	2023-05-02 18:16:14 +01:00
David Robertson	c1b1de5456	Delete tokens on expiry, to force /whoami lookup	2023-04-28 18:50:43 +01:00
David Robertson	181cfba19e	Introduce PollerID	2023-04-28 17:05:46 +01:00
David Robertson	5621423295	Fix tests	2023-04-18 15:16:42 +01:00
David Robertson	846197e996	Have WhoAmI extract the device_id Useful for #51, small enough to include in isolation	2023-04-11 22:14:15 +01:00
Kegan Dougal	a6c3f8f3fc	When a device is deleted, remove all device data with it (to-device events, device lists)	2023-03-01 16:56:04 +00:00
Kegan Dougal	6bdef5feba	bugfix: expire connections when the access token gets invalidated With regression test. The behaviour is: - Delete the connection, such that incoming requests will end up with M_UNKNOWN_POS - The next request will then return HTTP 401. This has knock-on effects: - We no longer send HTTP 502 if /whoami returns 401, instead we return 401. - When the token is expired (pollers get 401, the device is deleted from the DB).	2023-03-01 16:40:15 +00:00
Kegan Dougal	48f28f9f6c	perf: filter out all rooms when doing an initial sync on 2nd+ pollers Fixes #17 in theory, as now the initial sync request will have no rooms and hence be faster to return. In theory. Maybe. Let's see.	2023-01-05 18:25:25 +00:00
Kegan Dougal	be8543a21a	add extensions for typing and receipts; bugfixes and additional perf improvements Features: - Add `typing` extension. - Add `receipts` extension. - Add comprehensive prometheus `/metrics` activated via `SYNCV3_PROM`. - Add `SYNCV3_PPROF` support. - Add `by_notification_level` sort order. - Add `include_old_rooms` support. - Add support for `$ME` and `$LAZY`. - Add correct filtering when `,` is used as `required_state`. - Add `num_live` to each room response to indicate how many timeline entries are live. Bug fixes: - Use a stricter comparison function on ranges: fixes an issue whereby UTs fail on go1.19 due to change in sorting algorithm. - Send back an `errcode` on HTTP errors (e.g expired sessions). - Remove `unsigned.txn_id` on insertion into the DB. Otherwise other users would see other users txn IDs :( - Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic. - Send HTTP 400 for invalid range requests. - Don't publish no-op unread counts which just adds extra noise. - Fix leaking DB connections which could eventually consume all available connections. - Ensure we always unblock WaitUntilInitialSync even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point e.g on startup we have N workers which bound the number of concurrent pollers made at any one time, we need to not just hog a worker forever. Improvements: - Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler, now it takes 4s. - Massively improve initial v3 sync times, by refactoring `JoinedRoomsTracker`, from ~47s to <1s. - Add `SlidingSyncUntil...` in tests to reduce races. - Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s. - Add trace task for initial syncs. - Include the proxy version in UA strings. - HTTP errors now wait 1s before returning to stop clients tight-looping on error. - Pending event buffer is now 2000. - Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8. - Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the amount of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderate sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.	2022-12-14 18:53:55 +00:00
Kegan Dougal	d77e21138d	refactor: remove spurious code; rename OnRetireInvite to OnLeftRoom Add HasLeft to the user room metadata to control whether or not the list algo will nuke the room or not from the list.	2022-08-31 14:48:14 +01:00
Kegan Dougal	5dc1c38764	Add prev_batch column to events table This will be used to return prev batch tokens to the client on a best-effort basis.	2022-03-31 14:29:26 +01:00
Kegan Dougal	873edd7315	bugfix: rework how invites are handled Fixes https://github.com/matrix-org/sliding-sync/issues/23 - Added InvitesTable - Allow invites to be sorted/searched the same as any other room by implementing RoomMetadata for the invite (though this is best effort as we don't have heroes)	2022-03-29 09:44:18 +01:00
Kegan Dougal	2920191a44	feature: add txnids to events Clients rely on transaction IDs coming down their /sync streams so they can pair up an incoming event with an event they just sent but have not yet got the event ID for. The proxy has not historically handled this because of the shared work model of operation, where we store exactly 1 copy of the event in the database and no more. This means if Alice and Bob are running in the same proxy, then Alice sends a message, Bob's /sync stream may get the event first and that will NOT contain the `transaction_id`. This then gets written into the database. Later when Alice /syncs, she will not get the `transaction_id` for her event which she sent. This commit fixes this by having a TTL cache which maps (user, event) -> txn_id. Transaction IDs are inherently ephemeral, so keeping the last 5 minutes worth of txn IDs in-memory is an easy solution which will be good enough for the proxy. Actual server implementations of sliding sync will be able to trivially deal with this behaviour natively.	2022-03-28 15:19:42 +01:00
Kegan Dougal	3e36037844	bugfix: ensure we have done an initial sync before returning from EnsurePolling - Modify the API to instead have `WaitUntilInitialSync()` which is backed by a `WaitGroup`. - Call this new function when a poller exists and hasn't been terminated. Previously, we would assume that if a poller exists then it has done an initial sync, which may not always be true. This could lead to position mismatches as a connection would be re-created after EnsurePolling returned.	2022-03-18 12:31:31 +00:00
Kegan Dougal	24be8252f7	Change the retry schedule for the v2 poller to always be 3s Comments explain why.	2021-12-15 09:56:58 +00:00
Kegan Dougal	0e021eb560	Pass to-device messages through to the client - Treat to-device messages as opaque JSON blobs - Add basic integration test to ensure the messages make it from v2 to v3.	2021-12-14 11:51:47 +00:00
Kegan Dougal	a2d6774024	Support `filters.is_dm` - Add `AccountDataTable` with tests. - Read global and per-room account data from sync v2 and add new callbacks to the poller. - Update the `SyncV3Handler` to persist account data from sync v2 then notify the user cache. - Update the `UserCache` to update `UserRoomData.IsDM` status on `m.direct` events. - Read `m.direct` event from the DB when `UserCache` is created to track DM status per-room.	2021-11-09 15:08:08 +00:00
Kegan Dougal	6c12077f62	Ensure the first sync is snappy if there is no traffic	2021-10-29 13:15:39 +01:00
Kegan Dougal	9f3364d9ed	PollerMap: ensure callbacks are always called from a single goroutine Document a nasty race condition which can happen if >1 user is joined to the same room. Fixed to ensure that `GlobalCache` will always stay in-sync with the database without having to hit the database.	2021-10-28 16:15:17 +01:00
Kegan Dougal	fb9394d73b	Add UnreadTable to track per-user per-room unread counters With tests. Add function to V2DataReceiver interface.	2021-10-08 12:31:56 +01:00

59 Commits