If we returned multiple distinct ranges, we always assumed that
the history visibility was "joined", so we would never return
events in the invite/shared state. This would be fine if the client
had a way to fetch the events sent before they joined, but it did not,
as the prev_batch token would not be set correctly. We now only return
a single range of events and the prev_batch for _that_ range only, and
defer to the upstream HS for history visibility calculations.
Add an end-to-end test to assert this new behaviour works.
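As a hedged illustration of the new behaviour (the type and function names below are invented, not the proxy's real code), the selection now amounts to keeping one range plus the prev_batch that belongs to it:

```go
package sketch

import "encoding/json"

// timelineRange is a hypothetical container for one contiguous run of
// timeline events together with the prev_batch token that lets a client
// paginate backwards from the start of that run.
type timelineRange struct {
	Events    []json.RawMessage
	PrevBatch string
}

// chooseRange keeps only the most recent range and the prev_batch belonging
// to it, rather than stitching multiple ranges together (which implicitly
// assumed "joined" history visibility for the older ranges).
func chooseRange(ranges []timelineRange) ([]json.RawMessage, string) {
	if len(ranges) == 0 {
		return nil, ""
	}
	latest := ranges[len(ranges)-1]
	return latest.Events, latest.PrevBatch
}
```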
Specifically, this targets invite rejections, where the leave
event is inside the leave block of the sync v2 response.
Previously, we would make a state snapshot from this leave event. If the
proxy wasn't in the room, the room state would consist of just
the leave event, which is wrong. If the proxy was in the room,
then state would correctly be rolled forward.
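A minimal sketch of the guard, assuming hypothetical helper names rather than the proxy's actual functions:

```go
package sketch

// processLeave sketches the fixed behaviour when a room's leave event arrives
// in the leave block of a sync v2 response. The names are illustrative only.
func processLeave(roomID string, proxyHasStateFor func(string) bool, rollForward func(string)) {
	if !proxyHasStateFor(roomID) {
		// Rejected invite: the proxy was never joined, so creating a snapshot
		// here would make the leave event the entire room state. Do nothing.
		return
	}
	// The proxy was in the room: roll the existing snapshot forward with the
	// leave event, as before.
	rollForward(roomID)
}
```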
When the proxy is run with large DBs (10m+ events), the
startup queries are very slow (around 30 minutes to load the initial snapshot).
After much EXPLAIN ANALYZEing, the cause is Postgres' query planner
not making good decisions when the tables are that large. Specifically,
the startup queries need to pull all joined members in all rooms, which
ends up being nearly 50% of the entire events table of 10m rows. When this
query is embedded in a subselect, the query planner assumes that the subselect
will return only a few rows, and decides to pull those rows via an index. In this
particular case, indexes are the wrong choice: there are so many rows that a Seq Scan
is often more appropriate. Using an index (which is a btree) means doing
log(n) operations _per row_, or `O(0.5 * n * log(n))` assuming we pull 50% of the
table of n rows. As n increases, this is increasingly the wrong call over a basic
O(n) seq scan. When n=10m, a seq scan has a cost of 10m, but using indexes has a
cost of 16.6m. Dumping the result of the subselect into a temporary table
allows the query planner to notice that using an index is the wrong thing to do,
resulting in better performance. On large DBs, this decreases the startup time
from ~30 minutes to ~5 minutes.
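A rough sketch of the temporary-table approach; the table and column names are invented for illustration and the proxy's real schema differs:

```go
package sketch

import "database/sql"

// loadJoinedMembers sketches the temp-table trick: materialise the huge
// "all joined members" result set first so the planner knows its real size,
// instead of hiding it in a subselect that it assumes returns a few rows.
func loadJoinedMembers(txn *sql.Tx) (map[string][]string, error) {
	// Materialise the subselect. With the rows in a real (temp) table, the
	// planner picks a Seq Scan here instead of millions of index lookups.
	if _, err := txn.Exec(`
		CREATE TEMP TABLE temp_joined_members ON COMMIT DROP AS
		SELECT room_id, state_key AS user_id
		FROM events
		WHERE event_type = 'm.room.member' AND membership = 'join'
	`); err != nil {
		return nil, err
	}

	rows, err := txn.Query(`SELECT room_id, user_id FROM temp_joined_members`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	joined := make(map[string][]string)
	for rows.Next() {
		var roomID, userID string
		if err := rows.Scan(&roomID, &userID); err != nil {
			return nil, err
		}
		joined[roomID] = append(joined[roomID], userID)
	}
	return joined, rows.Err()
}
```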
The previous query would:
- Map room IDs to snapshot NIDs
- UNNEST(events) on all those state snapshots
- Check whether the type/state_key matches the filter
This was very slow under the following circumstances:
- The rooms have lots of members (e.g. Matrix HQ)
- The required_state has no filter on m.room.member
This is what Element X does.
To improve this, we now have _two_ columns per state snapshot:
- membership_events : only the m.room.member events
- events : everything else
Now if a query comes in which doesn't need m.room.member events, we only need
to look in the everything-else bucket of events, which is significantly smaller.
This reduces these queries from ~500ms to about 50ms.
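A hedged sketch of how the two buckets might be queried; the table and column names follow the description above but are otherwise illustrative:

```go
package sketch

// stateQueryFor sketches how the split snapshot columns are used: queries
// that don't ask for m.room.member never touch the much larger membership
// bucket.
func stateQueryFor(needsMemberEvents bool) string {
	if needsMemberEvents {
		// Need both buckets: membership events plus everything else.
		return `SELECT UNNEST(membership_events || events) AS event_nid
		        FROM snapshots WHERE snapshot_id = $1`
	}
	// Fast path (what Element X hits): only the small "everything else" bucket.
	return `SELECT UNNEST(events) AS event_nid
	        FROM snapshots WHERE snapshot_id = $1`
}
```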
We already extracted the joined users in all rooms, but then
this function would do another query to pull out the join counts.
That query was particularly inefficient, clocking in at 4s (!) on
my test server. It has been removed entirely: we now call AllJoinedMembers
first and use len(joinedUsers) instead.
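A minimal sketch, assuming `joinedUsers` is the room-to-members map returned by the existing AllJoinedMembers call:

```go
package sketch

// joinCounts derives per-room join counts from the membership map we already
// have, instead of issuing a second (4s!) COUNT query.
func joinCounts(joinedUsers map[string][]string) map[string]int {
	counts := make(map[string]int, len(joinedUsers))
	for roomID, users := range joinedUsers {
		counts[roomID] = len(users)
	}
	return counts
}
```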
Features:
- Add `typing` extension.
- Add `receipts` extension.
- Add comprehensive Prometheus `/metrics` activated via `SYNCV3_PROM`.
- Add `SYNCV3_PPROF` support.
- Add `by_notification_level` sort order.
- Add `include_old_rooms` support.
- Add support for `$ME` and `$LAZY`.
- Add correct filtering when `*,*` is used as `required_state`.
- Add `num_live` to each room response to indicate how many timeline entries are live.
Bug fixes:
- Use a stricter comparison function on ranges: fixes an issue whereby UTs fail on go1.19 due to a change in the sorting algorithm.
- Send back an `errcode` on HTTP errors (e.g. expired sessions).
- Remove `unsigned.txn_id` on insertion into the DB. Otherwise users would see other users' txn IDs :(
- Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
- Send HTTP 400 for invalid range requests.
- Don't publish no-op unread counts, which just added extra noise.
- Fix leaking DB connections which could eventually consume all available connections.
- Ensure we always unblock WaitUntilInitialSync, even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point; e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever (see the sketch after the Improvements list below).
Improvements:
- Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler, now it takes 4s.
- Massively improve initial v3 sync times by refactoring `JoinedRoomsTracker`: from ~47s to <1s.
- Add `SlidingSyncUntil...` in tests to reduce races.
- Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
- Add trace task for initial syncs.
- Include the proxy version in UA strings.
- HTTP errors now wait 1s before returning to stop clients tight-looping on error.
- Pending event buffer is now 2000.
- Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
- Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes, and by ~50% for moderately sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
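For context on the WaitUntilInitialSync fix above, here is a hedged sketch of the bounded-worker startup pattern it protects; all names are illustrative, not the proxy's real code:

```go
package sketch

import "sync"

// startPollers sketches the startup pattern: N worker slots bound how many
// pollers run their initial sync concurrently, so a call that never returns
// would pin a slot forever.
func startPollers(tokens []string, numWorkers int, waitUntilInitialSync func(token string)) {
	sem := make(chan struct{}, numWorkers) // at most numWorkers concurrent initial syncs
	var wg sync.WaitGroup
	for _, token := range tokens {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(token string) {
			defer wg.Done()
			defer func() { <-sem }() // always release the slot...
			// ...which is only safe because WaitUntilInitialSync is now
			// guaranteed to return, even for invalid access tokens.
			waitUntilInitialSync(token)
		}(token)
	}
	wg.Wait()
}
```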
A fundamental assumption in the proxy has been that the order of events
in `timeline` in v2 will be the same all the time. There's some evidence
to suggest this isn't true in the wild. This commit refactors the proxy
to not assume this. It does this by:
- Not relying on the number of newly inserted rows and slicing the events
to figure out _which_ events are new. Now the INSERT has `RETURNING event_id, event_nid`
and we return a map from event ID to event NID to explicitly say which
events are new (see the sketch after this list).
- Adding more paranoia when calculating new state snapshots: if we see the
same (type, state key) tuple more than once in a snapshot, we error out.
- Adding regression tests which try to insert events out of order to trip the
proxy up.
- Replacing `PrevBatch string` in user room data with `PrevBatches lru.Cache`.
This allows us to keep prev_batch tokens in memory rather than doing
N sequential DB lookups, which would take ~4s for ~150 rooms on the Postgres
instance running the database. The tokens are keyed off a tuple of the
event ID being searched and the latest event in the room, to allow prev
batches to be assigned when new sync v2 responses arrive.
- Threading context through complex storage functions for profiling.
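A simplified, per-row sketch of the RETURNING approach from the first bullet; the real change does a bulk INSERT, and the table and column names here are illustrative:

```go
package sketch

import (
	"database/sql"
	"errors"
)

// insertEvents asks Postgres to say exactly which event IDs were newly
// inserted and which NIDs they received, instead of counting inserted rows
// and slicing the input slice.
func insertEvents(txn *sql.Tx, eventIDs []string, eventJSON [][]byte) (map[string]int64, error) {
	newEvents := make(map[string]int64)
	for i, id := range eventIDs {
		var nid int64
		err := txn.QueryRow(`
			INSERT INTO events(event_id, event) VALUES ($1, $2)
			ON CONFLICT (event_id) DO NOTHING
			RETURNING event_nid`, id, eventJSON[i],
		).Scan(&nid)
		if errors.Is(err, sql.ErrNoRows) {
			continue // the event already existed: not new
		}
		if err != nil {
			return nil, err
		}
		newEvents[id] = nid
	}
	return newEvents, nil
}
```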
Previously, we would only optimise on the requested event types, i.e. if you want
state events with types A and B, we pull out all current state with event
type A or B. This falls down when the client wants their own member event,
as m.room.member is the bulk of the current state. This commit optimises the
SQL queries to also take into account the state keys asked for, whilst still
supporting the wildcard '*' when it is requested.
Fixes https://github.com/matrix-org/sliding-sync/issues/23
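A hedged sketch of what the extra state-key filtering could look like; this is not the proxy's actual query builder, and all names are illustrative:

```go
package sketch

import (
	"fmt"
	"strings"
)

// requiredStateWhere builds a WHERE clause over both event type and state
// key, resolving $ME to the caller's user ID and treating '*' as "no
// constraint".
func requiredStateWhere(pairs [][2]string, userID string) (string, []interface{}) {
	var clauses []string
	var args []interface{}
	for _, p := range pairs {
		evType, stateKey := p[0], p[1]
		if stateKey == "$ME" {
			stateKey = userID
		}
		switch {
		case evType == "*" && stateKey == "*":
			return "", nil // matches all current state: no WHERE needed
		case stateKey == "*":
			args = append(args, evType)
			clauses = append(clauses, fmt.Sprintf("(event_type = $%d)", len(args)))
		case evType == "*":
			args = append(args, stateKey)
			clauses = append(clauses, fmt.Sprintf("(state_key = $%d)", len(args)))
		default:
			args = append(args, evType, stateKey)
			clauses = append(clauses, fmt.Sprintf("(event_type = $%d AND state_key = $%d)", len(args)-1, len(args)))
		}
	}
	return strings.Join(clauses, " OR "), args
}
```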
- Add InvitesTable.
- Allow invites to be sorted/searched the same as any other room by
implementing RoomMetadata for the invite (though this is best effort
as we don't have heroes).
RoomMetadata stores the current invite/join counts, heroes for the
room, most recent timestamp, name event content, canonical alias, etc.
This information is consistent across all users, so it can be globally
cached for future use. Make ConnState call CalculateRoomName with
RoomMetadata to run the name algorithm.
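A sketch of what RoomMetadata holds, based on the description above; field names and types are illustrative, not the exact struct in the codebase:

```go
package sketch

// RoomMetadata: per-room data that is identical for every user and can
// therefore be cached globally.
type RoomMetadata struct {
	RoomID               string
	JoinCount            int
	InviteCount          int
	Heroes               []string // not yet populated, hence the rendering gaps
	LastMessageTimestamp uint64
	NameEventContent     string // content of the m.room.name event
	CanonicalAlias       string
}
```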
This is *almost* complete, but as there are no Heroes yet in the
metadata, things don't quite render correctly.