81 Commits

Author SHA1 Message Date
Till Faelligen
ce15a2800c
Remove materialized view and use a recursive CTE instead to get unique
event_types
2024-05-21 16:16:17 +02:00
Till Faelligen
9eee30b152
Add a materialized view for event_types and use it in an updated query 2024-05-21 15:27:15 +02:00
Till Faelligen
425ed4efba
Fix order of params and tell pq what $1 is 2024-05-17 11:04:59 +02:00
Till Faelligen
1ae5b5cb3d
Also optimize LatestEventNIDInRooms 2024-05-17 10:41:28 +02:00
Till Faelligen
3ca77f2bc0
Optimize getting the latest events in each room
Signed-off-by: Till Faelligen <2353100+S7evinK@users.noreply.github.com>
2024-05-17 10:27:12 +02:00
David Robertson
ed7f052682
A bunch of comments from review 2023-09-20 14:01:15 +01:00
David Robertson
3e02510fa0
Undo GoLand's SQL string formatting
I'm sympathetic, but let's not stress about it now.
2023-09-20 13:36:54 +01:00
David Robertson
e06a5437a7
Ditch COMMENT ON 2023-09-20 12:45:41 +01:00
David Robertson
4aa0667afb
Review comment 2023-09-19 15:48:42 +01:00
David Robertson
fa227b79d3
Stop loading timelines if you hit missing_previous 2023-09-13 19:17:53 +01:00
David Robertson
a65a69b7bc
Set missing_parents field in the DB 2023-09-13 19:17:53 +01:00
David Robertson
df01e50438
Pass TimelineResponse struct around 2023-09-13 19:17:53 +01:00
David Robertson
16796db033
Add syncv3_events.missing_previous 2023-09-13 19:17:53 +01:00
David Robertson
6f3f556842
Revert "Add syncv3_events.missing_previous"
This reverts commit b8841008142c94498855a35a5ab477904f442b73.
2023-09-13 12:18:47 +01:00
David Robertson
b884100814
Add syncv3_events.missing_previous 2023-09-13 12:18:31 +01:00
Kegan Dougal
9209c61691 bugfix: set unsigned.redacted_because field on redaction
Element X relies on this field being set.
2023-09-07 10:16:00 +01:00
Kegan Dougal
001e7600d8 Review comments 2023-09-06 09:52:52 +01:00
Kegan Dougal
b2c26b7e93 Redact events in the DB on m.room.redaction
Fixes #279
2023-08-31 17:06:44 +01:00
Kegan Dougal
066327d407 Add internal.DataError to skip over bad responses
- Move processing of to-device msgs to the last thing, so we don't double process.
- Use internal.DataError when we fail to load a snapshot correctly, i.e. when events are missing from the snapshot.
2023-08-16 10:52:35 +01:00
Kegan Dougal
86d156c511 Actually use the modified Event in result 2023-07-26 17:12:42 +01:00
Kegan Dougal
0fea507b65 Add malformed tests for room state 2023-07-25 14:41:23 +01:00
Kegan Dougal
d745c90d95 Ignore malformed events
But handle unusual events. With regression test.

Fixes https://github.com/matrix-org/sliding-sync/issues/223
2023-07-25 14:25:30 +01:00
Kegan Dougal
ae29d14c6f Remove unused code 2023-07-19 15:56:43 +01:00
Kegan Dougal
baa5d05d31 Use the rooms table initially when querying latest nids 2023-07-13 18:19:00 +01:00
David Robertson
1717408dc3
Use fewer DB conns when events into the UserCache 2023-06-19 17:58:56 +01:00
Kegan Dougal
efaf2648eb Merge branch 'main' into kegan/accurate-load-positions 2023-06-13 08:44:21 +01:00
David Robertson
3965001abe
Use int64 for event NIDs 2023-06-10 14:13:16 +01:00
David Robertson
ecd4df4f1a
Comments! 2023-06-10 14:11:20 +01:00
Kegan Dougal
600c58acf3 Load loadPositions on conn startup 2023-06-09 17:28:01 +01:00
David Robertson
da118624d9
GlobalCache: load LatestEventsByType on startup 2023-06-01 20:05:42 +01:00
David Robertson
275538487b
Revert "Pass room ID to new query"
This reverts commit 6c9aa094ab10f4ddb066562841bf0f1f80773416.
2023-04-24 12:45:11 +01:00
David Robertson
6c9aa094ab
Pass room ID to new query 2023-04-24 12:04:23 +01:00
David Robertson
8324f6e654
SelectByIDs has an ordering guarantee 2023-04-22 00:36:39 +01:00
David Robertson
e6aac43c06
SelectUnknownEventIDs expects a txn 2023-04-19 13:05:16 +01:00
David Robertson
f28f7d0599
Return set of unknown events from the db 2023-04-18 14:55:58 +01:00
David Robertson
5d8560cd7c
Fix db query 2023-04-17 21:11:51 +01:00
David Robertson
4ba80d2b83
Initialise: handle state blocks from a gappy sync 2023-04-17 20:21:08 +01:00
Kegan Dougal
00e4b8238c BREAKING(db) perf: Massively improve time to exec RoomStateAfterEventPosition
The previous query would:
 - Map room IDs to snapshot NIDs
 - UNNEST(events) on all those state snapshots
 - Compare if the type/state_key match the filter

This was very slow under the following circumstances:
 - The rooms have lots of members (e.g. Matrix HQ)
 - The required_state has no filter on m.room.member

This is what Element X does.

To improve this, we now have _two_ columns per state snapshot:
 - membership_events : only the m.room.member events
 - events : everything else

Now if a query comes in which doesn't need m.room.member events, we just need
to look in the everything-else bucket of events which is significantly smaller.
This reduces these queries to about 50ms, from 500ms.
2023-01-12 17:11:09 +00:00
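The two-column split described in commit 00e4b8238c can be illustrated with a minimal in-memory sketch. This is not the proxy's actual code or SQL: the `Snapshot` type, `needsMembers`, and `candidateNIDs` are hypothetical names, and treating `*` as a wildcard event type is an assumption made for the example. It only shows why a filter that never touches `m.room.member` can skip the (potentially huge) membership bucket.

```go
package main

import "fmt"

// Snapshot is a hypothetical stand-in for one state snapshot row.
// The two fields mirror the two columns named in the commit message.
type Snapshot struct {
	MembershipEvents []int64 // NIDs of m.room.member events only
	Events           []int64 // NIDs of every other state event
}

// needsMembers reports whether any [type, state_key] filter entry could
// match an m.room.member event. Treating "*" as a wildcard type is an
// assumption of this sketch.
func needsMembers(requiredState [][2]string) bool {
	for _, rs := range requiredState {
		if rs[0] == "m.room.member" || rs[0] == "*" {
			return true
		}
	}
	return false
}

// candidateNIDs returns the event NIDs that must be inspected for a filter.
// When membership events cannot match, only the smaller bucket is returned.
func candidateNIDs(s Snapshot, requiredState [][2]string) []int64 {
	if !needsMembers(requiredState) {
		return s.Events
	}
	out := make([]int64, 0, len(s.Events)+len(s.MembershipEvents))
	out = append(out, s.Events...)
	return append(out, s.MembershipEvents...)
}

func main() {
	snap := Snapshot{
		MembershipEvents: []int64{1, 2, 3},
		Events:           []int64{10, 11},
	}
	nameOnly := [][2]string{{"m.room.name", ""}}
	withMembers := [][2]string{{"m.room.member", "$ME"}}
	fmt.Println(len(candidateNIDs(snap, nameOnly)), len(candidateNIDs(snap, withMembers)))
}
```

In a room with tens of thousands of members, the first call would touch only the handful of non-membership state events, which is the effect the commit message attributes the speed-up to.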
Kegan Dougal
aa28df161c Rename package -> github.com/matrix-org/sliding-sync 2022-12-15 11:08:50 +00:00
Kegan Dougal
be8543a21a add extensions for typing and receipts; bugfixes and additional perf improvements
Features:
 - Add `typing` extension.
 - Add `receipts` extension.
 - Add comprehensive prometheus `/metrics` activated via `SYNCV3_PROM`.
 - Add `SYNCV3_PPROF` support.
 - Add `by_notification_level` sort order.
 - Add `include_old_rooms` support.
 - Add support for `$ME` and `$LAZY`.
 - Add correct filtering when `*,*` is used as `required_state`.
 - Add `num_live` to each room response to indicate how many timeline entries are live.

Bug fixes:
 - Use a stricter comparison function on ranges: fixes an issue whereby UTs fail on go1.19 due to a change in the sorting algorithm.
 - Send back an `errcode` on HTTP errors (e.g. expired sessions).
 - Remove `unsigned.txn_id` on insertion into the DB. Otherwise users would see other users' txn IDs :(
 - Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
 - Send HTTP 400 for invalid range requests.
 - Don't publish no-op unread counts, which just add extra noise.
 - Fix leaking DB connections which could eventually consume all available connections.
 - Ensure we always unblock WaitUntilInitialSync even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point, e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever.

Improvements:
 - Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler, now it takes 4s.
 - Massively improve initial v3 sync times, by refactoring `JoinedRoomsTracker`, from ~47s to <1s.
 - Add `SlidingSyncUntil...` in tests to reduce races.
 - Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
 - Add trace task for initial syncs.
 - Include the proxy version in UA strings.
 - HTTP errors now wait 1s before returning to stop clients tight-looping on error.
 - Pending event buffer is now 2000.
 - Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
 - Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderately sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
2022-12-14 18:53:55 +00:00
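The range delta fix mentioned in commit be8543a21a concerns what happens when a client's window moves, for example `[0,20] -> [20,30]`. The following is a rough sketch of one way to compute such a delta, not the proxy's actual algorithm: `subtract` and `rangeDelta` are hypothetical helpers, and the sketch assumes ranges are inclusive on both ends and that each side is a single contiguous range.

```go
package main

import "fmt"

// Range is an inclusive [start, end] index window (assumed inclusive here).
type Range [2]int64

// subtract returns the parts of a that are not covered by b.
// The result holds zero, one, or two ranges.
func subtract(a, b Range) []Range {
	// No overlap at all: a survives untouched.
	if b[1] < a[0] || b[0] > a[1] {
		return []Range{a}
	}
	var out []Range
	if a[0] < b[0] {
		out = append(out, Range{a[0], b[0] - 1}) // piece to the left of b
	}
	if a[1] > b[1] {
		out = append(out, Range{b[1] + 1, a[1]}) // piece to the right of b
	}
	return out
}

// rangeDelta reports which indexes enter and leave the window when it
// moves from oldR to newR.
func rangeDelta(oldR, newR Range) (added, removed []Range) {
	return subtract(newR, oldR), subtract(oldR, newR)
}

func main() {
	added, removed := rangeDelta(Range{0, 20}, Range{20, 30})
	fmt.Println("added:", added, "removed:", removed)
}
```

The touching case is the interesting one: index 20 is in both windows, so it appears in neither the added nor the removed set.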
Kegan Dougal
eae4da1e72 Parse space children/parents and upsert them in the database 2022-07-28 16:41:40 +01:00
Kegan Dougal
d19d32f6d5 refactor: factor out logic to pull bits of room state
e.g. room type, tombstone status, and now space children.
2022-07-28 13:44:52 +01:00
Kegan Dougal
b9196db30b BREAKING(db): refactor how history is calculated
- Completely ignore events in the `state` block when processing
  sync v3 requests with a large `timeline_limit`. We should never
  have been including them in the first place as they are not
  chronological at all.
- Perform sync v2 requests with a timeline limit of 1 to ensure
  we can always return a `prev_batch` token to the caller. This
  means on the first startup, clicking a room will force a `/messages`
  hit until there have been `$limit` new events, in which case it
  will be able to serve these events from the local DB. Critically,
  this ensures that we never send back an empty `prev_batch`, which
  causes clients to believe that there is no history in a room.
2022-07-21 16:20:59 +01:00
Kegan Dougal
1380a71f80 bugfix: fix several issues which could cause corrupt state snapshots
A fundamental assumption in the proxy has been that the order of events
in `timeline` in v2 will be the same all the time. There's some evidence
to suggest this isn't true in the wild. This commit refactors the proxy
to not assume this. It does this by:
  - Not relying on the number of newly inserted rows and slicing the events
    to figure out _which_ events are new. Now the INSERT has `RETURNING event_id, event_nid`
    and we return a map from event ID to event NID to explicitly say which
    events are new.
  - Add more paranoia when calculating new state snapshots: if we see the
    same (type, state key) tuple more than once in a snapshot we error out.
  - Add regression tests which try to insert events out of order to trip the
    proxy up.
2022-06-08 18:20:10 +01:00
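The two safeguards described in commit 1380a71f80 can be sketched in isolation. This is a simplified illustration rather than the proxy's code: `StateEvent`, `newEvents`, and `checkSnapshot` are hypothetical names, and the `idToNID` map stands in for whatever the `INSERT ... RETURNING event_id, event_nid` query yields for newly inserted rows.

```go
package main

import "fmt"

// StateEvent is a hypothetical, trimmed-down event for this sketch.
type StateEvent struct {
	ID, Type, StateKey string
}

// newEvents keeps only the events whose IDs appear in idToNID, i.e. the
// rows the database reported as newly inserted. It never relies on the
// position of an event within the input slice.
func newEvents(events []StateEvent, idToNID map[string]int64) []StateEvent {
	var out []StateEvent
	for _, ev := range events {
		if _, ok := idToNID[ev.ID]; ok {
			out = append(out, ev)
		}
	}
	return out
}

// checkSnapshot errors out if the same (type, state key) tuple occurs
// more than once, which a valid state snapshot must never contain.
func checkSnapshot(events []StateEvent) error {
	seen := make(map[[2]string]string, len(events))
	for _, ev := range events {
		key := [2]string{ev.Type, ev.StateKey}
		if prev, ok := seen[key]; ok {
			return fmt.Errorf("duplicate (%s, %q) in snapshot: %s and %s",
				ev.Type, ev.StateKey, prev, ev.ID)
		}
		seen[key] = ev.ID
	}
	return nil
}

func main() {
	events := []StateEvent{
		{ID: "$a", Type: "m.room.name", StateKey: ""},
		{ID: "$b", Type: "m.room.member", StateKey: "@alice:example.org"},
		{ID: "$c", Type: "m.room.name", StateKey: ""},
	}
	fmt.Println(newEvents(events, map[string]int64{"$c": 7}))
	fmt.Println(checkSnapshot(events))
}
```

Keying on event ID rather than slice position is what makes the logic robust when the upstream timeline arrives in a different order than expected.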
Kegan Dougal
3a5323d90f perf: load latest NIDs from the rooms table where possible 2022-04-26 12:00:06 +01:00
Kegan Dougal
17cc4e6ec1 perf: reduce the number of SQL queries further when pulling required_state 2022-04-25 20:35:27 +01:00
Kegan Dougal
c760def8e2 perf: Add tracing for basic request flow
Updating subs, extensions, processing lists, and live updates.
2022-04-25 18:21:11 +01:00
Kegan Dougal
dd6e6da50c Inject prev_batch values into timeline UserRoomData 2022-03-31 15:10:42 +01:00
Kegan Dougal
5dc1c38764 Add prev_batch column to events table
This will be used to return prev batch tokens to the client
on a best-effort basis.
2022-03-31 14:29:26 +01:00
Kegan Dougal
15a4b5a903 bugfix: gracefully handle seeing the same event in the timeline of a /sync response
Whilst this goes against the spec and is likely a Synapse bug, we still need to
work in the real world so handle it, log it, and add a regression test for it.
2022-03-18 10:59:38 +00:00