Early versions of the proxy tended to send a list of event JSONs
and the "latest" NID for that batch. This then interacted badly
with later code which used these NIDs to determine if the event
in question should be returned to the client or not. We sometimes
filter them out in cases where the "initial" room has already
included this event, e.g. a room has msgs A,B,C which were pulled in
initially via a DB call, then we receive C down as an update; we
should not include it, otherwise we would send back A,B,C,C. By only
sending the "latest" NID, we would filter out the other events in
that batch, as they are <= the previously seen latest NID.
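The per-event filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proxy's code: `Event`, `filterSeen` and the NID values are hypothetical, and each event is assumed to carry its own NID. Comparing every event in the batch against the latest NID already sent drops only the genuine duplicate.

```go
package main

import "fmt"

// Event is a hypothetical, minimal stand-in for a timeline event. Each event
// is assumed to carry its own NID (numeric ID) assigned on insertion into the DB.
type Event struct {
	NID int64
	ID  string
}

// filterSeen returns the events in an update batch whose NIDs are greater
// than latestSeenNID, i.e. events the client has not already been sent.
// Because each event's own NID is compared (rather than one NID for the
// whole batch), only events the client already has are dropped.
func filterSeen(batch []Event, latestSeenNID int64) []Event {
	var out []Event
	for _, ev := range batch {
		if ev.NID > latestSeenNID {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	// A, B, C were pulled in initially via a DB call; C has NID 3.
	latestSeenNID := int64(3)
	// The update batch repeats C and adds D and E.
	batch := []Event{{3, "C"}, {4, "D"}, {5, "E"}}
	for _, ev := range filterSeen(batch, latestSeenNID) {
		fmt.Println(ev.ID)
	}
}
```

Running it prints D and E; C is dropped because its NID (3) is not greater than the latest NID already sent.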
This was not tested in E2E tests because it relies on slow pollers
which cause more than one timeline event for a single room to arrive. This
may be a cause of flaky tests. We now have an integration test for
this which injects batches of events for the same room and ensures
they are all seen down the connection.
Thanks to @jplatte and @manuroe for helping debug this.
This allows you to send `timeline_limit: 1` in one request, then
swap to `timeline_limit: 10` in the second request and get 10 events,
without it affecting the window (no ops or required_state resent).
This is being added to support fast preloading on mobile devices,
where `timeline_limit: 1` is used to populate the room preview in the
room list and then `timeline_limit: 20` is used to quickly pre-cache
a screen full of messages in case the user clicks through to the room.
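One way to picture this behaviour is a per-connection check that compares each room's previous `timeline_limit` with the new one. The sketch below is hypothetical: `RoomTimelineUpdate`, `timelineLimitUpdates` and the per-room limit map are illustrative names, and how the proxy actually stores this state is not described here. Rooms whose limit grew get a timeline-only update, with no list ops and no `required_state`, so the window is left alone.

```go
package main

import "fmt"

// RoomTimelineUpdate is a hypothetical, timeline-only update for one room:
// it carries no list ops and no required_state.
type RoomTimelineUpdate struct {
	RoomID        string
	TimelineLimit int
}

// timelineLimitUpdates compares each room's previous timeline_limit with the
// limit in the current request. Rooms already known to the connection whose
// limit has grown get a timeline-only update for the larger number of events.
// The recorded limit is updated for every known room.
func timelineLimitUpdates(roomIDs []string, prevLimits map[string]int, newLimit int) []RoomTimelineUpdate {
	var updates []RoomTimelineUpdate
	for _, roomID := range roomIDs {
		prev, known := prevLimits[roomID]
		if !known {
			// New rooms are assumed to go through the normal, full room response path.
			continue
		}
		if newLimit > prev {
			updates = append(updates, RoomTimelineUpdate{RoomID: roomID, TimelineLimit: newLimit})
		}
		prevLimits[roomID] = newLimit
	}
	return updates
}

func main() {
	// The first request used timeline_limit: 1 for two rooms in the window.
	prevLimits := map[string]int{"!a:example.org": 1, "!b:example.org": 1}
	rooms := []string{"!a:example.org", "!b:example.org"}
	// The second request swaps to timeline_limit: 10.
	for _, u := range timelineLimitUpdates(rooms, prevLimits, 10) {
		fmt.Printf("%s: %d events\n", u.RoomID, u.TimelineLimit)
	}
}
```

Sending the same limit again produces no updates, since nothing about the room has changed from the client's point of view.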
Features:
- Add `typing` extension.
- Add `receipts` extension.
- Add comprehensive Prometheus `/metrics` activated via `SYNCV3_PROM`.
- Add `SYNCV3_PPROF` support.
- Add `by_notification_level` sort order.
- Add `include_old_rooms` support.
- Add support for `$ME` and `$LAZY`.
- Add correct filtering when `*,*` is used as `required_state`.
- Add `num_live` to each room response to indicate how many timeline entries are live.
Bug fixes:
- Use a stricter comparison function on ranges: fixes an issue whereby unit tests fail on go1.19 due to a change in the sorting algorithm.
- Send back an `errcode` on HTTP errors (e.g. expired sessions).
- Remove `unsigned.txn_id` on insertion into the DB. Otherwise users would see other users' txn IDs :(
- Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
- Send HTTP 400 for invalid range requests.
- Don't publish no-op unread counts, which just add extra noise.
- Fix leaking DB connections which could eventually consume all available connections.
- Ensure we always unblock `WaitUntilInitialSync` even on invalid access tokens. Other code relies on `WaitUntilInitialSync()` actually returning at _some_ point, e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever.
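To illustrate the range delta fix, here is a minimal sketch that computes which list indexes were added, removed or unchanged when a window moves, assuming inclusive `[start,end]` ranges as in `[0,20]`. `Range`, `delta` and the helpers are hypothetical names, and this per-index approach is chosen for clarity rather than speed; the proxy's own algorithm may work on range endpoints instead. The sort uses a strict `<` comparator, in the spirit of the go1.19 fix above.

```go
package main

import (
	"fmt"
	"sort"
)

// Range is an inclusive [start, end] window of list indexes, e.g. [0,20].
type Range [2]int64

// contains reports whether index i falls inside any of the ranges.
func contains(rs []Range, i int64) bool {
	for _, r := range rs {
		if i >= r[0] && i <= r[1] {
			return true
		}
	}
	return false
}

// toRanges collapses sorted, distinct indexes into inclusive ranges.
func toRanges(idxs []int64) []Range {
	var out []Range
	for _, i := range idxs {
		if n := len(out); n > 0 && out[n-1][1] == i-1 {
			out[n-1][1] = i // extend the current run
		} else {
			out = append(out, Range{i, i}) // start a new run
		}
	}
	return out
}

// sortIdx sorts ascending with a strict "<" comparator, which is a valid
// ordering regardless of the sorting algorithm the standard library uses.
func sortIdx(s []int64) {
	sort.Slice(s, func(a, b int) bool { return s[a] < s[b] })
}

// delta reports which indexes were added, removed or unchanged when a list
// window moves from oldRanges to newRanges. Membership is checked per index,
// so overlapping moves such as [0,20] -> [20,30] need no special casing.
func delta(oldRanges, newRanges []Range) (added, removed, same []Range) {
	seen := map[int64]bool{}
	var addedIdx, removedIdx, sameIdx []int64
	visit := func(rs []Range) {
		for _, r := range rs {
			for i := r[0]; i <= r[1]; i++ {
				if seen[i] {
					continue
				}
				seen[i] = true
				inOld, inNew := contains(oldRanges, i), contains(newRanges, i)
				switch {
				case inOld && inNew:
					sameIdx = append(sameIdx, i)
				case inNew:
					addedIdx = append(addedIdx, i)
				default: // only in the old window
					removedIdx = append(removedIdx, i)
				}
			}
		}
	}
	visit(oldRanges)
	visit(newRanges)
	sortIdx(addedIdx)
	sortIdx(removedIdx)
	sortIdx(sameIdx)
	return toRanges(addedIdx), toRanges(removedIdx), toRanges(sameIdx)
}

func main() {
	added, removed, same := delta([]Range{{0, 20}}, []Range{{20, 30}})
	fmt.Println("added:", added)     // added: [[21 30]]
	fmt.Println("removed:", removed) // removed: [[0 19]]
	fmt.Println("same:", same)       // same: [[20 20]]
}
```

Index 20 appears in both windows, so it is reported as unchanged rather than being both removed and added.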
Improvements:
- Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: with a modest amount of data, creating the handler took ~28s; it now takes 4s.
- Massively improve initial v3 sync times by refactoring `JoinedRoomsTracker`, from ~47s to <1s.
- Add `SlidingSyncUntil...` in tests to reduce races.
- Tweak the API shape of `JoinedUsersForRoom` to reduce state block processing time for large rooms from 63s to 39s.
- Add trace task for initial syncs.
- Include the proxy version in User-Agent strings.
- HTTP errors now wait 1s before returning to stop clients tight-looping on error.
- Pending event buffer is now 2000.
- Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
- Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes and ~50% for moderately sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
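The cancelled key request pruning can be sketched as a two-pass filter over the queued to-device events. In the Matrix spec, `m.room_key_request` content has an `action` of `request` or `request_cancellation`, and a cancellation matches its request by `request_id` and `requesting_device_id`. The Go type and function names below are hypothetical, and whether the proxy also drops the cancellation itself is not stated here; this sketch keeps it.

```go
package main

import "fmt"

// ToDeviceMsg is a hypothetical, minimal view of a queued to-device event.
// For m.room_key_request, Action is "request" or "request_cancellation".
type ToDeviceMsg struct {
	Type               string
	Action             string
	RequestID          string
	RequestingDeviceID string
}

// requestKey identifies a key request by request ID and requesting device.
type requestKey struct{ requestID, deviceID string }

// pruneCancelledKeyRequests drops m.room_key_request "request" events that
// have a matching "request_cancellation" later in the inbox. Cancellations
// and all other events are kept, and inbox order is preserved.
func pruneCancelledKeyRequests(inbox []ToDeviceMsg) []ToDeviceMsg {
	// First pass: remember the position of the last cancellation per request.
	cancelledAt := map[requestKey]int{}
	for i, m := range inbox {
		if m.Type == "m.room_key_request" && m.Action == "request_cancellation" {
			cancelledAt[requestKey{m.RequestID, m.RequestingDeviceID}] = i
		}
	}
	// Second pass: keep everything except requests cancelled later on.
	var out []ToDeviceMsg
	for i, m := range inbox {
		if m.Type == "m.room_key_request" && m.Action == "request" {
			if j, ok := cancelledAt[requestKey{m.RequestID, m.RequestingDeviceID}]; ok && j > i {
				continue
			}
		}
		out = append(out, m)
	}
	return out
}

func main() {
	inbox := []ToDeviceMsg{
		{Type: "m.room_key_request", Action: "request", RequestID: "r1", RequestingDeviceID: "DEV"},
		{Type: "m.room_key_request", Action: "request", RequestID: "r2", RequestingDeviceID: "DEV"},
		{Type: "m.room_key_request", Action: "request_cancellation", RequestID: "r1", RequestingDeviceID: "DEV"},
	}
	for _, m := range pruneCancelledKeyRequests(inbox) {
		fmt.Println(m.Action, m.RequestID)
	}
}
```

Here the `r1` request is removed, while the still-live `r2` request and the `r1` cancellation remain in the inbox.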
In preparation for migrating end-to-end style integration tests
to be actual end-to-end tests. The intended split is:
- Does the test exclusively use the public sliding sync API for test assertions?
- Does the test exclusively use the public sync v2 API for configuring the test?
If the answer to both questions is YES, then they should be end-to-end tests.
Some examples of this include testing core functionality of the API like
room subscriptions, multiple lists, filters, extensions, etc.
Some examples of tests which are NOT end-to-end tests include:
- Testing connection handling (e.g. sending multiple duplicate requests).
- Ensuring outstanding requests get cancelled.
- Testing restarts of the proxy.
- Testing out-of-order responses.
- Benchmarks.
These all involve configuring the test / asserting different things, which would
be extremely difficult to reliably engineer using a real homeserver.