The `since`/`next_batch` token is an opaque string and might need to be URL-encoded
before being sent to the server.
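As a minimal Go sketch (the helper name and base URL are invented for this illustration, not taken from the proxy), `net/url` takes care of the escaping when the request URL is built:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildSyncURL percent-escapes the opaque since token; it may contain
// characters such as '/', '+' or '=' that are unsafe in a query string.
func buildSyncURL(base, since string) string {
	q := url.Values{}
	q.Set("timeout", "30000")
	if since != "" {
		q.Set("since", since) // Encode() below URL-encodes the value
	}
	return base + "/_matrix/client/v3/sync?" + q.Encode()
}

func main() {
	fmt.Println(buildSyncURL("https://matrix.example.org", "s72594_4483/batch+x=="))
}
```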
This properly propagates the Go context down to all HTTP calls, which means that outgoing requests have the OTLP trace context.
This also adds the Jaeger propagator to the list of OTEL propagators, so that Synapse properly gets the incoming trace context.
It also upgrades all the OTEL libraries.
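A rough sketch of the two pieces (not the proxy's actual code; the package paths are the standard OTEL Go modules and the function names here are invented for illustration): register the Jaeger propagator alongside the W3C ones so incoming trace context is picked up, and thread the request context into every outgoing HTTP call via an instrumented transport.

```go
package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/contrib/propagators/jaeger"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// client wraps the default transport so outgoing requests carry trace headers.
var client = &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}

func init() {
	// Accept W3C tracecontext/baggage as well as Jaeger's uber-trace-id header.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
		jaeger.Jaeger{},
	))
}

// callUpstream shows the important part: passing ctx down via
// http.NewRequestWithContext so the span context reaches the transport.
func callUpstream(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return client.Do(req)
}

func main() {
	resp, err := callUpstream(context.Background(), "https://synapse.example.org/_matrix/client/versions")
	if err == nil {
		resp.Body.Close()
	}
}
```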
With a regression test. The behaviour is:
- Delete the connection, such that incoming requests will end up with `M_UNKNOWN_POS`.
- The next request will then return HTTP 401.
This has knock-on effects:
- We no longer send HTTP 502 if `/whoami` returns 401; instead, we return 401.
- When the token is expired (pollers get 401), the device is deleted from the DB.
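A hedged illustration of the error shape this implies. The handler name is invented, and `M_UNKNOWN_TOKEN` is the Matrix spec's errcode for invalid/expired tokens rather than something stated in this changelog, so treat the exact values as assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeExpiredToken surfaces the upstream 401 directly instead of a generic 502,
// including a Matrix errcode so clients can tell the session is dead.
func writeExpiredToken(w http.ResponseWriter) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusUnauthorized)
	json.NewEncoder(w).Encode(map[string]string{
		"errcode": "M_UNKNOWN_TOKEN", // assumed: spec errcode for invalid/expired tokens
		"error":   "Access token has expired",
	})
}

func main() {
	rec := httptest.NewRecorder()
	writeExpiredToken(rec)
	fmt.Println(rec.Code, rec.Body.String()) // 401 {"errcode":"M_UNKNOWN_TOKEN",...}
}
```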
Features:
- Add `typing` extension.
- Add `receipts` extension.
- Add comprehensive Prometheus `/metrics` activated via `SYNCV3_PROM`.
- Add `SYNCV3_PPROF` support.
- Add `by_notification_level` sort order.
- Add `include_old_rooms` support.
- Add support for `$ME` and `$LAZY`.
- Add correct filtering when `*,*` is used as `required_state` (see the sketch after this list).
- Add `num_live` to each room response to indicate how many timeline entries are live.
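To make the `required_state`, `$ME` and `$LAZY` features concrete, here is a hedged Go sketch of what such a request section looks like on the wire. The struct and helper names are chosen for this example only; `required_state` and `timeline_limit` are the sliding sync field names, but the proxy's own types may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roomSubscription is an illustrative struct, not the proxy's own type.
type roomSubscription struct {
	RequiredState [][2]string `json:"required_state"`
	TimelineLimit int         `json:"timeline_limit"`
}

func main() {
	sub := roomSubscription{
		RequiredState: [][2]string{
			{"*", "*"},                 // all state events, now filtered correctly
			{"m.room.member", "$LAZY"}, // lazy-load member events for timeline senders
			{"m.room.member", "$ME"},   // just our own membership event
		},
		TimelineLimit: 10,
	}
	b, _ := json.MarshalIndent(sub, "", "  ")
	fmt.Println(string(b))
}
```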
Bug fixes:
- Use a stricter comparison function on ranges: fixes an issue whereby UTs fail on go1.19 due to a change in the sorting algorithm.
- Send back an `errcode` on HTTP errors (e.g. expired sessions).
- Remove `unsigned.txn_id` on insertion into the DB. Otherwise, users would see other users' txn IDs :(
- Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
- Send HTTP 400 for invalid range requests.
- Don't publish no-op unread counts, which just add extra noise.
- Fix leaking DB connections which could eventually consume all available connections.
- Ensure we always unblock `WaitUntilInitialSync` even on invalid access tokens. Other code relies on `WaitUntilInitialSync()` actually returning at _some_ point: e.g. on startup we have N workers which bound the number of concurrent pollers made at any one time, so we must not hog a worker forever (see the sketch after this list).
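The sketch below (all names invented for illustration, not the proxy's real poller types) shows the shape of the problem: a bounded pool of workers starts pollers, and the initial-sync wait must be released on failure as well as success, otherwise a poller with a bad token would pin a worker forever.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// poller is a toy stand-in for the proxy's per-device poller.
type poller struct {
	initialDone chan struct{}
	once        sync.Once
}

// WaitUntilInitialSync blocks until the first sync completes or fails.
func (p *poller) WaitUntilInitialSync() { <-p.initialDone }

// signalInitialDone is called on success AND on failure (e.g. a 401 from an
// invalid access token) so that waiters are always released.
func (p *poller) signalInitialDone() { p.once.Do(func() { close(p.initialDone) }) }

func (p *poller) poll(invalidToken bool) error {
	defer p.signalInitialDone() // always unblock, even on error
	if invalidToken {
		return errors.New("M_UNKNOWN_TOKEN")
	}
	return nil
}

func main() {
	workers := make(chan struct{}, 2) // N workers bound concurrent initial pollers
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		p := &poller{initialDone: make(chan struct{})}
		bad := i%2 == 0
		wg.Add(1)
		go func() {
			defer wg.Done()
			workers <- struct{}{}        // acquire a worker slot
			defer func() { <-workers }() // release it when done
			_ = p.poll(bad)              // some pollers have invalid tokens
			p.WaitUntilInitialSync()     // returns even when the poll failed
		}()
	}
	wg.Wait()
	fmt.Println("all pollers finished; no worker was hogged")
}
```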
Improvements:
- Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler; now it takes 4s.
- Massively improve initial v3 sync times by refactoring `JoinedRoomsTracker`, from ~47s to <1s.
- Add `SlidingSyncUntil...` in tests to reduce races.
- Tweak the API shape of `JoinedUsersForRoom` to reduce state block processing time for large rooms from 63s to 39s.
- Add trace task for initial syncs.
- Include the proxy version in UA strings.
- HTTP errors now wait 1s before returning to stop clients tight-looping on error.
- Pending event buffer is now 2000.
- Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
- Remove cancelled `m.room_key_request`s from the to-device inbox. Cuts down the number of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderate sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client. (A simplified sketch follows this block.)
We don't care about them as they never form part of the timeline.
Also, only send a `timeline_limit: 1` filter up to sync v2 when there
is no `?since` token. Otherwise, we want a timeline limit > 1 so we
can ensure that we remain gapless (else the proxy drops events).
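For the to-device cleanup above, a simplified sketch that operates on an in-memory batch rather than the proxy's database; the struct and function are invented for this example, with fields trimmed down from the spec's `m.room_key_request` content:

```go
package main

import "fmt"

// toDevice is a trimmed-down to-device event for this sketch.
type toDevice struct {
	Type      string
	Action    string // "request" or "request_cancellation"
	RequestID string
}

// dropCancelledKeyRequests removes m.room_key_request events whose request was
// later cancelled, along with the cancellation itself, preserving the order of the rest.
func dropCancelledKeyRequests(events []toDevice) []toDevice {
	cancelled := make(map[string]bool)
	for _, ev := range events {
		if ev.Type == "m.room_key_request" && ev.Action == "request_cancellation" {
			cancelled[ev.RequestID] = true
		}
	}
	out := events[:0:0]
	for _, ev := range events {
		if ev.Type == "m.room_key_request" && cancelled[ev.RequestID] {
			continue // drop both the request and its cancellation
		}
		out = append(out, ev)
	}
	return out
}

func main() {
	evs := []toDevice{
		{"m.room_key_request", "request", "A"},
		{"m.room_key_request", "request", "B"},
		{"m.room_key_request", "request_cancellation", "A"},
		{"m.room.encrypted", "", ""},
	}
	fmt.Println(dropCancelledKeyRequests(evs)) // only request B and the encrypted event remain
}
```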
- Completely ignore events in the `state` block when processing
sync v3 requests with a large `timeline_limit`. We should never
have been including them in the first place as they are not
chronological at all.
- Perform sync v2 requests with a timeline limit of 1 to ensure
we can always return a `prev_batch` token to the caller. This
means on the first startup, clicking a room will force a `/messages`
hit until there have been `$limit` new events, at which point it
will be able to serve these events from the local DB. Critically,
this ensures that we never send back an empty `prev_batch`, which
causes clients to believe that there is no history in a room.
We can do this now because we store the access token for each device.
Throttled at 16 concurrent sync requests to avoid causing a
thundering herd on startup. (The filter shape is sketched below.)
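The filter sent to sync v2 is ordinary Matrix filter JSON; a minimal sketch of the timeline-limit-1 shape referred to above (the helper function is invented for this illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// timelineLimitFilter builds the standard Matrix filter JSON that caps the
// per-room timeline, e.g. {"room":{"timeline":{"limit":1}}}.
func timelineLimitFilter(limit int) string {
	f := map[string]any{
		"room": map[string]any{
			"timeline": map[string]any{"limit": limit},
		},
	}
	b, _ := json.Marshal(f)
	return string(b)
}

func main() {
	// The filter can be passed inline on the sync v2 request.
	fmt.Println("/_matrix/client/v3/sync?filter=" + url.QueryEscape(timelineLimitFilter(1)))
}
```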
- Add `AccountDataTable` with tests.
- Read global and per-room account data from sync v2 and add new callbacks to the poller.
- Update the `SyncV3Handler` to persist account data from sync v2 then notify the user cache.
- Update the `UserCache` to set `UserRoomData.IsDM` status on `m.direct` events (see the sketch below).
- Read `m.direct` event from the DB when `UserCache` is created to track DM status per-room.
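The `m.direct` content is a map from user ID to the DM room IDs shared with that user; a hedged sketch of deriving per-room DM status from it (the helper is illustrative, not the `UserCache` API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dmRoomIDs flattens m.direct content (other user ID -> DM room IDs) into a
// set of room IDs that should be flagged as DMs.
func dmRoomIDs(mDirectContent json.RawMessage) (map[string]bool, error) {
	var content map[string][]string
	if err := json.Unmarshal(mDirectContent, &content); err != nil {
		return nil, err
	}
	dms := make(map[string]bool)
	for _, roomIDs := range content {
		for _, roomID := range roomIDs {
			dms[roomID] = true
		}
	}
	return dms, nil
}

func main() {
	raw := json.RawMessage(`{"@alice:example.org":["!abc:example.org"],"@bob:example.org":["!def:example.org"]}`)
	dms, _ := dmRoomIDs(raw)
	fmt.Println(dms) // map[!abc:example.org:true !def:example.org:true]
}
```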
- Add `sync3.Response` `UnmarshalJSON()` so we can dynamically construct the
correct single/range op (sketched below).
- Create sub-structs for sync2.Response to make inline embedding
easier in tests.
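A sketch of the dynamic unmarshalling idea, under the assumption that list ops carry an `op` discriminator (the op names and struct shapes here are illustrative rather than the proxy's exact `sync3` types): peek at the discriminator first, then decode the raw JSON into the matching concrete struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type listOp interface{ kind() string }

// opRange covers range-based ops such as SYNC/INVALIDATE.
type opRange struct {
	Op    string `json:"op"`
	Range [2]int `json:"range"`
}

func (o opRange) kind() string { return o.Op }

// opSingle covers single-index ops such as INSERT/DELETE.
type opSingle struct {
	Op    string `json:"op"`
	Index int    `json:"index"`
}

func (o opSingle) kind() string { return o.Op }

type response struct {
	Ops []listOp
}

// UnmarshalJSON peeks at each op's discriminator before decoding it fully.
func (r *response) UnmarshalJSON(b []byte) error {
	var wire struct {
		Ops []json.RawMessage `json:"ops"`
	}
	if err := json.Unmarshal(b, &wire); err != nil {
		return err
	}
	for _, raw := range wire.Ops {
		var peek struct {
			Op string `json:"op"`
		}
		if err := json.Unmarshal(raw, &peek); err != nil {
			return err
		}
		var op listOp
		switch peek.Op {
		case "SYNC", "INVALIDATE": // range ops
			var o opRange
			if err := json.Unmarshal(raw, &o); err != nil {
				return err
			}
			op = o
		default: // single-index ops
			var o opSingle
			if err := json.Unmarshal(raw, &o); err != nil {
				return err
			}
			op = o
		}
		r.Ops = append(r.Ops, op)
	}
	return nil
}

func main() {
	var r response
	_ = json.Unmarshal([]byte(`{"ops":[{"op":"SYNC","range":[0,10]},{"op":"DELETE","index":3}]}`), &r)
	fmt.Printf("%+v\n", r.Ops)
}
```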