9 Commits

Author SHA1 Message Date
Kegan Dougal
b33e1d2b9f bugfixes: fix 2 issues which could cause invalid ranges to be returned
- When a range is shrinked from both ends, we would incorrectly send back
  INVALIDATE twice for the same range (the higher end). This was caused by
  improper usage of Go for .. range loops
- When a range is changed to a window completely outside the room list
  (e.g 5 rooms, range=[10,15]) we would incorrectly send back SYNC 10,15
  with a single room ID, the last room in the list. This was caused by
  boundary check code in SliceInto capping ranges to the last element if
  the index positions are > room list. However in this case, this causes
  the last room to be returned.

With regression tests (UT and E2E)
2023-01-03 13:39:46 +00:00
Kegan Dougal
be8543a21a add extensions for typing and receipts; bugfixes and additional perf improvements
Features:
 - Add `typing` extension.
 - Add `receipts` extension.
 - Add comprehensive prometheus `/metrics` activated via `SYNCV3_PROM`.
 - Add `SYNCV3_PPROF` support.
 - Add `by_notification_level` sort order.
 - Add `include_old_rooms` support.
 - Add support for `$ME` and `$LAZY`.
 - Add correct filtering when `*,*` is used as `required_state`.
 - Add `num_live` to each room response to indicate how many timeline entries are live.

Bug fixes:
 - Use a stricter comparison function on ranges: fixes an issue whereby UTs fail on go1.19 due to change in sorting algorithm.
 - Send back an `errcode` on HTTP errors (e.g expired sessions).
 - Remove `unsigned.txn_id` on insertion into the DB. Otherwise other users would see other users txn IDs :(
 - Improve range delta algorithm: previously it didn't handle cases like `[0,20] -> [20,30]` and would panic.
 - Send HTTP 400 for invalid range requests.
 - Don't publish no-op unread counts which just adds extra noise.
 - Fix leaking DB connections which could eventually consume all available connections.
 - Ensure we always unblock WaitUntilInitialSync even on invalid access tokens. Other code relies on WaitUntilInitialSync() actually returning at _some_ point e.g on startup we have N workers which bound the number of concurrent pollers made at any one time, we need to not just hog a worker forever.

Improvements:
 - Greatly improve startup times of sync3 handlers by improving `JoinedRoomsTracker`: a modest amount of data would take ~28s to create the handler, now it takes 4s.
 - Massively improve initial initial v3 sync times, by refactoring `JoinedRoomsTracker`, from ~47s to <1s.
 - Add `SlidingSyncUntil...` in tests to reduce races.
 - Tweak the API shape of JoinedUsersForRoom to reduce state block processing time for large rooms from 63s to 39s.
 - Add trace task for initial syncs.
 - Include the proxy version in UA strings.
 - HTTP errors now wait 1s before returning to stop clients tight-looping on error.
 - Pending event buffer is now 2000.
 - Index the room ID first to cull the most events when returning timeline entries. Speeds up `SelectLatestEventsBetween` by a factor of 8.
 - Remove cancelled `m.room_key_requests` from the to-device inbox. Cuts down the amount of events in the inbox by ~94% for very large (20k+) inboxes, ~50% for moderate sized (200 events) inboxes. Adds book-keeping to remember the unacked to-device position for each client.
2022-12-14 18:53:55 +00:00
Kegan Dougal
d529ed52d5 bugfix: honour the windows that the ranges represent to avoid mismatched DELETE/INSERT operations
Previously, we would just focus on finding _any_ window boundary and then
assume that was the boundary which matched the window for the purposes of
DELETE/INSERT move operations. However, this wasn't always true, especially
in the following case:
```
0..9 [10..20] 21...29 [30...40]
then move 30 to 10
0..9 [30,10...19] 20...28 [29,31...40]

expect:
 - DELETE 30, INSERT 30 (val=29)
 - DELETE 20, INSERT 10 (val=30)

but we would get:
 - DELETE 30, INSERT 20 (val=19)
 - DELETE 20, INSERT 10 (val=30)

because the code assumed that there was a window range [20,30] which there wasn't.
```
2022-09-05 15:58:56 +01:00
Kegan Dougal
1155d24314 bugfix: fixed a bug whereby a DELETE could specify an index of -1
This could happen with 1-length windows e.g `[0,0]` where an element
was moved from outside the range e.g i=5 to the window index e.g 0.
This then triggered an off-by-one error in the code which snapped
indexes to windows. Fixed with regression tests.
2022-08-25 20:48:38 +01:00
Kegan Dougal
0e6be68c74 Extend the functionality of CalculateMoveIndexes
Previously it would not consider jumping over multiple ranges
e.g [10,20],[30,40] then jumping from 50 to 0 results in 2x
pairs of move indexes as all rooms shift right, causing:
 - DELETE 20, INSERT 10 (room 9)
 - DELETE 40, INSERT 30 (room 29)

We now do consider this, with unit tests. This is not hooked up
to the main code however.
2022-06-09 13:10:11 +01:00
Kegan Dougal
51b3516043 Completey rework how we calculate moved indexes on live updates
- Removed clamping logic which was buggy due to it assuming a direction
  (items travel from high indexes to low indexes)
- Added new logic which is agnostic to how rooms move around in the room list.
- Add plenty of unit tests for the new move index calculations.
2022-05-31 17:02:46 +01:00
Kegan Dougal
07bdc7fd41 bugfixes: don't reset timeout when new events arrive; fix delta algo logic when there are no changes
With regression tests.
2022-04-13 11:32:49 +01:00
Kegan Dougal
63e0eda761 Use a new range delta algorithm
Previoulsy the algorithm was very naive and just did literal delta checks.
For example, with an initial window of `[[0,20],[30,40]]` and a new window
of `[[0,20],[25,35]]` the algorithm would calculate that `[25,35]` has been
added and `[30,40]` has been removed. This isn't entirely correct because
`[30,35]` are the same among the two windows. The inability of the algorithm
to detect this meant more rooms than necessary would be invalidated/synced
when slowly scrolling through the room list.

The new sweep line algorithm is O(nlogn) due to sorting and can accurately
detect partial overlaps. This means now only the trailing edge is INVALIDATEd,
and only the leading edge of the window is SYNCed. This is clearly visible in
the client visualisations.
2022-02-25 18:48:23 +00:00
Kegan Dougal
e20a8ad067 Move synclive to sync3 2021-10-05 16:22:02 +01:00