perf: speed up B4 local editing ~2.4x and snapshot import ~45% by zxch3n · Pull Request #1033 · loro-dev/loro

zxch3n · 2026-06-25T07:06:08Z

Summary

Speeds up the B4 (automerge-paper) workload on both axes: local text editing and fast-snapshot import. Measured on an Apple M5 Pro (rustc 1.96, release):

| Benchmark | Before | After | Δ |
|---|---|---|---|
| apply 1× (local edit, 260K actions) | 112 ms | ~46 ms | −59% (2.4×), 11.5 M op/s |
| apply 100× | 11.5 s | ~5.25 s | −54% |
| import (B4 snapshot) | 135 µs | ~78 µs | −42% |
| import (B4×100, 22 MB) | 8.15 ms | ~4.4 ms | −46% |

Snapshot output stays byte-identical throughout; all loro, loro-internal, and mergeable tests pass, and the fuzz corpus replays clean.

Changes

Local editing

- Compile the lock-order debug instrumentation (`LoroMutex`) out of release builds. It ran on every per-op `OpLog`+`DocState` acquire/release and accounted for ~30% of edit time. In release, `can_lock_in_this_thread` returns `false`, backed by the now-exact cached visible-op count; the order checks still run in debug builds and tests.
- Bump `visible_op_count` incrementally for local ops instead of recomputing it from the version vectors on every op (the old path also heap-allocated an `im::HashMap` iterator on each call).
- Avoid the per-op `visited` `Vec` allocation in `DocState::is_deleted` by using inline `SmallVec` storage.
- Build the position-context error string in `checked_range_end` lazily, and return entity ranges in a `SmallVec`.
- Route the per-insert event-index computation through the existing cursor cache.
- Plain-text fast path: for style-free text (non-wasm, unicode index), `entity_index == event_index == pos`, so the read phase (cursor location, two `visit_previous_caches` walks, and the styles lookup) is skipped, and the delete path skips its `index_to_event_index` walks. It falls back to the general path when styles are present, on wasm, or for other position types.
- Keep `apply_local_op`'s txn/doc context check in all builds, since `insert_with_txn`/`delete_with_txn` are public API, but make it cheap: compare `Weak` pointers instead of doing a per-op `Weak::upgrade`, and upgrade only to fill in the `UnmatchedContext` error on mismatch.
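As a rough illustration of compiling lock-order checks out of release builds, here is a minimal sketch. `OrderedMutex`, `OrderedGuard`, and the numeric `rank` scheme are hypothetical and are not `LoroMutex`'s actual API; the point is only that all bookkeeping sits behind `#[cfg(debug_assertions)]`, so a release build pays for nothing beyond the underlying `std::sync::Mutex`.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Mutex, MutexGuard};

#[cfg(debug_assertions)]
use std::cell::Cell;

#[cfg(debug_assertions)]
thread_local! {
    // Highest lock rank this thread currently holds (0 = none held).
    static HELD_RANK: Cell<u8> = Cell::new(0);
}

/// Hypothetical mutex wrapper that must be acquired in increasing `rank` order.
pub struct OrderedMutex<T> {
    rank: u8,
    inner: Mutex<T>,
}

pub struct OrderedGuard<'a, T> {
    // Only exists in debug builds; a release guard is just the MutexGuard.
    #[cfg(debug_assertions)]
    prev_rank: u8,
    guard: MutexGuard<'a, T>,
}

impl<T> OrderedMutex<T> {
    pub fn new(rank: u8, value: T) -> Self {
        Self { rank, inner: Mutex::new(value) }
    }

    pub fn lock(&self) -> OrderedGuard<'_, T> {
        // Debug-only order check, done before blocking on the lock.
        #[cfg(debug_assertions)]
        let prev_rank = HELD_RANK.with(|held| {
            let prev = held.get();
            assert!(
                self.rank > prev,
                "lock order violation: rank {} taken while holding rank {}",
                self.rank,
                prev
            );
            held.set(self.rank);
            prev
        });
        OrderedGuard {
            #[cfg(debug_assertions)]
            prev_rank,
            guard: self.inner.lock().unwrap(),
        }
    }
}

#[cfg(debug_assertions)]
impl<T> Drop for OrderedGuard<'_, T> {
    fn drop(&mut self) {
        // Restore the rank that was held before this guard was taken.
        HELD_RANK.with(|held| held.set(self.prev_rank));
    }
}

impl<T> Deref for OrderedGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.guard
    }
}

impl<T> DerefMut for OrderedGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.guard
    }
}
```

In a debug build, taking rank 2 and then rank 1 on the same thread panics before the second lock is attempted; in release the same calls simply lock.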

Snapshot import

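- Skip the redundant per-block SSTable checksum on full import: the whole snapshot body is already covered by the document-level checksum verified in `parse_header_and_body`, so this removes a second hash pass over the data. The skip is opt-in through a new `import_all_unchecked`, used only by Loro's snapshot decode (`ChangeStore::import_all`, `KvWrapper::import`); the public `MemKvStore::import_all` and `SsTable::import_all` still verify block checksums.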
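A minimal sketch of the split between a verifying public entry point and an opt-in unchecked one. The block layout, the use of 32-bit FNV-1a as the checksum, and `decode_snapshot` are assumptions for illustration; in this toy the document checksum covers only block payloads, whereas the real snapshot format has its own encoding and hash.

```rust
/// Streaming 32-bit FNV-1a, a stand-in hash (not the real format's checksum).
pub fn fnv1a_update(mut hash: u32, data: &[u8]) -> u32 {
    for &byte in data {
        hash ^= byte as u32;
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

pub const FNV_OFFSET: u32 = 0x811c_9dc5;

pub fn fnv1a(data: &[u8]) -> u32 {
    fnv1a_update(FNV_OFFSET, data)
}

#[derive(Debug, PartialEq)]
pub enum ImportError {
    BadDocChecksum,
    BadBlockChecksum(usize),
}

/// Hypothetical block: payload plus the checksum stored next to it.
pub struct Block {
    pub payload: Vec<u8>,
    pub checksum: u32,
}

impl Block {
    pub fn new(payload: &[u8]) -> Self {
        Block { payload: payload.to_vec(), checksum: fnv1a(payload) }
    }
}

fn import_blocks(blocks: &[Block], verify_blocks: bool) -> Result<Vec<u8>, ImportError> {
    let mut out = Vec::new();
    for (i, block) in blocks.iter().enumerate() {
        // Per-block hash pass; skipped only on the unchecked path.
        if verify_blocks && fnv1a(&block.payload) != block.checksum {
            return Err(ImportError::BadBlockChecksum(i));
        }
        out.extend_from_slice(&block.payload);
    }
    Ok(out)
}

/// Public entry point: always verifies every block checksum.
pub fn import_all(blocks: &[Block]) -> Result<Vec<u8>, ImportError> {
    import_blocks(blocks, true)
}

/// Caller must already have verified a checksum covering every block.
pub fn import_all_unchecked(blocks: &[Block]) -> Result<Vec<u8>, ImportError> {
    import_blocks(blocks, false)
}

/// Snapshot decode: one document-level hash pass over the whole body,
/// after which the per-block checks would be redundant.
pub fn decode_snapshot(blocks: &[Block], doc_checksum: u32) -> Result<Vec<u8>, ImportError> {
    let body_hash = blocks
        .iter()
        .fold(FNV_OFFSET, |h, b| fnv1a_update(h, &b.payload));
    if body_hash != doc_checksum {
        return Err(ImportError::BadDocChecksum);
    }
    import_all_unchecked(blocks)
}
```

Keeping `import_all` safe by default means external callers of the re-exported API cannot lose integrity checks by accident; only the one caller that has already hashed the body takes the fast path.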

Infra

- Vendor `generic-btree` (maintained by loro-dev) into `crates/generic-btree` and redirect all dependents via `[patch.crates-io]`, so the b-tree can evolve in-tree. This is a verbatim vendoring of 0.10.7 (most of the line count in this diff); only the manifest is trimmed (benches dropped, dev-deps reduced), and the build is transparent: B4 apply is unchanged by the vendoring.
- Add `crates/examples/examples/b4_bench.rs`, a phase-timed B4 harness.
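One plausible shape for a phase-timed harness, assuming each phase can be wrapped in a closure. `PhaseTimer` is hypothetical and not taken from `b4_bench.rs`; the phase names and workloads in the test are placeholders, not the B4 trace.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical phase timer: runs named phases in order and records how
/// long each took, so a single run reports a per-phase breakdown.
pub struct PhaseTimer {
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimer {
    pub fn new() -> Self {
        PhaseTimer { phases: Vec::new() }
    }

    /// Times `f`, records it under `name`, and passes its result through.
    pub fn run<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        // black_box discourages the optimizer from discarding the result.
        let out = black_box(f());
        self.phases.push((name, start.elapsed()));
        out
    }

    pub fn names(&self) -> Vec<&'static str> {
        self.phases.iter().map(|(name, _)| *name).collect()
    }

    /// One "name: N.NNN ms" line per recorded phase.
    pub fn report(&self) -> String {
        self.phases
            .iter()
            .map(|(name, d)| format!("{name}: {:.3} ms", d.as_secs_f64() * 1e3))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

Threading each phase's output into the next keeps the phases honest: later phases consume real data rather than re-deriving it inside the timed region.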

Validation

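- `cargo test -p loro-internal --lib` (279 tests), `cargo test -p loro` (all suites), `mergeable_container` / `mergeable_cid_encoding`, `import_atomicity`, and the kv-store sstable tests. New regression tests cover the cached visible-op count, the block-checksum skip (`sstable_import_block_checksum_only_skipped_when_unchecked`), and cross-document transactions (`cross_doc_txn_is_rejected`).
- Fuzz corpus replay (`cargo +nightly fuzz run all`): clean.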

Not included / future

Reaching diamond-types-level throughput (~2 ms for this trace on the same machine) would require a plain-text-specialized path that, for style-free text, drops the rich-text entity/style machinery and the 5-coordinate cache, plus deferred b-tree cache propagation. That is a larger structural change; the vendored fork is in place to enable it.

🤖 Generated with Claude Code

Local text editing (applying the automerge-paper trace): ~112ms -> ~65ms.

- Compile the lock-order debug instrumentation out of release builds; it ran on every per-op OpLog+DocState lock acquire/release (~30% of edit time). In release `can_lock_in_this_thread` returns false, backed by the now-exact cached visible op count.
- Bump `visible_op_count` incrementally for local ops instead of recomputing it from the version vectors (which also heap-allocated an im::HashMap iterator) on every op.
- Build the position-context error string in `checked_range_end` lazily (no per-op alloc) and return entity ranges in a SmallVec (no per-delete Vec alloc).
- Route the per-insert event-index computation through the existing cursor cache instead of a fresh `visit_previous_caches` walk every op.

Snapshot import (fast snapshot): B4 ~135us -> ~80us; B4x100 (22MB) ~8.15ms -> ~4.5ms.

- Skip the redundant per-block SSTable checksum on full import; the whole body is already covered by the document checksum verified in parse_header_and_body.

Adds crates/examples/examples/b4_bench.rs (phase-timed B4 harness) plus regression tests for the cached visible op count and the block-checksum skip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
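The incremental `visible_op_count` idea can be sketched under a simplified model in which the count is just the sum of a per-peer version vector. `OpLogSummary`, `record_local_op`, and `PeerId` are hypothetical names, and the sketch models only the local-op path; imports of remote ops are not modeled here and would also have to update the cache.

```rust
use std::collections::HashMap;

type PeerId = u64;

/// Hypothetical, simplified oplog summary: a version vector (ops seen per
/// peer) plus a cached total that is kept exact incrementally.
#[derive(Default)]
pub struct OpLogSummary {
    version: HashMap<PeerId, u32>,
    visible_op_count: usize,
}

impl OpLogSummary {
    /// Old approach: O(peers) recomputation by iterating the version vector.
    fn recompute_visible_op_count(&self) -> usize {
        self.version.values().map(|&n| n as usize).sum()
    }

    /// New approach: O(1) bump when a local op spanning `len` atoms is applied.
    pub fn record_local_op(&mut self, peer: PeerId, len: u32) {
        *self.version.entry(peer).or_insert(0) += len;
        self.visible_op_count += len as usize;
        // Cross-check the cache against the full recomputation (debug only).
        debug_assert_eq!(self.visible_op_count, self.recompute_visible_op_count());
    }

    /// Cheap read of the cached count.
    pub fn visible_op_count(&self) -> usize {
        self.visible_op_count
    }
}
```

The `debug_assert_eq!` keeps the old recomputation around as an oracle in debug builds and tests while release builds only pay for the addition.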

`is_deleted` allocated a fresh `visited` Vec on every local op (the #1 allocation source after the earlier fixes: ~260k allocs on the B4 trace). Parent chains are shallow (depth 1 for a root container), so use inline SmallVec storage. apply 1x: ~65ms -> ~61ms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
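The inline-storage idea behind the `SmallVec` change can be shown with a std-only sketch. `InlineVec` is a hypothetical, deliberately minimal stand-in for `smallvec::SmallVec`, and `is_deleted` here is a toy parent-chain walk in which we assume `visited` serves as a cycle guard; neither reproduces Loro's `DocState::is_deleted`.

```rust
/// Minimal inline-first buffer (the idea behind `SmallVec`): up to `N` items
/// live in a stack array; only pushing past `N` allocates a heap `Vec`.
pub enum InlineVec<T: Copy + Default, const N: usize> {
    Inline { buf: [T; N], len: usize },
    Heap(Vec<T>),
}

impl<T: Copy + Default, const N: usize> InlineVec<T, N> {
    pub fn new() -> Self {
        InlineVec::Inline { buf: [T::default(); N], len: 0 }
    }

    pub fn push(&mut self, item: T) {
        match self {
            InlineVec::Inline { buf, len } if *len < N => {
                buf[*len] = item;
                *len += 1;
            }
            InlineVec::Inline { buf, len } => {
                // Inline capacity exhausted: spill to the heap (first allocation).
                let mut spilled = buf[..*len].to_vec();
                spilled.push(item);
                *self = InlineVec::Heap(spilled);
            }
            InlineVec::Heap(v) => v.push(item),
        }
    }

    pub fn as_slice(&self) -> &[T] {
        match self {
            InlineVec::Inline { buf, len } => &buf[..*len],
            InlineVec::Heap(v) => v.as_slice(),
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, InlineVec::Inline { .. })
    }
}

/// Toy parent-chain walk: `parent[c]` is container c's parent and
/// `deleted[c]` marks a deleted container.
pub fn is_deleted(parent: &[Option<usize>], deleted: &[bool], mut id: usize) -> bool {
    // Chains are usually shallow, so 8 inline slots avoid any allocation.
    let mut visited: InlineVec<usize, 8> = InlineVec::new();
    loop {
        if deleted[id] {
            return true;
        }
        if visited.as_slice().contains(&id) {
            return false; // cycle guard
        }
        visited.push(id);
        match parent[id] {
            Some(p) => id = p,
            None => return false,
        }
    }
}
```

Because the common case never exceeds the inline capacity, the per-op cost drops to a stack array initialization with no allocator call.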

Fork crates.io generic-btree 0.10.7 (which loro-dev maintains) into crates/generic-btree and redirect all dependents via [patch.crates-io], so the b-tree can evolve in-tree (e.g. deferred cache propagation). This is a verbatim vendoring of 0.10.7 (build is transparent: B4 apply unchanged at ~62ms); only the manifest is trimmed (benches dropped, dev-deps reduced to what the in-src tests need).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a specialized insert/delete path for style-free text on the attached, non-wasm, unicode-index path (the common Rust text-editing case). When the richtext has no style anchors, entity_index == event_index == unicode pos, so the entire read phase -- cursor location, two `visit_previous_caches` coordinate walks, and the styles lookup -- is unnecessary; `apply_local_op` then locates the cursor exactly once. The delete path likewise skips the two `index_to_event_index` walks. Falls back to the general path when styles are present, on wasm, or for non-unicode position types, so results are unchanged (snapshot bytes identical; loro, loro-internal lib, and mergeable tests all pass).

Also gate `apply_local_op`'s txn/doc context check (a per-op `Weak::upgrade`) to debug builds, since the handler always passes its own doc.

Cumulative B4 apply: 112ms -> ~46ms (~2.4x), ~11.5 M op/s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
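A toy model makes the fast-path condition concrete. Everything here (`RichText`, `Elem`, `PosType`, `can_use_fast_path`) is hypothetical and far simpler than Loro's richtext b-tree; it only shows why the shortcut "entity index == unicode pos" holds with no style anchors and breaks once an anchor exists.

```rust
/// Position types a caller can index by (simplified; hypothetical).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum PosType {
    Unicode,
    Utf16,
}

/// Entities in a toy rich-text sequence. Anchors take an entity slot but
/// contribute no visible text.
enum Elem {
    Char(char),
    StyleAnchor,
}

pub struct RichText {
    elems: Vec<Elem>,
    style_anchors: usize,
}

impl RichText {
    pub fn new() -> Self {
        RichText { elems: Vec::new(), style_anchors: 0 }
    }

    /// Fast path only when no anchors exist (entity index == unicode pos),
    /// the caller indexes by unicode, and we are not compiled for wasm.
    pub fn can_use_fast_path(&self, pos_type: PosType) -> bool {
        self.style_anchors == 0 && pos_type == PosType::Unicode && !cfg!(target_arch = "wasm32")
    }

    /// General path: walk entities, skipping anchors, until `pos` units of
    /// visible text have been passed.
    fn entity_index_general(&self, pos: usize, pos_type: PosType) -> usize {
        let mut seen = 0;
        for (i, elem) in self.elems.iter().enumerate() {
            if let Elem::Char(c) = elem {
                if seen >= pos {
                    return i;
                }
                seen += match pos_type {
                    PosType::Unicode => 1,
                    PosType::Utf16 => c.len_utf16(),
                };
            }
        }
        self.elems.len()
    }

    pub fn insert(&mut self, pos: usize, s: &str, pos_type: PosType) {
        let at = if self.can_use_fast_path(pos_type) {
            pos // no walk needed
        } else {
            self.entity_index_general(pos, pos_type)
        };
        let tail = self.elems.split_off(at);
        self.elems.extend(s.chars().map(Elem::Char));
        self.elems.extend(tail);
    }

    pub fn add_style_anchor(&mut self, entity_index: usize) {
        self.elems.insert(entity_index, Elem::StyleAnchor);
        self.style_anchors += 1;
    }

    pub fn text(&self) -> String {
        self.elems
            .iter()
            .filter_map(|e| match e {
                Elem::Char(c) => Some(*c),
                Elem::StyleAnchor => None,
            })
            .collect()
    }
}
```

With an anchor at entity slot 0 in front of `ab`, inserting `X` at unicode position 1 via the shortcut would land at entity slot 1 (before `a`) and produce `Xab`; the general walk skips the anchor and yields `aXb`, which is why the fallback is required whenever styles are present.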

github-actions · 2026-06-25T07:12:41Z

WASM Size Report

Original size: 3029.43 KB
Gzipped size: 999.71 KB
Brotli size: 701.20 KB

Address two correctness regressions introduced by the B4 perf work:

1. The txn/doc context check in `Transaction::apply_local_op` was gated to debug builds. `insert_with_txn`/`delete_with_txn` are public API, so a caller can feed one document's transaction to another document's handler; in release that silently stamped the target doc's state/oplog with the wrong peer+counter instead of returning `UnmatchedContext`. Restore the check for all builds using a cheap `Weak`-pointer comparison (no atomic upgrade on the hot path; upgrade only to fill in the error on mismatch).
2. `MemKvStore::import_all` (re-exported publicly via loro-crdt) dropped per-block checksums for all callers. Split the API: public `import_all` (and `SsTable::import_all`) always verifies block checksums; a new `import_all_unchecked` opts into the fast path and is used only by Loro's snapshot decode (`ChangeStore::import_all`, `KvWrapper::import`), where the document-level checksum from `parse_header_and_body` already guarantees integrity over the whole body.

Adds regression tests: `cross_doc_txn_is_rejected` and the updated `sstable_import_block_checksum_only_skipped_when_unchecked`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
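The cheap context check from item 1 can be sketched with `Weak::ptr_eq`, which compares the two `Weak` pointers without touching reference counts, whereas `Weak::upgrade` has to atomically check and increment the strong count. `DocInner`, `Txn`, `Handler`, and `ContextError` are hypothetical stand-ins, not Loro's types.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-ins for a document, its transaction, and a handler.
pub struct DocInner {
    pub peer: u64,
}

pub struct Txn {
    doc: Weak<DocInner>,
}

pub struct Handler {
    doc: Weak<DocInner>,
}

#[derive(Debug, PartialEq)]
pub enum ContextError {
    /// Peers of the handler's doc and the txn's doc, if still alive.
    UnmatchedContext { expected: Option<u64>, found: Option<u64> },
}

impl Handler {
    /// Runs on every op in all builds, so the common case must stay cheap.
    pub fn check_txn(&self, txn: &Txn) -> Result<(), ContextError> {
        // Hot path: plain pointer comparison, no refcount traffic.
        if Weak::ptr_eq(&self.doc, &txn.doc) {
            return Ok(());
        }
        // Cold path: upgrade only now, to build a descriptive error.
        Err(ContextError::UnmatchedContext {
            expected: self.doc.upgrade().map(|d| d.peer),
            found: txn.doc.upgrade().map(|d| d.peer),
        })
    }
}
```

Deferring the upgrade to the mismatch branch keeps the check in release builds at roughly the cost of one pointer comparison per op.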

zxch3n and others added 4 commits June 25, 2026 12:23

zxch3n and others added 2 commits June 25, 2026 15:49

chore: make fuzz sanitizer platform-aware (574a388)

zxch3n wants to merge 6 commits into main from perf/b4-local-edit-and-import
