Skip to content

Move main LSP loop off Tokio to an owned thread#1277

Merged
lionel- merged 7 commits into
mainfrom
oak/salsa-33-write-deadlock
Jun 19, 2026
Merged

Move main LSP loop off Tokio to an owned thread#1277
lionel- merged 7 commits into
mainfrom
oak/salsa-33-write-deadlock

Conversation

@lionel-

@lionel- lionel- commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Branched from #1274
Progress towards #1212

Fixes a source of deadlocks when the main loop blocks on a Salsa write until it gains exclusive access, but background tasks hold a DB handle and are scheduled by Tokio to run on the same worker thread as the main loop. We could wrap writes in block_in_place() instead, which instructs Tokio to move all pending tasks to a different worker, but moving the main loop off tokio altogether is really the safer move.

From the documenting comment:

        // Since the main loop owns the Salsa DB and writes to it, we run on its
        // own thread instead of a Tokio worker. Salsa writes are potentially
        // blocking until the writer gains exclusive access. If background tasks
        // holding clones of the DB are stuck on the same thread as the main
        // loop, the LSP deadlocks. This can be avoided by wrapping writes in
        // `block_in_place()` but the safer structure is to have it run on an OS
        // thread that we're in control of.

Fixes a bunch of issues observed in frontend tests: https://github.com/posit-dev/positron/actions/runs/27755066106?pr=14254

We still use tokio for the auxiliary loop and the blocking worker pool, for the time being.

/// The returned [`LoopHandles`] owns everything the loops need. Drop it to
/// shut the loops down and release the owned state.
pub(crate) fn start(self) -> LoopHandles {
let mut aux = tokio::task::JoinSet::<()>::new();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need a tokio::task::JoinSet anymore for 1 task?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice way of cancelling on abort. Dropping a task handle doesn't cancel it, so we'd need a custom Drop method to call the handle's abort() method.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping a task handle doesn't cancel it

oh i was not aware of that

Comment on lines +342 to +345
/// Pull the next event off the channel. Only the test pump uses this; the
/// running loop selects on the channel directly so it can also watch for
/// shutdown.
#[cfg(test)]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would just remove this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it as a token of the nicer loop structure

Comment thread crates/ark/src/lsp/main_loop.rs Outdated
// handlers that mutate those still refresh explicitly.
let old_revision = salsa::plumbing::current_revision(&self.world.db);

lsp::log_info!("Processing next event");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't feel that useful? We log at the beginning of every event type

Comment on lines +381 to +384
lsp::log_info!(
"Entering notification handler with {n} outstanding Salsa db holds",
n = self.world.db.outstanding_holds()
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to know this on every iteration? Why do we want to know this at all? It feels like itll be expected for the db to have outstanding holds in background threads (i.e. its working on diagnostics when we get a did-change update)

@lionel- lionel- Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not every iteration, this is only when we're about to write to the DB.

This is meant to help debugging hangs from user logs, we can remove this later once proven stable.

Comment thread crates/ark/src/lsp/main_loop.rs Outdated
Comment on lines +996 to +1000
loop {
let Some(task) = rx.recv().await else {
lsp::log_warn!("process_diagnostics_queue: channel closed, task exiting");
return;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the previous while look was pretty nice?

You'd just put the lsp::log_warn!("process_diagnostics_queue: channel closed, task exiting"); part after the while loop exits, i think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

.map(|task| (task.file.wire_url.clone(), task))
.collect();

tracing::trace!("Processing {n} diagnostic tasks", n = batch.len());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't say Generating diagnostics or Finished diagnostics via tracing::trace! (we only do that via lsp::log_info!), i feel like this isn't actually that useful on the kernel side (i know its not new from you, but maybe we can just remove it)

How do we decide where to log?

@lionel- lionel- Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep it there because diagnostics code logs at kernel level, so we get a section marker.

In general, the current split is that eventing and dispatch belong to the LSP, handling belongs to the kernel. This is a remnant from r_task()-heavy handling and we have nicer logging facilities on the kernel side (context, profiling).
But: when the LSP becomes standalone, everything moves to the LSP.

@lionel- lionel- force-pushed the oak/salsa-32-standalone-script-in-package branch from 7979c1d to c9eb298 Compare June 19, 2026 10:13
@lionel- lionel- force-pushed the oak/salsa-33-write-deadlock branch from 2816d7d to 9160639 Compare June 19, 2026 10:13
@lionel- lionel- force-pushed the oak/salsa-32-standalone-script-in-package branch from c9eb298 to 5c53c2f Compare June 19, 2026 14:39
Base automatically changed from oak/salsa-32-standalone-script-in-package to main June 19, 2026 14:39
@lionel- lionel- force-pushed the oak/salsa-33-write-deadlock branch from 45ef459 to dcc5514 Compare June 19, 2026 14:40
@lionel- lionel- merged commit f45da51 into main Jun 19, 2026
17 checks passed
@lionel- lionel- deleted the oak/salsa-33-write-deadlock branch June 19, 2026 14:42
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants