Rust's Sneaky Deadlock With `if let` Blocks

(brooksblog.bearblog.dev)

129 points by lukastyrychtr 5 days ago | 84 comments

Clippy already has an error for this pattern with Mutex. It should be trivial to extend it to cover RwLock.

    error: calling `Mutex::lock` inside the scope of another `Mutex::lock` causes a deadlock
      --> src/main.rs:5:5
       |
    5  |       if let Some(num) = *map.lock().unwrap() {
       |       ^                   --- this Mutex will remain locked for the entire `if let`-block...
       |  _____|
       | |
    6  | |         eprintln!("There's a number in there: {num}");
    7  | |     } else {
    8  | |         let mut lock2 = map.lock().unwrap();
       | |                         --- ... and is tried to lock again here, which will always deadlock.
    9  | |         *lock2 = Some(5);
    10 | |         eprintln!("There will now be a number {lock2:?}");
    11 | |     }
       | |_____^
       |
       = help: move the lock call outside of the `if let ...` expression
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#if_let_mutex
       = note: `#[deny(clippy::if_let_mutex)]` on by default

the_duke a day ago | root | parent |

There are a lot of alternative lock implementations that are used all the time, with the most common ones probably being tokios RwLock/Mutex and parking_lot.

That lint won't help for those.

GolDDranks a day ago | root | parent |

There was an idea floated for another lint `#[diagnostics::lint_as_temporary(reason = "...")]`, which is supposed to be added by implementors of such locks. https://github.com/rust-lang/rust/issues/131154#issuecomment...

sunshowers 2 days ago | prev | next |

This is going to be fixed in Rust 2024:

https://github.com/rust-lang/rust/issues/124085

dgfitz 2 days ago | root | parent |

Did they just miss this in the spec?

sunshowers 2 days ago | root | parent |

From my understanding: The original implementation is consistent with the behavior of `match`. However it was realized that it is both less intuitive than early dropping (as this post suggests) and also gets in the way of understandable semantics for if let chains, so the decision was made to change it in the next edition.

match still retains the old behavior.

acjohnson55 2 days ago | root | parent |

Why does the behavior make sense in match?

GolDDranks 2 days ago | root | parent | next |

Match has multiple branches, and some or all of them can bind variables, (borrowed from the value matched against), defined in that branch. For that to be possible, it means that the temporary lifetime of the matched value must encompass all the branches.

Compared to that, if let - else has only two branches: one where matching (and possible variable binding) happens, and one where it doesn't. You couldn't have anything borrowed from the matched value in the else branch, so the lifetime extending to encompass the else branch is not strictly necessary.

NobodyNada 2 days ago | root | parent |

Also, from a pure syntax perspective, an if let block reads like it creates a binding that exists within the scope of the braces after the if, and a match block reads like it creates a binding that exists within the "scope" of the braces surrounding the entire match block.

These all read the same to me from a scoping perspective:

    if let Some(x) = expr {
        // expr is live here
    }

    while let Some(x) = expr {
        // expr is live here
    }

    match expr {
        // expr is live here
        Some(x) => {},
        _ => { /* expr is still live, doesn't matter that x is inaccessible */
    }

    fn foo(x: i32) {
        // same idea, the variable x belongs to the scope it's declared alongside
    }

For the scope of an if-let expression to extend into the else block afterwards violates the principle of least surprise for me -- and clearly the author of the OP too!

    if let Some(x) = expr {
        // expr is live here
    } else {
        // ??? why is expr live here? the if is clearly "out of scope!"
    }

sunshowers 2 days ago | root | parent | prev |

https://github.com/rust-lang/rust/issues/131154 has some discussion about all the possibilities. It's pretty subtle, but I think everyone agrees that the current behavior of if let is unintuitive. Beyond that there are a bunch of possibilities none of which is clearly superior.

GolDDranks 2 days ago | root | parent |

https://github.com/rust-lang/rust/issues/131154#issuecomment... Especially this matrix was helpful! The behaviour that's going to be adopted in Rust 2024 is the Rust2024.1 one. They were having doubts about the rules becoming too complex/non-uniform, but at least dropping early means compiler errors rather than deadlocks.

nlitened a day ago | prev | next |

While I never dove deep into Rust, until now I have kinda naively expected for some reason that Rust's lifetimes and ownership model prevents these trivial deadlocks at compile time. Thinking about it now, there's no reason why it would. Still lots of footguns.

Also, from my experience, acquiring and releasing a mutex multiple times within a single code path feels to me like a smelly, subtly faulty code. Are there legitimate cases when it is inevitable and correct?

SkiFire13 a day ago | root | parent | next |

> I have kinda naively expected for some reason that Rust's lifetimes and ownership model prevents these trivial deadlocks at compile time.

A `Mutex`/`RwLock` is doing borrow checking at runtime, effectively replacing the compile time one.

jvanderbot a day ago | root | parent |

Correct. Refcell is also an area where you can happily create footguns.

If you'd like to avoid the guardrails, you still need to be disciplined.

> I have kinda naively expected for some reason that Rust's lifetimes and ownership model prevents these trivial deadlocks at compile time.

It does in a way, as you'd generally do multi-threading using message passing (for which the borrow checker can verify thread-exclusive data access) instead of manual locking.

gpderetta a day ago | root | parent |

This my periodic reminder that it is trivial to deadlock with message passing as well as with mutexes. But a couple of order of magnitude harder to debug.

simonask a day ago | root | parent | prev |

Yeah, intuitively Mutex is actually one of the tools that you use to negotiate with the borrow checker (it's one implementation of "interior mutability", allowing conversions from immutable to mutable borrows by taking the lock), so it is kind of by definition not something Rust can prevent, outside of lints and other static analysis.

I can't think of a use case for taking a mutex lock multiple times in a code path, but there is a common cache pattern with RwLock, where you have a "fast path" that only takes a read lock to perform a lookup, and then only take a write lock when inserting a new entry. In this case, you should be using an upgradable read lock, so there is technically a "re-lock" of the same RwLock on upgrade in the same codepath. This model does prevent deadlocks because of move semantics.

Animats 2 days ago | prev | next |

A lock taken in the conditional test term should lock for the duration of the if-clause. If you release the lock early, you add the risk of a race condition, where some other thread executes between the test and the execution of the if or else clause, changing the state.

kelnos a day ago | root | parent | next |

Right. But the problem is that with an 'if let', it also locks for the 'else', which is not intuitive or expected.

The problem is that the syntax of 'if let' makes the programmer assume a particular scoping, but its desugaring implies something else.

Right - if the pattern in the referenced blog post is used to implement lazy init, racing threads can double initialize.

But it's not possible to upgrade a read to write lock as racing threads can cause a deadlock, and upgradable RwLock variants don't hand out multiple upgradable read locks.

To maintain concurrent access, the referenced code must recheck the data once a write lock is granted, to gracefully handle TOCTOU.

(I know you were making a general point, I'm just fleshing out this specific example)

But in Rust, the lock protects the contents inside it and lends the contents only for the lifetime the lock is being held, so if one tries to modify the contents when the lock is already released, you are going to get a compiler error, not a race condition.

gmueckl 2 days ago | root | parent |

Parent was referring to a time of check to time of use bug. This kind of problem occurs every time you need to release and reacquire a lock between the point of decision and the point where the data is modified. The decision on which the modification is based may have been invalidated by another thread while the lock was released.

Animats 2 days ago | root | parent | next |

Right. I could have been clearer.

There are two errors to avoid here:

- Read lock, read some data, unlock, relock for writing, act on data read during previous locked period. That's a time of check/time of use error.

- Acquire a read lock, then upgrade to a write lock. MutexLock does not let you do that. If you do that from two threads, you can deadlock. You can do that in SQL, but SQL can back out a transaction that deadlocks.

This is not a problem with Rust. Rust prevented the author from shooting themself in the foot.

GolDDranks 2 days ago | root | parent | prev |

I was going to say "d'oh, that's locking 101", but the original article indeed does that kind of a thing with the else branch. They should have chosen a better example do demonstrate the point, but it's hard to say if the current racy behaviour is even wrong because they didn't have any real-life semantics/use-case in the example...

colanderman 2 days ago | root | parent | prev |

This is true, but it's also true that a lock ought only be held in the lexical scope of its guard's let binding. That the scope of temporaries is extended outside the binding scope in this one case is surprising.

In a review I'd flag code which depended on this behavior -- it's much clearer instead to simply take the lock outside the if statement.

peterkelly a day ago | prev | next |

One way to solve this at the language level would be to require that each lock has a specified priority associated with it, and have the compiler enforce a constraint that locks can only be obtained in the order of their priority.

For each statement, the compiler could keep track of the current "lock priority". If an attempt is made to obtain a new lock with a lower or equal priority, the compiler would reject this. Otherwise, the compiler would use the priority of the new lock as the lock priority for all statements within the scope of the block in which that lock is held.

Aside from modifying the language, the same basic idea could be implemented at the library level by exposing a lock type which still requires a priority to be specified at creation time, but tracks the current lock priority (per thread) at runtime and panics when the constraint is violated. Since the panics would be deterministic, they and the book-keeping required to track the current lock priority could be enabled only in debug mode.

senorrib 2 days ago | prev | next |

Is there any explicit reason as to why RWLock can’t simply try to upgrade the read-lock to write-lock given they’re both being held by the same thread?

anderskaseorg 2 days ago | root | parent | next |

Yes, allowing this to execute would be very unsound:

    let lock = RwLock::new(Box::new(111));
    let r: &i32 = &**lock.read().unwrap(); // points to 111
    *lock.write().unwrap() = Box::new(222); // allocates a new Box and deallocates 111
    println!("{}", *r); // use after free

NobodyNada 2 days ago | root | parent |

It can be done safely with an upgrade method that requires an owned read guard. The RwLock implementation provided by the parking_lot crate supports this safely:

    let lock = RwLock::new(Box::new(111));
    let read = lock.upgradable_read();
    let r: &i32 = &**read; // points to 111
    *RwLockUpgradableReadGuard::upgrade(read) = Box::new(222); // error[E0505]: cannot move out of `read` because it is borrowed
    println!("{}", *r);

anderskaseorg 2 days ago | root | parent | next |

Of course (but that’s not relevant to the original scenario, where the programmer is hypothetically not aware that the read lock is still being held, let alone that they could manually upgrade it after changing to a different lock library).

kelnos a day ago | root | parent | prev |

The problem is that in the 'if let' case, the 'else' block has no access to the read guard. It's out of scope, except that the compiler hasn't dropped it yet.

webkike 2 days ago | root | parent | prev |

Would be much more expensive to implement. RwLocks are basically Atomics wrapping a pointer and have no knowledge of who has acquired the lock

Sytten 2 days ago | prev | next |

Happy to be corrected, but from what I gathered RWLock should mostly be avoided in favour of a simple Mutex unless you have a very read-heavy usecase and even then the performances are subpar compared to mutex.

kccqzy 2 days ago | root | parent | next |

I agree with you, but the problem the article is talking about is unexpected lifetime with `if let` which is not just limited to RwLock.

maybe but that is unrelated to the distilled problem shown in the blog and RWLocks are not the only way this can bite you

but it's also behavior for rust match-like statements in general so nothing new nor specific to if let

The weirdest part about rwlock (not rust-specific) is when you end up using it in a case where the writes are the shared case and the reads are the exclusive case.

roland35 2 days ago | root | parent | prev |

I like to use rwlock on my code which reads once every few hours... Maybe I should change that!

Kab1r 2 days ago | prev | next |

I would call this a compiler bug

nindalf a day ago | root | parent |

It’s getting fixed soon - https://github.com/rust-lang/rust/issues/124085

Kab1r a day ago | root | parent |

If the change is targeting a rust edition it's being treated as a feature/improvement not as a compiler bug. If there isn't a spec that explicitly says that this is expected behavior (and as far as I know there isn't a concrete spec for the rust language) it should be instead treated as a backwards compatible bug fix

kelnos a day ago | root | parent | next |

No, it's being treated as a breaking change, which it is. It is not a backwards-compatible bug fix.

You are correct that there is no concrete spec for the Rust language; the current state of the compiler and stdlib is the "spec". So this is a breaking change to the "spec", and requires a new edition.

KwanEsq a day ago | root | parent | prev |

It's very much not backwards compatible, that's why they are doing it in an Edition.

a day ago | prev | next |

[deleted]

dmitrygr 2 days ago | prev |

This is why pthread_mutex_lock() and pthread_mutex_unlock() will always be kings - it is clear when things are locked and when they are unlocked. Nobody needs to write articles warning you about pthread_mutex being taken randomly by syntactic sugar and not released till a magic later time due to the same sugar. Ditto for RWlock/XYZloc/ABClock/etc...

sunshowers 2 days ago | root | parent | next |

I think mutexes that own their data are genuinely much easier to reason about than mutexes that don't.

In this case this is a Rust bug that will be fixed in the next edition: https://github.com/rust-lang/rust/issues/124085

ajross 2 days ago | root | parent |

That's arguably true, but only insofar as it makes sense to talk about a mutex "owning" "data". There are many kinds of data that can't be owned by the language runtime like this (think about async/shared/DMA buffers, register blocks on foreign hardware, memory-mapped database files), but that still clearly need synchronization. Even things like coarse-grained locks taken at the subsystem level (e.g. to avoid having a separate atomic in every little object) are a bad fit for this kind of API.

There's value to being opinionated about API design, but there's also cost. And Rust is supposed to be playing in the same sandbox as the lower level tools.

nindalf a day ago | root | parent | next |

Here’s what I understand from your comment, correct me if I’m wrong. You’re saying there is a runtime cost associated with a mutex/rwlock owning the data. And secondly, you think it isn’t possible to implement this pattern in other areas that need locking.

I don’t think either of those is true. Encoding the ownership in the type system makes things clearer, imposes a compile-time cost but not run-time cost. Also, there isn’t any magic in the stdlib implementation of Mutex and Rwlock other than implementing it natively for every OS. This means that it is possible to implement the pattern for the examples you gave.

That is true, but I don't believe Rust imposes a cost here. The worst case degrades to a `Mutex<()>` and then whatever's being guarded being managed separately.

simonask a day ago | root | parent |

From an API design point of view, I would always encapsulate the logically mutable state in some type `Foo`, even if it doesn't literally own the memory, and then expose a safe API using `Mutex<Foo>`.

Idiomatically, you would then make `Foo::new_unchecked()` unsafe, with the precondition that nobody else is accessing the same external resource, and that `Foo::new_unchecked()` is only ever called once.

tialaramex a day ago | root | parent | next |

Right. I advise people who don't think that anything should be owned by the lock make a marker type - call it for example struct MacGuffin - with no data in it, just to mark that we took the lock. Firstly they can now write APIs which express ideas like "You can't call this unless you have the lock" (that's a MacGuffin parameter) but then also very often they discover actually there is mutable data that they only want when they've taken the lock, and hence have MacGuffin, and hey, we can just put it in the MacGuffin type.

So long as MacGuffin doesn't have any data in it, it essentially evaporates at runtime. This is a zero size type (ZST) so we can store that in no bytes of RAM, we can use no registers at all to pass it into a function, and so on. If you're a C programmer this doesn't make sense because you don't have ZSTs, all your types take up at least one byte, but Rust isn't like that.

sunshowers a day ago | root | parent | prev |

Yep, it's nice to use marker types for this kind of thing.

You can also use a mutex that doesn't own data in Rust if you need to.

I'm not convinced, because you're conflating too many things here. Ownership and borrow checking are always compile time concepts. There is no language runtime involved. If you think of C as a magic language then start today and get that idea out of your head.

If locking and unlocking semantics are a good fit for whatever you are doing, then you can just do a lock implementation setting whatever memory mapped registers you need. There is no constraint here whatsoever.

Nobody is advocating for separate atomics in every little object in Rust. That's a Java idea that failed miserably long before Rust was even around.

a day ago | root | parent | prev |

[deleted]

There are however many PRs about people ignoring the return values of pthread_mutex_lock(), or missing an unlock on one route through a function.

dathinab 2 days ago | root | parent |

and nothing prevents you from explicitly dropping lock guards in rust either

actually it's a common pattern in complicated multi threaded code or similar to require that

freeone3000 2 days ago | root | parent | next |

Dropping a lock guard in rust requires you to have already dropped the borrow — the shared memory under lock is only accessible through the guard, and you can’t drop an object if you have a reference to it. This is not true in C, where you can reference the shared memory after the lock is released.

dathinab a day ago | root | parent |

> This is not true in C, where you can reference the shared memory after the lock is released.

technically yes but practically this is nearly always programmer bug to bypass a lock

freeone3000 a day ago | root | parent |

That is the point — this is (another) example of a programmer bug that C allows that Rust does not.

The rust compiler (or the rust-analyzer) directs developers to use mutexes or other guards, even in single-threaded code when the use case wouldn't strictly require it. Where we would get undefined behavior in C, we get deadlocks in rust. This forces developers to handle these cases, but this can be quite painful to implement, especially in embedded or no_std environments.

dlahoda a day ago | root | parent |

may be it directs to use unsafe?

dathinab a day ago | root | parent |

no not at all

If you port C code to rust and now need much more locks in Rust it's nearly always an indicator for either your C code not having been correct (e.g. accessing resources behind a lock without acquiring it) and/or you having badly structured code.

There is basically hardly ever a need to use unsafe for such cases as long as you don't write your own fundamental data structures (e.g. specific kinds of idk. lock free concurrent data structures).

Honestly the main reason I see unnecessary unsafe code usage is people trying to write C or C++ code in rust instead of writing rust in rust.

kelnos a day ago | root | parent | prev |

So what? If you drop the lock guard, you get a straightforward compiler error if you later try to access the data in it. You don't get a runtime bug that you have to track down like you get if you forget a pthread_mutex_unlock().

dathinab a day ago | root | parent |

yes that is the point

you can rely on implicit drop to always make sure you never forget to unlock (as long as you don't leak the guard which is hard to do but possible)

but you can also explicitly drop it to have full control over when exactly it will unlock

and in neither situation you have to worry about accidentally bypassing it

and in complicated code relying implicitly on a mutex or similar to still be kept it's a common pattern to do so (i.e. to explicitly drop the mutex guard even if not needed to be extra clear about for how long exactly the lock is held)

No thanks; it's so so so easy to forget to unlock a mutex; I'd rather the language and stdlib just make it impossible for me to do so.

Ultimately this isn't really a problem with how Rust's Mutex/RwLock/etc. works, it's just a poor choice with how the lifetime of the lock guard is figured in the 'if let' case. This poor choice will be fixed in the 2024 Rust edition, and this problem will go away.

nikki93 a day ago | root | parent | next |

I think a 'linear types' ish model where the compiler flags an error if you didn't write the explicit unlock call, and only compiles if some acceptable unlock call (or set of calls) is added, would be a good design. It can / would also prevent use-after-consume. I do want the static checks, but I don't think that means that implicit function calls need to be generated on a } or a = (assigning to a variable causes a drop) etc. This is what Rust already does with `.clone()` -- needing the explicit call -- and I think it makes sense for a lot of cases of implicit drop. I have seen discussions about implementing this for Rust and I have it implemented in a C borrow checker experiment I'm trying. ATS is also an example of this, and going further is something like Frama-C or seL4 and so on.

The main point being: the implicitness is not necessary for "compiler makes sure you don't forget". So the original comment about how usage of the explicitly named and paired APIs can clarify intent both for the writer and reader can still stand while not implying that forgetting is involved. I see this dichotomy often being drawn and I think it's important to consider the language design space more granularly. I think a reminder is a better fix for forgetting, rather than assuming what you wanted and doing it for you.

(the explicit calls also let you / work better with "go to defintion" on the editor to see their code, learn what to look up in a manual, step in with a debugger, see reasonable function names in stack traces and profiles, pass / require more arguments to the drop call, let the drop call have return / error values you can do something about (consider `fclose`), let the drop call be async, ...)

tucnak a day ago | root | parent | prev |

https://gaultier.github.io/blog/perhaps_rust_needs_defer.htm...

In rust, this is y=x.lock(), and later drop(y) or just end the scope block.

Sugary syntax is always an issue when side effects matter.

Having said that, I adore procedural code in general, but it does make ownership a fun mental exercise.

This really reads like satire but feels like the author is actually being sincere. I’ve seen so many deadlocks happening due to pthread mutexes. Certainly way more than this syntactic corner case. It’s so common in fact that I saw a codebase that made their mutexes recursive to avoid having to think about it.

dmitrygr 2 days ago | root | parent |

[flagged]

kelnos a day ago | root | parent | next |

Deadlocks are a sign of normal humans getting tripped up by crappy abstractions.

If you're going to claim you've never written a deadlock in your life, well... I'm not sure I'd believe that.

Ideally alcohol would also be banned. Obviously that doesn’t work because of enforcement challenges so we shouldn’t ban gambling either. But that doesn’t mean we need to make it super easy.

Who hurt you?

techbrovanguard 2 days ago | root | parent | prev |

this is pretty obvious bait

[deleted]

No way. The compiler doesn't help you.

Plus you're working in a memory unsafe, concurrency unsafe language.

akira2501 2 days ago | root | parent | prev |

I see it as a consequence of Rust being too liberal with additions to the language. It's hard to maintain some of these guarantees in the presence of certain syntactic forms. This happens in all languages.

It is a little hilarious to see a rustaceon write:

> and you can sus out if a program is going to cause a deadlock by just making sure you aren't acquiring multiple simultaneous locks.

Ah.. not a problem! You just have to not ever have the problem. Then, of course at the opposite end of the article:

> I wrote this block because this has specifically bitten different Rust crates in the wasmCloud project multiple times

dathinab 2 days ago | root | parent | next |

The issue here isn't new features at all.

`if let` is more or less syntax sugar for `match`.

And match behaves like that, too and has been in rust since 1.0.

That match did behaves like that is due to some old, you could say legacy, reasons and had been criticized even in the early rust 1.x days.

But changing a behavior which subtle change when locks are released is not something you can easily fix with a rust edition so we are pretty much stuck with it.

Philpax 2 days ago | root | parent | next |

Funny you say that: https://github.com/rust-lang/rust/issues/131154

Looks like this is getting fixed in Rust 2024 :)

dathinab 2 days ago | root | parent |

I haven't been supper up to date with changes to rust-lang recently (I had seen that they are stabilizing if_let_rescope but didn't look into what it is, to much to do at work).

Through making if-let less syntax sugar for changing a implicit behavior people most likely didn't rely on but have problems with seems like a very good idea

My comment about this being hard to change was mainly about match. Not considering the option of making if-let less syntax sugary.

And in difference to if-let, for match people do (or at least did years ago in production code) rely on it.

To elaborate a bit more this is a consequence of a overlapping of two design decisions:

1) temporaries being added "alongside" the item they appear in. This makes a tone of thing much much simpler, but comes back to bites us here.

2) "alongside" for match statement meaning alongside the whole statement (but for `if <cond> {` it's alongside the condition)

3) things being always dropped at the very end of the scope if not moved out from it earlier, which again makes things easier to understand in most cases.

both had been discussed a bunch around 1.0/early 1.x days and both are things which in most situations make it easier to write rust code (and for beginners potentially much easier)

but both have also drawbacks

like the not-that-common example in this blog

or e.g. in async where rust has to keep any values which impl Drop around across async await calls as it can't know if there is a side effect in Drop

In the past I personally had been a contender of allowing the compiler to drop value anywhere between the last time they have been referenced and the end of the scopes without any rules or stability about where exactly (i.e. if you need a guard to be kept around you need to be explicit about it).

But working more together with people of very varying skill levels in the last 5/6 years made me change my mind and agree that that would have been a terrible idea.

And having rules about guaranteed drops as early as possible seem initially easy but aren't due to things like conditional moves, partial moves etc. I.e. it would still be quite a bit more complicated to teach it.

Furthermore in both alternatives to 3) you likely still wouldn't (guaranteed) drop the guard temporary in the other match branch before you requesting the new guard as guaranteeing compiler behavior like that means having a lot of additional edges in many partial move scenarios and potentially even a bunch of additional branch. In both cases it would likely increase code size and mess with the branch predictor and I-caches and be generally just not good (but it would help with async await boundaries).

akira2501 2 days ago | root | parent | prev |

> `if let` is more or less syntax sugar for `match`.

Which are new features for systems languages that otherwise rely on RAII for locking. So it's in the class of "original sin."

> so we are pretty much stuck with it.

You could refuse to compile it under some set of flags. Isn't that the basic value premise of the language here?

sunshowers 2 days ago | root | parent |

It is true that Rust is the first major language with both pervasive, deterministic RAII and pattern matching. Kind of wild to be honest! A real marriage of systems and functional code.

2 days ago | root | parent | prev |

[deleted]