

arjunr2 commented Jan 20, 2026

Initial PR adding Config knobs for the record/replay feature.

arjunr2 requested review from a team as code owners January 20, 2026 17:15
arjunr2 requested review from alexcrichton and removed request for a team January 20, 2026 17:15
github-actions bot added the wasmtime:api (Related to the API of the `wasmtime` crate itself) and wasmtime:config (Issues related to the configuration of Wasmtime) labels Jan 20, 2026
@github-actions

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this checklist:

  • If you added a new Config method, you wrote extensive documentation for
    it.


    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • If you added a new Config method, or modified an existing one, you
    ensured that this configuration is exercised by the fuzz targets.


    For example, if you expose a new strategy for allocating the next instance
    slot inside the pooling allocator, you should ensure that at least one of our
    fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this
    configuration option in wasmtime_fuzzing::Config (or one
    of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this
    configuration. See our docs on fuzzing for more details.

  • If you are enabling a configuration option by default, make sure that it
    has been fuzzed for at least two weeks before turning it on by default.
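As an illustration of the documentation form requested in the first checklist item, a record/replay knob might be documented roughly as follows. This is a hypothetical sketch: the `Config` struct is a stand-in for `wasmtime::Config`, and the method name `record_replay`, the `RecordReplay` enum, and the described error behavior are assumptions, not the API added by this PR.

```rust
/// Hypothetical record/replay mode, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RecordReplay {
    Disabled,
    Record,
    Replay,
}

/// Stand-in for `wasmtime::Config`; holds only the field this sketch needs.
#[derive(Debug)]
pub struct Config {
    rr: RecordReplay,
}

impl Default for Config {
    fn default() -> Self {
        Config { rr: RecordReplay::Disabled }
    }
}

impl Config {
    /// Configures whether execution is recorded to, or replayed from, a trace.
    ///
    /// With `RecordReplay::Record`, nondeterministic events observed by the
    /// guest are logged; with `RecordReplay::Replay`, they are fed back from a
    /// previously captured trace. The default is `RecordReplay::Disabled`.
    ///
    /// This method is infallible; an incompatible combination of options is
    /// reported later, when the configuration is validated.
    ///
    /// This method does not panic.
    ///
    /// # Example
    ///
    /// ```ignore
    /// let mut config = Config::default();
    /// config.record_replay(RecordReplay::Record);
    /// ```
    pub fn record_replay(&mut self, mode: RecordReplay) -> &mut Self {
        self.rr = mode;
        self
    }
}
```

Returning `&mut Self` mirrors the builder style of the existing `Config` methods, so calls can be chained.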



To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.


Comment on lines 2480 to 2481
self.validate_determinism_conflicts()?;
self.enforce_determinism();
Member


What would you think of ignoring the determinism options here instead of validating/enabling them? Historically, tying various config options together has caused problems and made configuration more confusing, because users then have to understand how everything interacts. In some sense, producing a recording is entirely orthogonal to deterministic SIMD/NaNs. In another sense, I also understand that such a recording runs the risk of not being very useful.

For the engine-level configuration, however, I'd argue that this is best kept as a separate concern: we'd document in the RR configuration options that users probably also want to turn on the determinism options, but it wouldn't be a requirement.

Such a change would also have the nice benefit of keeping validate as a &self method rather than the &mut self this PR changes it to. That has been an intentional design so far: Config is intended not to need any sort of post-processing. If post-processing is necessary, we try to defer it to "store the result of the computation in the Engine", so that Config continues to reflect the source of truth for the configuration the user specified.

Member


I think I'd gently push back against this: determinism is a fundamental part of record/replay, not an optional add-on, and without it the replay may not only be incorrect but be incorrect in interesting ways that violate internal invariants (finding the wrong kind of event as we read the trace alongside canonical ABI steps, necessitating some sort of fallback that, I don't know, returns an error? aborts halfway through allocating something? forces a trap halfway through the marshalling code?). I'd personally see this in the same light as, e.g., the need for bounds checks when memory is configured a certain way: it's Just How We Compile Things.

Author


I can't comment much on the intended direction of Config, but I agree that determinism is a fundamental part of RR, and it feels like something that should be implicitly enforced whenever that option is specified.

Member


To be clear, I'm not saying that things shouldn't be deterministic; I'm saying that I think it would make sense to avoid unconditionally coupling them here. The two options handled here, NaN bits and relaxed-simd, are pretty low on the list of "things that practically cause nondeterminism" in wasm, with resource exhaustion (stacks or memory) much higher on that list. RR cannot control resource exhaustion during replay, in the sense that it can't necessarily predict stack consumption (maybe memory? that seems relatively advanced).

Basically, I would expect that divergence of a replay from the original recording is something that's going to need to be handled no matter what. I think it'd be reasonable, for example, for the CLI to automatically set these options, but at the wasmtime::Config layer we've generally had bad experiences tying all these together.
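One way to read the CLI suggestion: the command-line layer, not wasmtime::Config, turns the determinism options on whenever recording is requested, unless the user explicitly chose otherwise. The `CliOptions` and `Config` types below are stand-ins and the field names are assumptions; the real CLI flags for record/replay are not shown in this thread.

```rust
/// Stand-in for engine-level configuration; the options stay independent here.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub record: bool,
    pub nan_canonicalization: bool,
    pub relaxed_simd_deterministic: bool,
}

/// Hypothetical parsed CLI flags; `None` means "not given on the command line".
#[derive(Debug, Default)]
pub struct CliOptions {
    pub record: bool,
    pub nan_canonicalization: Option<bool>,
    pub relaxed_simd_deterministic: Option<bool>,
}

impl CliOptions {
    /// Builds a `Config`, defaulting the determinism knobs to "on" when
    /// recording, while still honoring anything the user set explicitly.
    pub fn to_config(&self) -> Config {
        Config {
            record: self.record,
            nan_canonicalization: self.nan_canonicalization.unwrap_or(self.record),
            relaxed_simd_deterministic: self
                .relaxed_simd_deterministic
                .unwrap_or(self.record),
        }
    }
}
```

The coupling lives only in the CLI's defaulting logic, so an embedder using the library directly still sets each option independently.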

Put another way, I would be surprised if we could actually achieve absolutely perfect determinism during record and replay. Inevitably, it seems like we'll forget events, have bugs that prevent it, etc. Assuming perfect determinism sounds to me like it's going to introduce more subtle bugs than not, and like a pretty steep uphill battle.

Member


Right, I think it's an interesting challenge to think about permitting partial divergence and recovering. However, having seen and thought through a lot of the challenges here, I also think that it is enormously complex and opens a huge new Pandora's box of issues. For example, with reversible debugging, which builds on top of replay, the whole algorithm depends on determinism; we'll back ourselves into fundamentally unsolvable corners if we don't have it.

See also, e.g., how rr (the Mozilla project) panics with internal asserts if trace replay mismatches. I think that's the only really reasonable way to go here: we'll have asserts when we have mismatches. In other words, yep, there may be bugs; let's treat them as bugs, and catch and fix them.

Resource exhaustion is of course a different category: early termination because a memory.grow failed on replay is reasonable to propagate through and we already have the error paths for that. The kind of nondeterminism that is impossible to deal with is the kind that keeps running but with a poisoned machine state.
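A sketch of the policy in the last two paragraphs, using an entirely hypothetical trace format: reading an unexpected event kind from the trace is treated as a bug and panics (in the spirit of rr's internal asserts), while a memory.grow that fails on the replaying host is propagated as an ordinary error. The `Event`, `ReplayError`, and `Replayer` types are illustrative, not the PR's implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical trace events; a real trace format would carry much more.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    HostCallReturn(i32),
    MemoryGrow { new_pages: u64 },
}

/// Hypothetical error for resource exhaustion observed during replay.
#[derive(Debug, PartialEq)]
pub enum ReplayError {
    MemoryGrowFailed { requested_pages: u64 },
}

pub struct Replayer {
    trace: VecDeque<Event>,
    /// Memory limit on the replaying host, in wasm pages.
    max_pages: u64,
}

impl Replayer {
    pub fn new(trace: Vec<Event>, max_pages: u64) -> Self {
        Replayer { trace: trace.into(), max_pages }
    }

    /// Returns the recorded result of a host call. Any other event means the
    /// replay has diverged, which is a bug, so stop immediately.
    pub fn host_call_return(&mut self) -> i32 {
        match self.trace.pop_front() {
            Some(Event::HostCallReturn(v)) => v,
            other => panic!("replay divergence: expected HostCallReturn, found {other:?}"),
        }
    }

    /// Replays a `memory.grow`. Running out of memory on the replaying host
    /// is a recoverable outcome and is returned as an error.
    pub fn memory_grow(&mut self) -> Result<u64, ReplayError> {
        let new_pages = match self.trace.pop_front() {
            Some(Event::MemoryGrow { new_pages }) => new_pages,
            other => panic!("replay divergence: expected MemoryGrow, found {other:?}"),
        };
        if new_pages > self.max_pages {
            return Err(ReplayError::MemoryGrowFailed { requested_pages: new_pages });
        }
        Ok(new_pages)
    }
}
```

The split matches the distinction drawn above: early termination has an existing error path, whereas continuing with a poisoned machine state does not.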

Member


Another possible option here, which we've done elsewhere, is to change defaults depending on configuration options. For example the default set of enabled wasm features is different if you use Winch than if you use Cranelift.

One way to slice this problem, without mutating Config, would be to keep the validation that determinism isn't explicitly disabled and then update the reads of these configuration values to take the rr configuration into account. That would retain the property that validate doesn't mutate Config, but reading "is relaxed simd deterministic" would look like "was it set, or is rr enabled?" or something like that.
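A minimal sketch of this suggestion with stand-in types: the user's choice is stored as `Option<bool>` so "not set" is distinguishable from "explicitly disabled", `validate` (still `&self`) rejects an explicit `false` while RR is on, and readers go through an accessor that also accounts for RR. The field and method names here are hypothetical.

```rust
/// Stand-in for `wasmtime::Config` with just the knobs relevant here.
#[derive(Debug, Default)]
pub struct Config {
    /// `None` means the user never set the option.
    relaxed_simd_deterministic: Option<bool>,
    rr_enabled: bool,
}

impl Config {
    pub fn relaxed_simd_deterministic(&mut self, enable: bool) -> &mut Self {
        self.relaxed_simd_deterministic = Some(enable);
        self
    }

    pub fn rr(&mut self, enable: bool) -> &mut Self {
        self.rr_enabled = enable;
        self
    }

    /// Rejects RR combined with explicitly disabled determinism.
    /// Takes `&self`: nothing is mutated.
    pub fn validate(&self) -> Result<(), String> {
        if self.rr_enabled && self.relaxed_simd_deterministic == Some(false) {
            return Err("record/replay requires deterministic relaxed SIMD".into());
        }
        Ok(())
    }

    /// The value readers should use: explicitly set, or implied by RR.
    pub fn effective_relaxed_simd_deterministic(&self) -> bool {
        self.relaxed_simd_deterministic.unwrap_or(false) || self.rr_enabled
    }
}
```

The trade-off is that every place reading the option has to use the accessor rather than the raw field.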

Author


I think that could work too.

Author


Is there any consensus, then, on the approach for this? Thinking about it a bit more, updating the value on reads could be misleading, and it would require every future reader of these options to remember to check this edge case. Perhaps that's OK if it's stated explicitly in the code documentation somewhere?

Member


I'd say my personal requirement is that fn validate should stay at &self instead of changing to &mut self. How exactly that plays out for this can be workable in a few ways (e.g. decouple these options, change those reading the options, change the source-of-truth for the options, etc).

Member


Ah, I missed that this was happening in validate and making it a &mut self method -- I definitely agree that that's incorrect / violates the intended meaning of validate.

In an earlier review I had pointed to this bit of logic for guest debugging (idea courtesy of Alex), where we change the default knob settings (prior to processing user overrides) based on other knobs; all of this happens in a way that doesn't actually mutate the Config, just changes the tunables.

It seems that the determinism settings are not on Tunables, so maybe that's not literally applicable here, but in principle we should either do that or do what validate says on the tin: simply reject an invalid config rather than silently mutate it.
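The guest-debugging logic referred to above isn't reproduced in this thread, so the following only sketches the general shape with stand-in types: start from defaults, adjust those defaults based on another knob (here, record/replay), then apply explicit user overrides, writing the result into a separate `Tunables`-like struct while the `Config` stays untouched. As the comment notes, the determinism settings don't actually live on Wasmtime's Tunables, so placing `nan_canonicalization` there is purely illustrative; all names are hypothetical.

```rust
/// Stand-in for the user-facing configuration.
#[derive(Debug, Default)]
pub struct Config {
    pub rr_enabled: bool,
    /// Explicit user override; `None` means "use the default".
    pub nan_canonicalization: Option<bool>,
}

/// Stand-in for derived compilation settings, owned by the engine.
#[derive(Debug, PartialEq)]
pub struct Tunables {
    pub nan_canonicalization: bool,
}

impl Tunables {
    /// Defaults that depend on other knobs: recording turns canonicalization on.
    fn default_for(config: &Config) -> Tunables {
        Tunables { nan_canonicalization: config.rr_enabled }
    }
}

impl Config {
    /// Computes tunables from `&self`; the `Config` itself is not modified.
    pub fn build_tunables(&self) -> Tunables {
        let mut tunables = Tunables::default_for(self);
        // User overrides are applied after the knob-dependent defaults.
        if let Some(v) = self.nan_canonicalization {
            tunables.nan_canonicalization = v;
        }
        tunables
    }
}
```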

Author

arjunr2 commented Feb 2, 2026

OK, I've addressed everything from the prior review. In particular, validate no longer implicitly enforces determinism; users are required to enable it explicitly, and validate rejects configs that don't meet this requirement with an error. It seems reasonable to place the onus on the user to set the appropriate companion settings for RR.
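A sketch of what this validation-only approach might look like, assuming stand-in boolean fields for the two determinism options discussed earlier (NaN canonicalization and relaxed SIMD); the actual field names and error messages in the PR may differ. `validate` keeps its `&self` signature, and enabling RR without both options is an error rather than something silently fixed up.

```rust
/// Stand-in for `wasmtime::Config`.
#[derive(Debug, Default)]
pub struct Config {
    pub rr_enabled: bool,
    pub nan_canonicalization: bool,
    pub relaxed_simd_deterministic: bool,
}

impl Config {
    /// Validates without mutating; the user must opt into determinism.
    pub fn validate(&self) -> Result<(), String> {
        if !self.rr_enabled {
            return Ok(());
        }
        // Collect every missing option so the error names all of them.
        let mut missing = Vec::new();
        if !self.nan_canonicalization {
            missing.push("NaN canonicalization");
        }
        if !self.relaxed_simd_deterministic {
            missing.push("deterministic relaxed SIMD");
        }
        if missing.is_empty() {
            Ok(())
        } else {
            Err(format!("record/replay requires: {}", missing.join(", ")))
        }
    }
}
```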

Member

alexcrichton left a comment


Thanks! I'll flag this for merge after a rebase (there are currently merge conflicts).


3 participants