-
Notifications
You must be signed in to change notification settings - Fork 13.5k
Make create_def
a query
#142711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Make create_def
a query
#142711
Conversation
3d8ce06
to
31b00c7
Compare
@bors2 try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Make `create_def` a query Creating a new definition using `create_def` currently behaves as a red node for incremental compilation: the caller query must be recomputed at the next run. This is suboptimal. If a query's dependencies have not changed, we'd like the query engine to re-create the definitions and continue replaying the dependency graph and load cached data. This idea falls short with a subtle global state dependency: when creating definitions, we must ensure that `DefPathHash` are globally unique, and carry a disambiguator global state around. Because of this, a call to `create_def` may change its result `DefPathHash` due to global state, and force the caller query to be recomputed. If that happens, the caller query would need to be recomputed, but must not re-create the definitions that the query engine created for it. This PR attempts to solve this issue by using a `create_def_raw` query. This query is `eval_always`, so it can safely read global state. The query engine sees its return value as the `DefPathHash`, so will detect if it unexpectedly changed. We benefit from the caching queries do. However, we want multiple successive calls to `create_def` from the same queries to result in a new definition each time. For instance when creating anonymous definitions that are only identified by their parent and a disambiguator. To allow this, we add 2 extra parameters to `create_def_raw` that are unique to each call by the caller query: the query's `DepNode` and a counter of calls from within that query. Consider this example. Some query A calls create_def twice, we generate calls to: 1. `create_def(CRATE_DEF_ID, Use) = create_def_raw(CRATE_DEF_ID, Use, A, 0)` This returns `::{use#0}`. 2. `create_def(CRATE_DEF_ID, Use) = create_def_raw(CRATE_DEF_ID, Use, A, 1)` The query has different arguments, so returns a brand new `::{use::1}`. 3. `create_def(CRATE_DEF_ID, Impl) = create_def_raw(CRATE_DEF_ID, Impl, A, 2)` This returns `::{impl#0}`. When replaying, if nothing changed, the query engine will call `create_def_raw` thrice, and `A` will be green. If another query created a definition with `(CRATE_DEF_ID, Impl)`, `::{impl#0}` is taken. The 3rd call to `create_def` needs to use another disambiguator, that call to `create_def_raw` returns `::{impl#1}` and be correctly red. `A` needs to be re-computed. The definitions for `use`s have already been created when walking the dep-graph. The re-computation calls use the results in `create_def_raw` cache, aka `::{use#0}`, `::{use#1}` and `::{impl#1}`, and we do not have duplicate definitions. r? `@oli-obk` cc `@Zoxc` <!-- homu-ignore:start --> <!-- If this PR is related to an unstable feature or an otherwise tracked effort, please link to the relevant tracking issue here. If you don't know of a related tracking issue or there are none, feel free to ignore this. This PR will get automatically assigned to a reviewer. In case you would like a specific user to review your work, you can assign it to them by using r? <reviewer name> --> <!-- homu-ignore:end -->
What happens if during a future incr comp session the first definition goes away? Will the second definition still use |
I'm not sure I fully grasp the question. The first definition going away means that |
Never mind. I assumed though you were literally using the index you pass to |
This comment has been minimized.
This comment has been minimized.
I haven't looked at the PR too closely yet, but it seems to be a different direction than what me and @oli-obk were heading in #t-compiler > query side effects: create_def. This seems to revert parts of #140453 by adding back global state which could possibly cause non-deterministic def paths for the parallel compiler. |
I agree this is a different direction. (TBH, I'm not sure I fully understand the direction you are taking in that thread.) The problem is so difficult that I think it worth experimenting with several competing schemes until we are convinced we have a satisfying one. This PR is heavily inspired from that zulip discussion's first message, and salsa entity scheme. I don't agree this PR could introduce non-determinism. The manipulation of untracked state, including disambiguators, is done in |
The nondeterminism is that two threads invoking it at the same time get arbitrary disambiguators back. This means running the compiler twice with parallel rustc produces different results. Soundness of the query system is not the issue, reproducability is. This also applies to incremental getting different results if forcing causes them to create def ids in different orders than in previous runs |
Is that even possible? Uniqueness of the DefPathHash is a global property, so disambiguation must be a global mutable state. We can limit usage and reduce likelihood, but we cannot get rid of it entirely. So if two threads can create definitions, the only way to make compilation reproducible is to ensure a reproducible order between those two threads. |
The way we solved this is to disambiguate different queries that create DefIds by using new and different DefPathData. Of course if we screw up and produce the same one twice we get an ICE, but then we just go fix that by adding a new variant. The combination of different parent plus different DefPathData should always be uniquely specifyable. I don't see where it would not be. E.g. for the RPITITs we count its position in the return type of its parent function to add an integral disambiguator. Parent + Rpitit DefPathData + index is always unique |
Finished benchmarking commit (5c08747): comparison URL. Overall result: ❌ regressions - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 1.7%, secondary 2.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.1%, secondary 3.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 691.846s -> 691.623s (-0.03%) |
Creating a new definition using
create_def
currently behaves as a red node for incremental compilation: the caller query must be recomputed at the next run. This is suboptimal.If a query's dependencies have not changed, we'd like the query engine to re-create the definitions and continue replaying the dependency graph and load cached data. This idea falls short with a subtle global state dependency: when creating definitions, we must ensure that
DefPathHash
are globally unique, and carry a disambiguator global state around. Because of this, a call tocreate_def
may change its resultDefPathHash
due to global state, and force the caller query to be recomputed. If that happens, the caller query would need to be recomputed, but must not re-create the definitions that the query engine created for it.This PR attempts to solve this issue by using a
create_def_raw
query. This query iseval_always
, so it can safely read global state. The query engine sees its return value as theDefPathHash
, so will detect if it unexpectedly changed. We benefit from the caching queries do.However, we want multiple successive calls to
create_def
from the same queries to result in a new definition each time. For instance when creating anonymous definitions that are only identified by their parent and a disambiguator. To allow this, we add 2 extra parameters tocreate_def_raw
that are unique to each call by the caller query: the query'sDepNode
and a counter of calls from within that query.Consider this example. Some query A calls create_def twice, we generate calls to:
create_def(CRATE_DEF_ID, Use) = create_def_raw(CRATE_DEF_ID, Use, A, 0)
This returns::{use#0}
.create_def(CRATE_DEF_ID, Use) = create_def_raw(CRATE_DEF_ID, Use, A, 1)
The query has different arguments, so returns a brand new::{use::1}
.create_def(CRATE_DEF_ID, Impl) = create_def_raw(CRATE_DEF_ID, Impl, A, 2)
This returns::{impl#0}
.When replaying, if nothing changed, the query engine will call
create_def_raw
thrice, andA
will be green.If another query created a definition with
(CRATE_DEF_ID, Impl)
,::{impl#0}
is taken. The 3rd call tocreate_def
needs to use another disambiguator, that call tocreate_def_raw
returns::{impl#1}
and be correctly red.A
needs to be re-computed. The definitions foruse
s have already been created when walking the dep-graph. The re-computation calls use the results increate_def_raw
cache, aka::{use#0}
,::{use#1}
and::{impl#1}
, and we do not have duplicate definitions.r? @oli-obk
cc @Zoxc