[MIR] Dataflow framework, constant propagation and dead code elimination #35608
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)
pub fn new(num_bits: usize) -> BitVector {
    let num_words = u64s(num_bits);
    BitVector { data: vec![0; num_words] }
}

pub fn contains(&self, bit: usize) -> bool {
    let (word, mask) = word_mask(bit);
-   (self.data[word] & mask) != 0
+   (self.data.get(word).cloned().unwrap_or(0) & mask) != 0
It seems to me that asking for an index that doesn't exist should panic and that this might mask bugs in other code. Do you use this behavior in your analysis?
> It seems to me that asking for an index that doesn't exist should panic and that this might mask bugs in other code.
It already had this behaviour for sizes which are not 0 mod 64, for bits up to the next multiple of 64. This makes the behaviour consistent for all sizes.
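For illustration, a standalone sketch of the behaviour under discussion (mirroring, not copying, the `bitvec.rs` code; the `u64s` and `word_mask` helpers are reimplemented here): a bit index past the allocated words reads as "not set" instead of panicking, matching the behaviour that already existed for bits between `num_bits` and the next multiple of 64.

```rust
struct BitVector {
    data: Vec<u64>,
}

// Number of 64-bit words needed for `num_bits` bits.
fn u64s(num_bits: usize) -> usize {
    (num_bits + 63) / 64
}

// Which word a bit lives in, and a mask selecting it within that word.
fn word_mask(bit: usize) -> (usize, u64) {
    (bit / 64, 1 << (bit % 64))
}

impl BitVector {
    fn new(num_bits: usize) -> BitVector {
        BitVector { data: vec![0; u64s(num_bits)] }
    }

    fn insert(&mut self, bit: usize) {
        let (word, mask) = word_mask(bit);
        self.data[word] |= mask;
    }

    fn contains(&self, bit: usize) -> bool {
        let (word, mask) = word_mask(bit);
        // `get` instead of indexing: out-of-range reads as "unset".
        (self.data.get(word).cloned().unwrap_or(0) & mask) != 0
    }
}
```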
> Do you use this behavior in your analysis?

I didn't answer this: yes, I do. The dead-code-removal pass uses this behaviour.
Oh, I see now. Yes it should be consistent at least :)
I only had time for a quick look, but it looks cool to me. I don't think you should be discouraged in any way by my work on move up propagation -- for several reasons:
Does the dead code optimization ever fire? You could make SimplifyRewrite and CsPropagate separate, right? SimplifyRewrite does not appear to ever look at the lattice? We may want to put all the optimizations behind some command line option.
The important part of having them together is that they are interleaved: this is the major feature of this dataflow framework. The interleaving allows the optimisations to not miss opportunities, something that would otherwise be impossible with these passes kept completely separate. For example, consider a case like this: first, constant propagation replaces …
It does, indeed. The preceding constant propagation pass does not care about cleaning up after itself, so it falls onto dead code optimisation to clean up after it by removing all the extra assignments which end up getting unused.
We indeed might want to. I previously proposed to default
It does not make sense to dedicate manpower to making two equivalent (in their purpose) passes in two different ways when there are already so many more, better, things to do.
@nagisa @scottcarr Something that crossed my mind is that, well, "destination propagation" (aka "move up propagation" aka "backward move propagation") is orthogonal to "source propagation" (aka "(forward) value propagation"). IIUC, dataflow being employed for value propagation (specifically to constants, and generally to moves) would be "source propagation", not "destination propagation" (correct me if I'm wrong @nagisa), so it could eliminate the single moves left by "destination propagation" reducing a move chain, e.g.:

var0 = arg0;
tmp0 = var0;
tmp1 = tmp0;
var1 = tmp1;
tmp2 = var1;
return = f(tmp2);

Destination propagation would likely be able to do this:

tmp2 = arg0;
return = f(tmp2);

But only source propagation (using dataflow analysis) could turn that into:

return = f(arg0);

In fact, the testcase from @scottcarr's #34693 results in this:

var0 = arg0;
return = (*var0);

You'd need source propagation to reduce it to:

return = (*arg0);
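A toy sketch (not the rustc implementation) of what forward "source propagation" does to the move chain above; statement names follow the MIR snippet, and the pass is assumed to see each name assigned at most once, so no invalidation handling is needed:

```rust
use std::collections::HashMap;

// Walk statements forward, remembering which original value each name holds,
// and rewrite every use through that knowledge.
fn propagate(stmts: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut known: HashMap<String, String> = HashMap::new();
    let mut rewritten = Vec::new();
    for &(dest, src) in stmts {
        // Look the source up through previously seen copies.
        let resolved = known.get(src).cloned().unwrap_or_else(|| src.to_string());
        known.insert(dest.to_string(), resolved.clone());
        rewritten.push((dest.to_string(), resolved));
    }
    rewritten
}
```

After this, every statement reads straight from `arg0`, and all but the final use become dead assignments for a dead-code pass to remove.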
let StatementKind::Assign(ref lval, ref rval) = s.kind;
match *rval {
    Rvalue::Use(Operand::Consume(ref nlval)) => {
        lat.insert(lval, Either::Lvalue(nlval.clone()));
Here, an Either::Lvalue is put in the lattice, but I don't see any place where an Either::Lvalue could be read out. Am I missing something?
I see that ConstRewrite uses Either::Const, but I don't understand what Either::Lvalue is for.
It was primarily for move propagation, but otherwise it is still a good idea to keep it `Either::Lvalue` here, because in case of code like

tmp1 = var1;
<stuff>
tmp1 = 0;

it would correctly pick up the fact that tmp1 is a constant. This wouldn't work with `Top`, because of how the join operation with ⊤ works. Just removing the lval from the map here doesn't work either, because then, in a block with two incoming edges, there could be a case where ⊤ is not produced when it really should be. Finally, just making an `Either::Lvalue` (without the contained Lvalue) is also suboptimal, because you'd then have to produce `Either::Top` every time two such values are joined.
Actually, scratch that, I think the `Either::Lvalue` variant without inner data could be made to work.
If you want to use `Either::Lvalue`, you ought to invalidate it when the lvalue is re-assigned, because e.g.
let mut x = getit();
y = x; // y <- Either::Lvalue(x)
x = 1;
return y; // y must not be Either::Lvalue(x);
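A hypothetical, much-simplified sketch of the invalidation being described here, using strings for lvalues and a toy `Either` (the names are illustrative, not the PR's types):

```rust
use std::collections::HashMap;

// Toy lattice values; `Lvalue`s are just strings here.
#[derive(Clone, PartialEq, Debug)]
enum Either {
    Lvalue(String), // "this location currently holds a copy of that lvalue"
    Top,
}

// When `lval` is re-assigned, any fact of the form `_ = Either::Lvalue(lval)`
// is stale and must be forgotten, exactly as in the y/x example above.
fn invalidate(lat: &mut HashMap<String, Either>, lval: &str) {
    for fact in lat.values_mut() {
        if *fact == Either::Lvalue(lval.to_string()) {
            *fact = Either::Top;
        }
    }
    lat.insert(lval.to_string(), Either::Top);
}
```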
I believe that move propagation, implemented in this framework, should not use a plain map from lvalues to lvalues - rather, it should use some sort of data structure that holds equivalence classes, exactly for the invalidation reason you bring up.
@nagisa mostly covered this, but it is important to note that the interleaved transformation is more powerful than even an arbitrary number of iterations of the two passes separately, in any order. For example, in the following code, neither pass would be able to make any progress:

let mut x = 0;
loop {
    if x != 0 {
        x = 1;
    }
    println!("{}", x);
}

Constant propagation alone cannot keep `x` at 0: at the loop head it must join the value on entry (0) with the value after the body, where the `x = 1` branch may have run. Dead code elimination alone cannot remove the branch, since `x` is used. Interleaved, constant propagation folds `x != 0` to false, the branch is eliminated, and the program reduces to:

let mut x = 0;
loop {
    println!("{}", 0);
}
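The failure of the separate passes can be seen directly in the join at the loop head. A minimal sketch with toy types (not the PR's lattice): without branch folding, constant propagation must join the value of `x` on loop entry (0) with its value after the body (1, on the path through the branch), and the constant is lost; with the branch folded first, both edges carry 0 and the constant survives.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Val {
    Const(i64), // a known constant
    Top,        // unknown
}

// Join of two incoming edge states: equal constants survive, anything else is Top.
fn join(a: Val, b: Val) -> Val {
    match (a, b) {
        (Val::Const(x), Val::Const(y)) if x == y => Val::Const(x),
        _ => Val::Top,
    }
}
```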
    self.values.remove(key)
}

fn top(&mut self, key: &Lvalue<'tcx>) {
    self.insert(key, Either::Top);
Why not just remove from the map? With an intersection-based join function, not present means `Top`.
But this is a union-based lattice. No, missing hash table values default to `Top`. But then, why do you have a fn remove?
Yeah, this is essentially my confusion over the lattice - the intersection based join function seems to conflict with all the code that treats removal and setting values to Top differently.
I think I may not be quite understanding what lattice you're trying to use for constant propagation and simplification. Here is the model that I am envisioning. For every lvalue, we want to track the state of what is stored in that location. There are three classes of value:

- ⊥: the location is uninitialized (or the point is unreachable), so we may assume anything about it;
- a known constant value;
- ⊤: the location is initialized, but we know nothing about its contents.

The join function is then performed pointwise, and joining with an uninitialized value does nothing. To concretely represent this lattice, we wish to use a map, with missing keys defaulting to either ⊤ or ⊥. To determine which is appropriate, we look at what lattice element should be used at the entry point of the function. Note that all arguments to the function must be at ⊤, since they are fully initialized, but could be anything. If they were instead ⊥,

fn foo(mut bar: i32, baz: bool) {
    if baz {
        bar = 15;
    }
    println!("{}", bar);
}

would be incorrectly optimized to

fn foo(mut bar: i32, baz: bool) {
    println!("{}", 15);
}

since at the end of the `if`, joining the untouched ⊥ with 15 would yield 15. Statics have the same problem:

static mut foo: i32 = 0;
unsafe fn bar(flag: bool) {
    if flag {
        foo = 1;
    }
    println!("{}", foo);
}

would reduce to

static mut foo: i32 = 0;
unsafe fn bar(flag: bool) {
    if flag {
        foo = 1;
    }
    println!("{}", 1);
}

Relatedly, every static would have to have its state set to ⊤ after every function call, which might arbitrarily mess with them. This is a tricky problem, as we would have to scan through the function to make sure that we were tracking every static that might be referenced. The one problem with ⊤ as the default value is that we can no longer actually represent bottom. This isn't necessarily a problem, as the dataflow framework can be written to never need an explicit bottom element. Therefore, I'm envisioning

struct CsLattice {
    map: WBottom<FnvHashMap<Lvalue, WBottom<ConstVal>>>
}

with an intersection-based join function. Unfortunately, I'm a bit confused about what lattice and representation you're using for this PR. You're using an intersection-based join function, which makes "removed from the map" into the most top state that an lvalue can be in, yet you also have a separate explicit ⊤ value.
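For concreteness, a sketch of the per-location lattice described above, with the explicit ⊤ adjoined (i.e. a `WTopBottom`); this is an illustration of the lattice being discussed, not code from the PR:

```rust
// Three-point lattice per tracked location: ⊥ (uninitialized / unreachable),
// a known constant, or ⊤ (initialized but unknown).
#[derive(Clone, PartialEq, Debug)]
enum WTopBottom<T> {
    Bottom,
    Value(T),
    Top,
}

impl<T: Clone + PartialEq> WTopBottom<T> {
    /// Join `other` into `self`; report whether `self` changed, which is
    /// what drives the dataflow fixpoint iteration.
    fn join(&mut self, other: &WTopBottom<T>) -> bool {
        use WTopBottom::*;
        let joined = match (&*self, other) {
            // Joining with ⊥ does nothing, per the model above.
            (Bottom, x) | (x, Bottom) => x.clone(),
            (Value(a), Value(b)) if a == b => Value(a.clone()),
            // Disagreeing constants, or anything involving ⊤, go to ⊤.
            _ => Top,
        };
        let changed = joined != *self;
        *self = joined;
        changed
    }
}
```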
}
if unwind.is_some() {
    let mut unwind = lat.clone();
    unwind.remove(location);
`location` ends up being `value` in both cases - that's the point of DropAndReplace. This could be `vec![lat, lat.clone()]`.
@arielb1 @gereeter thanks for the comments!

That's indeed true, however in case … Does that make sense?

I certainly see why it could be confusing, because not only was it a last-minute idea, but I'm sure I forgot/missed some …

In my experience that doesn't work quite well, for the same reason I described above. It is hard to construct a half-bottom lattice (where most of the lvalues are still ⊥). I feel like it could also be somewhat unclear why the distinction between a …
I took some quick time to respond to the most important questions, but I haven't had time to rebase or tweak the code per the comments; I shall do that on Tuesday. Thanks for the reviews again!
This is fine - it is the bottom-by-default proposal I made, and makes perfect sense. However, it's not actually what you implemented. You wrote an intersection-based join function, which necessarily makes the default ⊤, since
I don't see any code that actually does this, and it definitely makes the map top-by-default.
Yes, this is a drawback of having top-by-default. However, I don't see why you want a half-bottom value - while you may want to bottom out individual values, this is as easy in either representation. The entry point to the function must be mostly-top (for reasons I argued before), and so the default will generally want to be top. That said, after thinking about this some more, I have a third solution which I think is just universally clearer and easier to work with:

use std::collections::HashMap;
use std::collections::hash_map::Entry::{Occupied, Vacant};
use std::hash::Hash;

/// A `FullMap<K, V>` is a complete mapping from `K`s to `V`s - every `K` corresponds
/// to a `V`.
struct FullMap<K, V> {
// The value corresponding to every key not in exceptions
default: V,
exceptions: HashMap<K, V>
}
impl<K: Eq + Hash, V: Clone + PartialEq> FullMap<K, V> {
    fn get<'a>(&'a self, key: &K) -> &'a V {
        self.exceptions.get(key).unwrap_or(&self.default)
    }
    fn insert(&mut self, key: K, value: V) -> V {
        if value == self.default {
            self.exceptions.remove(&key).unwrap_or(value)
        } else {
            self.exceptions.insert(key, value).unwrap_or_else(|| self.default.clone())
        }
    }
}
impl<K: Eq + Hash + Clone, V: Lattice + Clone + PartialEq> Lattice for FullMap<K, V> {
    fn bottom() -> Self {
        FullMap {
            default: V::bottom(),
            exceptions: HashMap::new()
        }
    }
    fn join(&mut self, other: &FullMap<K, V>) -> bool {
        // This could definitely be optimized when defaults are known to be top or bottom
        let old_default = self.default.clone();
        let mut changed = self.default.join(&other.default);
        for (k, v) in &mut self.exceptions {
            if !other.exceptions.contains_key(k) {
                changed |= v.join(&other.default);
                // TODO: for efficiency, remove keys here equal to the default
            }
        }
        for (k, v) in &other.exceptions {
            match self.exceptions.entry(k.clone()) {
                Occupied(mut occupied) => {
                    changed |= occupied.get_mut().join(v);
                    if *occupied.get() == self.default {
                        occupied.remove();
                    }
                },
                Vacant(vacant) => {
                    let mut cur_v = old_default.clone();
                    changed |= cur_v.join(v);
                    if cur_v != self.default {
                        vacant.insert(cur_v);
                    }
                }
            }
        }
        changed
    }
}
struct CsLattice {
inner: FullMap<Lvalue, WTopBottom<ConstVal>>
}
The entry-point state of all variables should be ⊤, not ⊥. In fact, your code already basically does that.
Okay, I ripped out any and all hacks I had in place wrt the lattice (most of the junk was a leftover from my experimentation with move propagation etc, which I didn't clean up before).
@@ -616,12 +616,14 @@ fn gather_moves<'a, 'tcx>(mir: &Mir<'tcx>, tcx: TyCtxt<'a, 'tcx, 'tcx>) -> MoveD
            Rvalue::InlineAsm { .. } => {}
        }
    }
    StatementKind::SetDiscriminant{ ref lvalue, .. } => {
cc @arielb1 does this make sense to you?
SetDiscriminant and borrowck are basically incompatible. Do not do these kinds of optimizations before borrowck runs.

Not anymore.
passes.push_pass(box mir::transform::erase_regions::EraseRegions);

passes.push_pass(box mir::transform::deaggregator::Deaggregator);
This should be after ElaborateDrops, not before.
☔ The latest upstream changes (presumably #35162) made this pull request unmergeable. Please resolve the merge conflicts.
@@ -63,6 +62,45 @@ impl BitVector {
    }
}

/// Return and unset the first set bit.
pub fn pop(&mut self) -> Option<usize> {
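A sketch of how such a `pop` could work (the PR's actual implementation isn't shown in this hunk), together with the kind of unit test being requested; the surrounding `BitVector` is a minimal stand-in for the real type:

```rust
struct BitVector {
    data: Vec<u64>,
}

impl BitVector {
    fn new(num_bits: usize) -> BitVector {
        BitVector { data: vec![0; (num_bits + 63) / 64] }
    }

    fn insert(&mut self, bit: usize) {
        self.data[bit / 64] |= 1 << (bit % 64);
    }

    /// Return and unset the first (lowest-indexed) set bit.
    fn pop(&mut self) -> Option<usize> {
        for (word_idx, word) in self.data.iter_mut().enumerate() {
            if *word != 0 {
                let bit = word.trailing_zeros() as usize;
                *word &= *word - 1; // clear the lowest set bit
                return Some(word_idx * 64 + bit);
            }
        }
        None
    }
}
```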
some unit tests would be nice :)
There’s a test for this function introduced by this PR (but not necessarily in the commit you commented on).

This method is used by the liveness (dead code elimination) pass, but for some reason I never ended up staging it for commit.
The previous version of the lattice was a mess from a number of experiments I’d previously done with copy/move propagations, and reviewers were sure to notice the mess. This cleans up the mess by ignoring anything related to non-constants. In addition, this commit adds propagation of constants through global mutable statics.
We can easily translate a SwitchInt with a Constant Operand, and it also helps us to constant propagate the thing.
Apparently our constant errors are only reported if there’s a use in MIR for that constant. This means that in these tests `let _ = B` is seen as a no-op use and thus is optimised out. Explicitly drop() the B to make sure it is not optimised out (yet, until the inliner happens). NB: the `let _ = B` statement discussed here has no side effects (such as a Drop implementation).
Replaces the previous hack
Also fix some more tests.
In debug mode, if we optimise stuff out, then debug info also goes away. Do not optimise MIR in these destructive ways if the `-g` flag is passed and `-Z mir-opt-level` <= 1 (which now defaults to `-C opt-level` + 1).
Hi all,

Yesterday I thought up a class of nasty corner cases which, I think, make raw pointers, MIR and this sort of (backward) dataflow framework incompatible without some way to query aliasing information.

fn main() {
    let mut x = 4;
    let y = &mut x as *mut i32;
    x = 5;
    unsafe { assert_eq!(5, *y) };
}

Produces MIR like this (read bottom-to-top):
Initially, I thought that only propagating locals (not projections) and completely ignoring locals once a reference to them is taken would be all right, but now it seems to me that this only works well with forward analysis, where you actually have a chance to find out about the reference before it may get used (which is what my response was to @arielb1 when he initially inquired about how aliasing gets handled). The part of the framework which handles speculative rewrites cannot quite handle this case either, because it only kicks in when there are two conflicting pieces of knowledge from two incoming edges. In this particular case I feel that without the DeadCode pass the PR would be too shallow to land, still. You'd have some sort of DF framework and be able to write some very simple forward passes, but nothing that reaches the complexity levels (…
Something that got lost in the lattice refactor
A more difficult case is:

fn main() {
    let mut x = 4;
    let y = &mut x as *mut i32;
    x = 3; // this assignment is not dead
    unsafe { assert_eq!(3, *y) };
    x = 5; // and neither is this one
    unsafe { assert_eq!(5, *y) };
}

I suppose that you would first need to forward-propagate which variables get their address taken. Maybe see what LLVM does?
@arielb1 I wouldn't suggest looking at LLVM unless you want to implement MemorySSA for MIR.

☔ The latest upstream changes (presumably #35979) made this pull request unmergeable. Please resolve the merge conflicts.

@nagisa while I agree that we have to be conservative around these examples, I'm not quite sure why it's a problem. I guess I have to look a bit more at how you are handling things, but it seems to me that a deref of (e.g.) a raw pointer needs to be (conservatively) considered a use of (e.g.) any variable whose address has been taken. The same is true for various other operations (e.g., calls) where we can't say for sure what actions they will take. This is annoying -- and we should be clear about the rules we currently enforce -- but seems quite doable. Eventually we will want to be more precise, but this also hinges on the unsafe code guidelines we wind up with, I imagine.

According to Slava Pestov's notes on the Factor optimizing compiler, the optimistic assumption (starting at bottom) produces better code than the pessimistic assumption (starting at top).

ping, what's the next step here?

We need alias analysis of some sort before we can implement any of the more complex optimisations. Closing for now.

Alias analysis sounds scary, but all you need for most things is "two different non-ZST locals never occupy the same space" and, if you want to go further, "a pointer value can't point to a local that was never borrowed". The latter is already part of @pcwalton's code, I believe.
This is the 3rd take of the original dataflow framework PR (previous 2 being #34164 and #33628).
I lost a great amount of my motivation to work on this thing when I saw the opts we (me and @gereeter) already had semi-implemented being worked on by the interns, so I decided to cut out most of the stuff from the framework not necessary for the other two optimisations I had in mind, so that I could get this PR into a landable state before 2020 or so.
No notable performance improvements to be seen here (LLVM is more than capable of doing constant propagation quickly and by itself), but peak resident memory used by rustc (~20MB reduction from 1.15GB) and, likely, metadata size receive small reductions. All these passes add a cumulative ~5 seconds to a bootstrap of the whole stage (rustc + libstd). Thus, the primary benefit of this PR is a fix for #35316.
@gereeter sorry, I ended up not preserving the commits of your work, as the effort of rebasing this work while also preserving commits ended up being greater than I had the mood to deal with.
@eddyb ran a crater which found one regression which has been since fixed.
cc @eddyb @nikomatsakis @scottcarr