Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][Dialect] Alloca address space verification #672

Closed
wants to merge 2 commits into from

Conversation

seven-mile
Copy link
Collaborator

This PR ensures the result address space of alloca matches the expectation of data layout.

@seven-mile
Copy link
Collaborator Author

seven-mile commented Jun 9, 2024

Inspired by this commit.

It's useful because builder.createAlloca in CIR does not consider address spaces, but CreateAlloca in LLVM IR Builder does.

AllocaInst *CreateAlloca(Type *Ty, unsigned AddrSpace,
Value *ArraySize = nullptr, const Twine &Name = "") {
const DataLayout &DL = BB->getModule()->getDataLayout();
Align AllocaAlign = DL.getPrefTypeAlign(Ty);
return Insert(new AllocaInst(Ty, AddrSpace, ArraySize, AllocaAlign), Name);
}
AllocaInst *CreateAlloca(Type *Ty, Value *ArraySize = nullptr,
const Twine &Name = "") {
const DataLayout &DL = BB->getModule()->getDataLayout();
Align AllocaAlign = DL.getPrefTypeAlign(Ty);
unsigned AddrSpace = DL.getAllocaAddrSpace();
return Insert(new AllocaInst(Ty, AddrSpace, ArraySize, AllocaAlign), Name);
}

There are two versions, one for free address space specification, which means this check is not compulsory. But at this early stage of address space support, we should be glad to have it.

It also makes it easier to track the absence of getASTAllocaAddressSpace in #671 , which basically returns the address space from data layout.

@orbiri What do you think? ; )

@@ -13,6 +13,7 @@
#include "clang/CIR/Dialect/IR/CIRDialect.h"
#include "clang/AST/Attrs.inc"
#include "clang/CIR/Dialect/IR/CIRAttrs.h"
#include "clang/CIR/Dialect/IR/CIRDataLayout.h"
Copy link
Collaborator Author

@seven-mile seven-mile Jun 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the horrible changes. I have to include ::cir::CIRDataLayout here. CE because ::mlir::cir and ::cir are ambiguous.

a per commit view

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this, if you start with :: it's usually enough to remove the ambiguity, so referring to ::cir::CIRDataLayout should be enough, while keeping the rest as is. Also, for things like this, where unrelated changes are a bit massive, they should be prep PRs - I still think you don't need these in this specific case though. What did you tried and what didn't work?

Copy link
Collaborator Author

@seven-mile seven-mile Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I can specify ::cir for CIRDataLayout. But then it's original codes like void cir::CIRDialect::initialize() being ambiguous that cause a bunch of compilation errors.

I think it's because C++ lookups names hierarchically and failed in the unqualified name cir already without consider there's actually nothing called ::cir::CIRDialect. You may try solving this minimal repro individually ; )

I also tried to come up with an approach that does not touch all usage of cir. I can also implement AllocaOp::verify() in another .cpp file (even a new one). But the current changes has the least disadvantages (if have).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea I'm trying to convey is that we should do the best for a specific file, instead of workarounds that don't take the big picture into consideration. CIRDialect.cpp is mainly about implementing stuff under mlir::cir, therefore that namespace should be the one taking priority in this file.

Changed you repro to this: https://godbolt.org/z/zfnE6o67r, like I suggested above, and it works for me. So not really sure what you are talking about. It's possible that if even so we get into problems, might be worth deleting the using namespace mlir; which is a bit aggressive, but I think we can workaround this restriction in the way described in https://godbolt.org/z/zfnE6o67r

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I can specify ::cir for CIRDataLayout. But then it's original codes like void cir::CIRDialect::initialize() being ambiguous that cause a bunch of compilation errors.

☝️ , there are already many cir::-qualified symbols in the TU (mixed with ::mlir::cir and mlir::cir), so just including the data layout header breaks it due to the ambiguities. I played around a little but also didn't find a solution that works without re-qualifying at least some of the symbols in the file.

Copy link
Member

@bcardosolopes bcardosolopes Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like using namespace mlir; is a bit too much for this file. Looks like we'll need some rewriting anyways, but I'd argue in favor of keeping using namespace mlir::cir; and fixing the pure ::mlir missing ones. I'm in favor of such change, but not as part of this PR, should be its own PR.

My impression though is that by removing the verification bits we probably won't need to look at CIRDataLayout anymore in CIRDialect.cpp.

@orbiri-ns
Copy link

Very nice!
Do we specify in the op description that the address space of the returned pointer must match the address space specified in the DL? Please add it if not :)

LogicalResult AllocaOp::verify() {
auto modOp = getOperation()->getParentOfType<ModuleOp>();
// If no data layout info specified, just skip it
if (!mlir::dyn_cast_or_null<mlir::DataLayoutSpecAttr>(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have other parts of the code using this pattern of getting the DL?
I wonder if we can create some unified interface implementation since we don’t need every new contributor to find the best method to get the DL for inspection :)
I’m not sure where it should be defined (in the dialect? In this file as static method? In another file?) but it for sure can help future us! 😇

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are overcomplicating things a bit, a mlir::DataLayout layout(op->getParentOfType<mlir::ModuleOp>()); should suffice here. Datalayout is assumed to be available.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that DataLayout is not available in the test cases from test/CIR/IR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sitio-couto is going to introduce (a) a triple thing for CIR to compute datalayout and (b) the bits for it to work in some cases in cir-opt (i.e. when AST isn't available). If you need to test something ABI related you should check it coming from the source, not from standalone CIR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what I expected before. But to get current tests not failed in this verification, we have to skip the situations of missing DataLayout. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • AllocaOp does not need an attribute of addspace, because we pass the result pointer type to its constructor, which already includes one.
  • In clang, CreateAlloca seems to never specify individual addrspace. Instead, the addrspace is fetched from the target data layout.
  • We could not mimic such approach immediately, because there's no actual need for "non-default (non-0) alloca address space" in CIR. (SPIR-V target does not need it.) I don't think it's a good idea to add this feature now.
  • But to better track this missing feature, and also better track the tricky issue about getASTAllocaAddressSpace() in PR of SPIR-V target, I want to introduce this verification at early stage of CIR. It makes sure that we can find the root cause immediately after emitting incorrect AllocaOps.
  • LLVM IR once introduced the same check in this commit, but removed now. We may also follow such a pace in the future.

It's closely associated with SPIR-V target. More details in this comment.

That's why it's a verifier, rather than an enhancement in CIRBuilder. Hope my expression is clear enough😃.

But I think you're right. Op::verify is for the operation itself, just like we do not use Op::fold to do match-and-rewrite. Do we have alternative mechanisms in MLIR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so CIR's AllocaOp always carries the address space through its return type, in contrast to the LLVM alloca which has this rule:

If the address space is not explicitly specified, the object is allocated in the alloca address space from the datalayout string.

Couldn't we just propagate the address space that clang determines (or is given in a lit test) through CIR? At LLVM emission time, we could check if the address space matches the LLVM module's default, and if not, set it explicitly on the alloca instruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the missing feature you want to protect against? Do we need something like "default address space number is not 0" or so?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alloca is generated for local(automatic) variables, or compiler builtin function. The AST information from the frontend is VarDecl int x;, which does not carries pointers unfortunately.

Yes, currently buildAlloca always returns pointers with addrspace(0). But it's not blocking SPIR-V target if I'm not mistaken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so CIR's AllocaOp always carries the address space through its return type, in contrast to the LLVM alloca...

This is what I'm trying to suggest as well.

Couldn't we just propagate the address space that clang determines (or is given in a lit test) through CIR?

Yep, exactly my point.

At LLVM emission time, we could check if the address space matches the LLVM module's default, and if not, set it explicitly on the alloca instruction.

I'd say that it should behave like alignment: override if specified, otherwise use DL.

What is the missing feature you want to protect against? Do we need something like "default address space number is not 0" or so?

Good point.

Seems like there's a lot of conversation around numbers regarding address space here and in the other PR. MLIR provides us enough resources to design decent abstractions to cope with silly restrictions, and also good to rememeber that CIR doesn't need to do exactly like LLVM does in number assumptions, we are free to be expressive in our representations (enum kinds, extra declarations, etc).

@@ -13,6 +13,7 @@
#include "clang/CIR/Dialect/IR/CIRDialect.h"
#include "clang/AST/Attrs.inc"
#include "clang/CIR/Dialect/IR/CIRAttrs.h"
#include "clang/CIR/Dialect/IR/CIRDataLayout.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this, if you start with :: it's usually enough to remove the ambiguity, so referring to ::cir::CIRDataLayout should be enough, while keeping the rest as is. Also, for things like this, where unrelated changes are a bit massive, they should be prep PRs - I still think you don't need these in this specific case though. What did you tried and what didn't work?

LogicalResult AllocaOp::verify() {
auto modOp = getOperation()->getParentOfType<ModuleOp>();
// If no data layout info specified, just skip it
if (!mlir::dyn_cast_or_null<mlir::DataLayoutSpecAttr>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are overcomplicating things a bit, a mlir::DataLayout layout(op->getParentOfType<mlir::ModuleOp>()); should suffice here. Datalayout is assumed to be available.

}

// TODO: do not initialize every time
::cir::CIRDataLayout dataLayout{modOp};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not initialized every time, it probably uses value semantics like the rest of MLIR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const CIRDataLayout getDataLayout() const {
// FIXME(cir): instead of creating a CIRDataLayout every time, set it as an
// attribute for the CIRModule class.
return {theModule};
}

I used to read these lines and wanted to track it likewise. But if you confirm that it's the right way to use this class, I'm willing to remove the TODO ; )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can drop that comment

@jopperm
Copy link
Contributor

jopperm commented Jun 12, 2024

It's useful because builder.createAlloca in CIR does not consider address spaces, but CreateAlloca in LLVM IR Builder does.
[...]
That's why it's a verifier, rather than an enhancement in CIRBuilder.

I haven't understood the rationale yet, sorry! Trying to condense the discussion above, is there anything preventing us from doing the following (as a new PR, to prepare for #671)

  • Extend CIRDataLayout with the default alloca AS and query it from CIRBuilder (analogous to OG codegen)
  • Propagate the address space through CIR
  • Emit LLVM alloca with address space if needed, similar to alignment

@seven-mile
Copy link
Collaborator Author

seven-mile commented Jun 13, 2024

We could not mimic such approach immediately, because there's no actual need for "non-default (non-0) alloca address space" in CIR. (SPIR-V target does not need it.) I don't think it's a good idea to add this feature now.

I used to insist that I have no way to test the DL query in CIRGen, it shall always emit zero still. But I do have a way to test the verification (like invalid.cir in this PR). I'm assuming such enhancement with no actual need and no observable test cases could hardly be accepted, by previous experience. Therefore I stated "I don't think it's a good idea to add this feature now". These two PRs are a big workaround against it.

But of course, I'll be glad to follow the direct and perfect approach as Julian suggested, if the community prefers it more. And that's the first prototype I developed actually😂. When submitting it, I found it may not pass the review and managed to get this workaround. Sorry to take up your time, it's still a constructive discussion overall😉. Closing as not planned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants