Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases for OpenCL with SPIR-V target #724

Merged
merged 1 commit into from
Jul 26, 2024

Conversation

seven-mile
Copy link
Collaborator

@seven-mile seven-mile commented Jul 7, 2024

This PR implements offload_* cases discussed in this thread.

  • Integrate target-specific CIR-to-LLVM address space map into TargetLoweringInfo
  • CIRGen: Implement these cases in getValueFromLangAS
  • Lowering: Extend the state of type converter with LowerModule

When frontend provides a new LangAS like opencl_generic, it would be processed by CIRGenTypes and Builder.getPointerTo() and encoded as offload_generic.

When we lower CIR to LLVM,

  • For pointer types without address space attribute, it's mapped to ptr addrspace(0) directly
  • For target cases target<x>, it's mapped to ptr addrspace(x)
  • For other defined cases, query the target info with a new virtual method getTargetAddrSpaceFromCIRAddrSpace. General targets like X86 and ARM64 map all known cases to 0. For SPIR-V target here, it maps offload_generic to addrspace(4).

Copy link
Collaborator Author

@seven-mile seven-mile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sitio-couto Several ABI-related refactor and new code paths. I'll appreciate it if you can take a look!

@@ -59,6 +59,15 @@ bool ItaniumCXXABI::classifyReturnType(LowerFunctionInfo &FI) const {
return false;
}

class AppleARM64CXXABI : public ItaniumCXXABI {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppleARM64CXXABI is used because Nathan introduced macOS host in the test driver.c. When we create LowerModule, the CXXABI is created meanwhile, leading to NYI. It's simple so I just add it here for completeness.


// FIXME: No actual usage of this rewriter, just to make the ctor happy.
mlir::PatternRewriter rewriter{&getContext()};
LowerModule lowerModule = createLowerModule(module, rewriter);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All LowerModule stuff is unpolished. @sitio-couto and I reached an agreement to defer the unification of the states like LowerModule. All the info needed here is the CIR AS map in TargetLoweringInfo, which is trivial enough. It's just for future convenience to define rewriteAddrSpace method in LowerModule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's trivial enough why we need a specific pass?

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is doing different things and moving code while at it.

I'd like to see this in more incremental and well explained pieces (e.g. SPIRV skeleton can be added separately, you also don't provide any description about this address map stuff, a sudden change to struct type which indirectly increases memory size of every single struct allocation, etc).

The approach also feels a bit over engineered, see my questions inline. If you have refactoring to do, please do them separately.

address space, which means it's not yet converted by the address space map
to carry target-specific semantics.
The address space attribute is used in pointer types. It essentially models
`clang::LangAS` rather than the LLVM address space.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps change It essentially models clang::LangAS rather than the LLVM address space to It essentially provides an unified models on top of clang::LangAS, rather than LLVM address spaces

@@ -35,11 +35,14 @@ struct StructTypeStorage : public TypeStorage {
bool packed;
StructType::RecordKind kind;
ASTRecordDeclInterface ast;
// Extra unique key to differentiate replacements on mutable parts
int replaceVersion;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct change sounds super adhoc and specific, doesn't fit the overall picture. Please elaborate why we need this and what are you trying to do, so perhaps we can find a different solution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. The whole AttrTypeSubElementHandler was taken from LLVMStructType, which is necessary for pure C++ types as such to support walk and replace infrastructure in MLIR -- It's necessary for the AS lowering pass: recursivelyReplaceElementsIn to work.

/// Allow walking and replacing the subelements of a LLVMStructTypeStorage key.
template <>
struct AttrTypeSubElementHandler<LLVM::detail::LLVMStructTypeStorage::Key> {
static void walk(const LLVM::detail::LLVMStructTypeStorage::Key &param,
AttrTypeImmediateSubElementWalker &walker) {
if (param.isIdentified())
walker.walkRange(param.getIdentifiedStructBody());
else
walker.walkRange(param.getTypeList());
}
static FailureOr<LLVM::detail::LLVMStructTypeStorage::Key>
replace(const LLVM::detail::LLVMStructTypeStorage::Key &param,
AttrSubElementReplacements &attrRepls,
TypeSubElementReplacements &typeRepls) {
// TODO: It's not clear how we support replacing sub-elements of mutable
// types.
if (param.isIdentified())
return failure();
return LLVM::detail::LLVMStructTypeStorage::Key(
typeRepls.take_front(param.getTypeList().size()), param.isPacked());
}
};

But they left a TODO comment for the mutable case. I thought an extra tracing key would work, so gave it a try.

@@ -41,6 +41,8 @@ std::unique_ptr<Pass> createGotoSolverPass();
/// Create a pass to lower ABI-independent function definitions/calls.
std::unique_ptr<Pass> createCallConvLoweringPass();

std::unique_ptr<Pass> createAddrSpaceLoweringPass();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need a separate pass to do address space lowering, I don't see any good reason why it shouldn't be done directly in CIRGen.

Copy link
Collaborator Author

@seven-mile seven-mile Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can be done in CIRGen immediately. Did you mean the TypeConverter of DirectToLLVM? I also prefer that way and already implemented it previously. If we use TypeConverter instead, we can also get rid of recursivelyReplaceElementsIn and replaceVersion (the handler should still be fixed though, can be another patch). It just bother a bit ConvertCIRToLLVMPass::runOnOperation to maintain the states required by LowerModule, and a lazy initialization logic for LowerModule as we do not always have triple present. If it's okay, let me update the changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just bother a bit ConvertCIRToLLVMPass::runOnOperation to maintain the states required by LowerModule

Can you elaborate a bit what would be needed specifically? Would it require a hacky solution or just some extra work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. This is the patch looks like:
https://gist.github.com/seven-mile/c57ddfb96e180af1609d429e1e66f67b

Apart from messing up ConvertCIRToLLVMPass::runOnOperation, it doesn't look too hacky IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Thoughts:

  • The lazy initialization of the lowering state requiring synchronization doesn't look right to me... The address space map is going to be small and immutable, so maybe you could even pass it by value to prepareTypeConverter(...)?
  • Would it make sense to just use an empty map in case no triple is present? Or asked differently, would we expect offload AS to be used for the default target?
  • In the gist you mention the triple is not available for .cir test-cases -- could you (or @sitio-couto) address that in a separate PR first? I'd expect that we only need to specify that for OpenCL-related test-cases, assuming there's a well-defined default in case no triple is given.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, perhaps I've been too insistent on using an ideal method to call TargetLoweringInfo. We could opt for an interim solution (passing AS map by value) and refine it as it becomes more accessible.

A feasible approach would be:

// All CIR AS maps have static lifetime, it's safe to ref it
auto getCIRASMap = [&]() -> mlir::cir::AddressSpaceAttr::map_t const & {
  // If the triple is not present, e.g. CIR modules parsed from text, we
  // cannot init LowerModule properly.
  assert(!::cir::MissingFeatures::makeTripleAlwaysPresent());
  // Here we will use a default AS map to ignore all AS stuff.
  static const mlir::cir::AddressSpaceAttr::map_t defaultASMap{0};
  if (module->hasAttr("cir.triple")) {
    mlir::PatternRewriter rewriter{&getContext()};
    auto lowerModule = mlir::cir::createLowerModule(module, rewriter);
    return lowerModule.getTargetLoweringInfo().getCIRAddrSpaceMap();
  } else {
    return defaultASMap;
  }
};
prepareTypeConverter(converter, dataLayout, getCIRASMap());


// FIXME: No actual usage of this rewriter, just to make the ctor happy.
mlir::PatternRewriter rewriter{&getContext()};
LowerModule lowerModule = createLowerModule(module, rewriter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's trivial enough why we need a specific pass?

@@ -0,0 +1,72 @@
//===- AArch64.cpp --------------------------------------------------------===//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filename is missing one letter and header is wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! For the filename, both clang libBasic and libCodeGen use SPIR.cpp, because there are spir and spirv target in clang. Usually spirv is based on spir with some overrides, e.g. class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo. Here I didn't include spir because it's not used yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks for the clarification

@bcardosolopes
Copy link
Member

Let me try to expand a bit on what I meant by over design, and ask a few more clarifying questions, since I might have misunderstood part of the approach.

This PR abstracts target specific address map into more generic, unified approach. It does that by keeping at the CIR level a more generic representation for address spaces, example: addrspace(offload_local) instead of a target specific number.

CIR is then lowered as part of target specific ABI lowering to a CIR representation that has a target specific address space information.
Q: Is this right?
Q: I don't see any tests checking what become of those when the ABI lowering is done, do they become addrspace(target<N>) at that point?
Q: What happens when the __attribute__... is speficied with an arbitrary N number, let's say 777?, is that still represented by addrspace(target<N>)?

I find that ABI lowering for address space a bit premature. If I'm writing a pass on top of CIR, now I need to know whether that CIR is pre or post target ABI lowering to try understand what's version of address spaces I should be looking at...
Q: In case other __attribute__s are used within the same code, with arbitrary numbers, how is one supposed to work with the mixed things?
Q: What kind of compiler transformations are you imagining that would require this kind of distiction this early? When users of OpenCL want to write a transformation, do they care whether they are unified or already target specific?

If you have to use a AttrTypeSubElementHandler to replace every pointer inside a struct, this isn't a viable approach IMO. It's adding extra complexity for something that has no use or no intend use - or please specify what you have in mind here for for future transformations and analysis. A better place to do that instead would be during LLVM lowering, such that we keep the unified info all the in CIR and convert it to target specific whenever we have to convert pointers anwyays. ABI info should be designed to allow queries from within LowerToLLVM, not sure if it's possible to do that now, but I brought up this point to @sitio-couto previously.

In a gist, I'd like to keep this out of target lowering, I really not convinced by any trade offs.

GSoC mentor feedback hat here: in the future, please elaborate on workflow and design decisions on the PR description, preferably with examples. Also, before going all the way into design (which I think it's pretty cool and good work overall), consider first introducing the simple and direct approach, once that lands, you can incrementally think about how to redesign on top of existing, reliable testcases. Same is true about use cases, in CIR we try not to raise representations without a concrete use case in mind, that could be a guiding approach to consider. As an example: the RFC was very nice, but probably an improvement that could have come up incrementally, once we have fledged how the pieces connect in practice for more than opencl address spaces.

@seven-mile
Copy link
Collaborator Author

CIR is then lowered as part of target specific ABI lowering to a CIR representation that has a target specific address space information.
Q: Is this right?

No. We planned to only touch the CIR AS while or just before LowerToLLVM. Wherever we do the lowering, in the TypeConverter or a separate pass, it just intend to do the lowering directly or prepare for the lowering.

Let me describe the original motive of a separate pass. When I first try introducing target-specific info in the type converter of LLVM lowering, I found that it have to mess up the ConvertCIRToLLVMPass with strange synchronization and an unused rewriter state. I think that's not acceptable. So a direct thought on that would be to use a separate pass to extract and isolate these details. Note that I also added the new pass in populateCIRPreLoweringPasses.

But when a separate pass leads to the misfunctional replacing in struct type, I didn't do a rethink on the approach, just thinking "it's a bug so let's fix it anyway". Now I realized that the ground is not solid enough. I think we can just go for a solution in TypeConverter.

Q: I don't see any tests checking what become of those when the ABI lowering is done, do they become addrspace(target<N>) at that point?

Same no and same reason. Sorry for the confusion.

Q: What happens when the __attribute__... is speficied with an arbitrary N number, let's say 777?, is that still represented by addrspace(target<N>)?

Yes.

I find that ABI lowering for address space a bit premature. If I'm writing a pass on top of CIR, now I need to know whether that CIR is pre or post target ABI lowering to try understand what's version of address spaces I should be looking at...
Q: In case other __attribute__s are used within the same code, with arbitrary numbers, how is one supposed to work with the mixed things?
Q: What kind of compiler transformations are you imagining that would require this kind of distiction this early? When users of OpenCL want to write a transformation, do they care whether they are unified or already target specific?

True. It's totally a different thing from CallConvLowering. I think the unified one always carries more information than the target-specific one, otherwise we wouldn't be able to do the lowering. In other words, it shouldn't provide any extra optimization opportunities if we do it earlier than LowerToLLVM. I think we are in agreement on this topic.

ABI info should be designed to allow queries from within LowerToLLVM, not sure if it's possible to do that now, but I brought up this point to @sitio-couto previously.

Sure. TBH, introducing target-specific information in TypeConverter is not a common use case. I don't expect it to be done.😂

Currently @sitio-couto suggested me that a general method here is to keep an instance of LowerModule for the pass requiring TargetLoweringInfo just like what CallConvLowering does. It's an interim solution and waiting for future refactor. The @jopperm's proposed approach that keeps the reference to the AS map rather than the LowerModule looks more graceful to me (at least no strange lazy initialization logic). We will also need to update it to a more fluent way when one is available. Wdyt?


Many thanks for the feedback and suggestions! I often have a feel that I lack the ability to describe my obscure design choices in a brief and clear manner: I'm always replaying what I did and presenting verbose details, or just providing too less information by a too short description. Understanding this fact would be a good start, and trying providing more examples should helps greatly. I'll try my best to do better next time ; )

@seven-mile
Copy link
Collaborator Author

seven-mile commented Jul 12, 2024

Made some changes:

  • Use the type converter and remove the codes around AttrTypeSubElementHandler.
  • Rename map_t to MapTy to match the naming convention of MLIR.
  • To pass the AS map into the type converter, just like DataLayout, I use a pointer with null semantic corresponding to "absence of cir.triple". When we are lowering non-null AddressSpaceAttr, we assert that the AS map can not be null. It avoids implicit miscomilations.
  • Extended existing test cases: IR/address-space.cir & Lowering/address-space.cir.

Sorry. Ignore the close and re-open, mistake.🥲


Regarding breaking this PR into dependent pieces, I'm planning:

  • [NFC] Refactor createLowerModule to be used by libLowerToLLVM
  • [NFC] enable TargetLoweringInfo for AppleArm64
  • (This PR) The core offload_* supporting codes with SPIR TargetInfo

Will rebase this PR after that, sorry for the review churn.

@seven-mile seven-mile closed this Jul 12, 2024
@seven-mile seven-mile reopened this Jul 12, 2024
@seven-mile seven-mile marked this pull request as draft July 12, 2024 13:12
@bcardosolopes
Copy link
Member

Will rebase this PR after that, sorry for the review churn.

Don't worry about that, thanks for breaking this up!

@bcardosolopes
Copy link
Member

bcardosolopes commented Jul 12, 2024

Further PR splitting you could do here:

  1. Add the CIR dialect bits for unified address spaces + clang/test/CIR/IR/address-space.cir.
  2. Add SPIR.cpp skeleton without the address space maps.

This will basically leave the contentios part of this PR isolated and easier to iterate.

@seven-mile seven-mile changed the title [CIR][Dialect] Add offload_* cases in address space attribute [CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases Jul 13, 2024
@seven-mile seven-mile changed the title [CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases [CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases for OpenCL with SPIR-V target Jul 17, 2024
@seven-mile seven-mile marked this pull request as ready for review July 18, 2024 01:37
@seven-mile
Copy link
Collaborator Author

seven-mile commented Jul 18, 2024

Changes and PR description updated. This PR is ready for another round of review. ; )

Comment on lines 3598 to 3599
if (!module->hasAttr("cir.triple"))
return nullptr;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that IRASMap const * is a pointer to array. It would be nullptr if triple is not present.

If no address space attributes are present in CIR, we ensure that a null IRASMap remains functional. However, if there is an address space attribute to convert, the guard at line 3619 will reject it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a bit about using the pointer here, but couldn't come up with a better design. I believe using std::optional<std::reference_wrapper<IRASMap>> would work, but that doesn't really simplify anything.

The underlying question is, do we really need this miscompilation protection here or could we just assume a default mapping similar to always having a DataLayout? In case a target triple is present and the target doesn't supply an AS map (or an incomplete AS map), we'd happily use a default flattening to (target) AS 0. To catch that, we'd need to use an actual map (e.g. SmallDenseMap) to let targets opt-in to the CIR AS they support, but that design point would deviate from OG clang.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, for this PR I'm suggesting to either (1) keep the "no triple -> null map" logic or (2) drop the check and use a default map when no triple is present. I meant (3) "targets opt-in to supported AS" more as food for thought and potential future improvement that should not hold up this PR which finally reactivates the in-tree OpenCL tests 🎉

Copy link
Contributor

@jopperm jopperm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat after all the prep PRs landed 👍

@@ -707,6 +707,7 @@ def AddressSpaceAttr : CIR_Attr<"AddressSpace", "addrspace"> {
let extraClassDeclaration = [{
static constexpr char kTargetKeyword[] = "}]#targetASCase.symbol#[{";
static constexpr int32_t kFirstTargetASValue = }]#targetASCase.value#[{;
using MapTy = unsigned[kFirstTargetASValue];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using MapTy = unsigned[kFirstTargetASValue];
using MapTy = std::array<unsigned, kFirstTargetASValue>;

(more idiomatic in C++, and might prevent future headaches if we need to copy the maps etc.)

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
}

namespace {

using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the short name be IRASMapTy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MeowTy is a common pattern for member types in class. For free type names used in variable/parameter declarations, I prefer a clean CamelCase without suffix, because CamelCase is already the feature of types, no?

Btw, IRASMap copies its name from clang::LangASMap XD

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a clean CamelCase without suffix

We should use the pattern that's used overall in LLVM despite our personal preferences, otherwise someone will need to fix this when we upstream this chunk of code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mean the same, by "because CamelCase is already the feature of types, no?"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IRASMap -> IRASMapTy

@@ -8,5 +8,10 @@ TargetLoweringInfo::TargetLoweringInfo(std::unique_ptr<ABIInfo> Info)

TargetLoweringInfo::~TargetLoweringInfo() = default;

AddressSpaceAttr::MapTy const &TargetLoweringInfo::getCIRAddrSpaceMap() const {
static AddressSpaceAttr::MapTy defaultAddrSpaceMap = {0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add a comment saying that the default target behavior is to flatten all address spaces to AS 0.
  • The = {0} is not strictly necessary here, correct? I find it a bit misleading, and the zero-initialization is better explained as a comment I think.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of review, thanks for breaking it, this is much simpler and easier to review.

Have you considered adding a AddressMapAttr (wrapping an ArrayAttr perhaps`)? Seems like it would make verifying the length easy, easy to deal with the default value because we can encode in tablegen, and easy for targets to populate and return. Given how often those will be built in a compilation instance, value semantics is probably easier to follow here.

@@ -8,5 +8,11 @@ TargetLoweringInfo::TargetLoweringInfo(std::unique_ptr<ABIInfo> Info)

TargetLoweringInfo::~TargetLoweringInfo() = default;

AddressSpaceAttr::MapTy const &TargetLoweringInfo::getCIRAddrSpaceMap() const {
// Flatten all non-target address spaces to addrspace(0)
static AddressSpaceAttr::MapTy defaultAddrSpaceMap = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be static? We usually reserve using local static variables only when it's strictly needed, and looks like this function could just be a return {}? Note that this or the target versions are not called much more than a couple times in a pipeline, so probably not worth any early optimization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can replace the reference with just value.

1, // offload_global
2, // offload_constant
4, // offload_generic
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a target define an address space map that has fewer, more or none elements? Why this need to be a static global? Seems like a constant global would be enough.

Copy link
Collaborator Author

@seven-mile seven-mile Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a target define an address space map that has fewer, more or none elements?

The target author should use any dummy value they want, and ensure by their own that no unexpected AS would be emitted for normal CodeGen.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

This is the approach used by clang::LangASMap. I think it's reasonable because

  • It does not make sense to add a target that overrides getCIRAddrSpaceMap without considering all (language or unified) addrspace cases.
  • It does not make sense to add a (language or unified) addrspace case without considering all maps in all targets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

My question is a more practical one: if by mistake the array is incomplete, do we get a compiler error or we gonna silently miscompile? A lookup tables are cool, but if you only use it once it might not be worth the potential mistakes other could make.

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
}

namespace {

using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a clean CamelCase without suffix

We should use the pattern that's used overall in LLVM despite our personal preferences, otherwise someone will need to fix this when we upstream this chunk of code.

Copy link
Collaborator Author

@seven-mile seven-mile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the attribute approach, I think it's a bit weird for the value returned by TargetLoweringInfo:

  • The ideal pattern for CIR addrspace map in the future should be as simple as lowerModule.getTargetLoweringInfo().getCIRAddrSpaceMap()[cirAS], align to other target-specific information. We need an extra wrapper getIRAddrSpaceMap for now only because TargetLoweringInfo is premature temporarily.
  • MLIR objects require a context to live in, which would complicate the construction.
  • Attribute that never exists in IR is probably abusing the infrastructure.
  • MLIR attributes are usually harder to work with: the dynamic element type and variable length of ArrayAttr. We do not even need a verification logic if the map has the type std::array<unsigned, kFirstTargetASValue>.

I don't see explicit drawbacks of "no triple -> null map". Note that the approach is relatively self-contained. getIRAddrSpaceMap resides in an anonymous namespace as the implementation details.

So if we don't expect a reference to static storage, we can pass it by value overall. To be more specific, use the following combination:

  • MapTy = std::array<unsigned, kFirstTargetASValue>: leaves the addrspace map as is
  • AddressSpaceAttr::MapTy TargetLoweringInfo::getCIRAddrSpaceMap(): copies the std::array and pass it around
  • std::optional<AddressSpaceAttr::MapTy> getIRAddrSpaceMap(mlir::ModuleOp module): provides the extra semantic for "no triple -> null map"

1, // offload_global
2, // offload_constant
4, // offload_generic
};
Copy link
Collaborator Author

@seven-mile seven-mile Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a target define an address space map that has fewer, more or none elements?

The target author should use any dummy value they want, and ensure by their own that no unexpected AS would be emitted for normal CodeGen.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

This is the approach used by clang::LangASMap. I think it's reasonable because

  • It does not make sense to add a target that overrides getCIRAddrSpaceMap without considering all (language or unified) addrspace cases.
  • It does not make sense to add a (language or unified) addrspace case without considering all maps in all targets.

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
}

namespace {

using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mean the same, by "because CamelCase is already the feature of types, no?"

@@ -8,5 +8,11 @@ TargetLoweringInfo::TargetLoweringInfo(std::unique_ptr<ABIInfo> Info)

TargetLoweringInfo::~TargetLoweringInfo() = default;

AddressSpaceAttr::MapTy const &TargetLoweringInfo::getCIRAddrSpaceMap() const {
// Flatten all non-target address spaces to addrspace(0)
static AddressSpaceAttr::MapTy defaultAddrSpaceMap = {};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can replace the reference with just value.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is still a too much for what it needs to do. I'm not confortable with the table approach (for reasons already mentioned and didn't get satifying answers) and also creating the lowerModule here is a lot for something that could be a switch. Ideally, in a separate PR, one could go and pass in the lowerModule if possible and only create it if none is available, that might mitigate the cost a bit (e.g. if calling from cir-opt then you create it, if from clang, it's passed down to the LLVM lowering pass).

Additionally, thinking more about this, I don't really see any special reason for this to be more part of TargetLoweringInfo than ABIInfo. That said, I suggest you move SPIRVABIInfo into a header, add a method that translate a unified address space into your target thing. This method should (a) contain a switch that goes over a unified enum and returns a target specific int (b) be unrecheable in the base class.

1, // offload_global
2, // offload_constant
4, // offload_generic
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

My question is a more practical one: if by mistake the array is incomplete, do we get a compiler error or we gonna silently miscompile? A lookup tables are cool, but if you only use it once it might not be worth the potential mistakes other could make.

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
}

namespace {

using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IRASMap -> IRASMapTy

@seven-mile
Copy link
Collaborator Author

... add a method that translate a unified address space into your target thing. This method should (a) contain a switch that goes over a unified enum and returns a target specific int (b) be unrecheable in the base class.

A faultless switch is very nice to have. Now I only rely on #752 to provide the optional semantic for LowerModule so that I can handle the test cases without target info.

Ideally, in a separate PR, one could go and pass in the lowerModule if possible and only create it if none is available, that might mitigate the cost a bit (e.g. if calling from cir-opt then you create it, if from clang, it's passed down to the LLVM lowering pass).

I'm not sure I understand the "cost" here. But, sure, I believe there will be only one instance in the whole pipeline for LowerModule, LowerContext, LowerFunction ... in the future. Given that Vinicius is actively working on it, I don't want to interfere his plan.

I don't really see any special reason for this to be more part of TargetLoweringInfo than ABIInfo.

I don't think it make many differences regarding functionality. I prefer keeping the method getTargetAddrSpaceFromCIRAddrSpace attached to TargetLoweringInfo because

  • The description of clang::CodeGen::ABIInfo

ABIInfo - Target specific hooks for defining how a type should be passed or returned from functions.

  • In OG CodeGen, there's a method related to address space: getASTAllocaAddressSpace. It's designed to be in TargetCodeGenInfo, rather than in ABIInfo.

@seven-mile
Copy link
Collaborator Author

Rebased. The description of PR also updated.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, almost there!

Two remaining pieces I'd like to see here before accepting this. I'm open to merge this PR right away if I can get your word on tackling them in two next separate PRs:

  1. Now that LowerModule is a unique ptr, pass it down to LowerToLLVM. If coming from cir-opt, you then instantiate a new one.
  2. Ignoring the triple / requiring both CIR and LLVM makes me very uncomfortable, the receipt for technical debt, we better get this right out of the bat. All CIR tests that needed it, should have the CIR triple. LLVM triple should be created from the CIR one, it's unnecessary to have both in the same file.

Seems like you need this before @sitio-couto does, so probably the right opportunity to implement it.

@seven-mile
Copy link
Collaborator Author

seven-mile commented Jul 26, 2024

I'm open to merge this PR right away if I can get your word on tackling them in two next separate PRs.

Sure. I would be glad to improve these two aspects right in the next PRs. But the final target triple / data layout may need further considerations:

I think Vinicius want to use a similar approach to FIR because the setMLIRDataLayout function is copied from FIR ; ). So first let's take a rough look at what FIR does for target triple and target data layout.

For both triple and data layout, FIR uses the module attribute name in LLVM. It's consistent so I think it's fine. We have inconsistency in CIR though.

For triple, there are test cases invoking

  • fir-opt --target-rewrite="target=x86_64-unknown-linux-gnu"
    • Yes, TargetRewrite is a pass to constitute the pass pipeline here of fir-opt. This pass and FIRToLLVMLowering pass are both parameterized by target and datalayout stuff. The default value of target is empty string, which means default triple of LLVM: the triple of toolchain.
  • and tco --target=aarch64-unknown-linux-gnu.
    • tco is used for LLVM Lowering therefore similar to cir-translate, but they usually tag it as "for internal test only". This option set (or rewrite) the target attribute in module manually. The default value of target is native.

With the two option serve for FIR-to-FIR and FIR-to-LLVM pipeline. FIR doesn't need any triple set in the IR of test cases. It set the triple in command line like compiler drivers.

For data layout, FIR is taking empty string as the default value. I don't think it's a good design.

Generally speaking, I'm considering the following approach:

  • Let cir-translate be able to somehow consume the triple from command line. If none, set it to native. mlir-translate also uses llvm::cl, Hopefully it would work flawlessly with a llvm::cl::opt extended.
  • Get clang::TargetInfo from the triple, then fetch the data layout string defined by the target.

@bcardosolopes
Copy link
Member

@seven-mile thanks for the detailed comparison with FIR, your plan sounds solid, thanks for looking into this!

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bcardosolopes bcardosolopes merged commit 9445acc into llvm:main Jul 26, 2024
6 checks passed
Hugobros3 pushed a commit to shady-gang/clangir that referenced this pull request Oct 2, 2024
…cases for OpenCL with SPIR-V target (llvm#724)

This PR implements `offload_*` cases discussed in [this
thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728).

* Integrate target-specific CIR-to-LLVM address space map into
`TargetLoweringInfo`
* CIRGen: Implement these cases in `getValueFromLangAS`
* Lowering: Extend the state of type converter with `LowerModule`

When frontend provides a new LangAS like `opencl_generic`, it would be
processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as
`offload_generic`.

When we lower CIR to LLVM,

* For pointer types without address space attribute, it's mapped to `ptr
addrspace(0)` directly
* For target cases `target<x>`, it's mapped to `ptr addrspace(x)`
* For other defined cases, query the target info with a new virtual
method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86
and ARM64 map all known cases to 0. For SPIR-V target here, it maps
`offload_generic` to `addrspace(4)`.
smeenai pushed a commit to smeenai/clangir that referenced this pull request Oct 9, 2024
…cases for OpenCL with SPIR-V target (llvm#724)

This PR implements `offload_*` cases discussed in [this
thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728).

* Integrate target-specific CIR-to-LLVM address space map into
`TargetLoweringInfo`
* CIRGen: Implement these cases in `getValueFromLangAS`
* Lowering: Extend the state of type converter with `LowerModule`

When frontend provides a new LangAS like `opencl_generic`, it would be
processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as
`offload_generic`.

When we lower CIR to LLVM,

* For pointer types without address space attribute, it's mapped to `ptr
addrspace(0)` directly
* For target cases `target<x>`, it's mapped to `ptr addrspace(x)`
* For other defined cases, query the target info with a new virtual
method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86
and ARM64 map all known cases to 0. For SPIR-V target here, it maps
`offload_generic` to `addrspace(4)`.
smeenai pushed a commit to smeenai/clangir that referenced this pull request Oct 9, 2024
…cases for OpenCL with SPIR-V target (llvm#724)

This PR implements `offload_*` cases discussed in [this
thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728).

* Integrate target-specific CIR-to-LLVM address space map into
`TargetLoweringInfo`
* CIRGen: Implement these cases in `getValueFromLangAS`
* Lowering: Extend the state of type converter with `LowerModule`

When frontend provides a new LangAS like `opencl_generic`, it would be
processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as
`offload_generic`.

When we lower CIR to LLVM,

* For pointer types without address space attribute, it's mapped to `ptr
addrspace(0)` directly
* For target cases `target<x>`, it's mapped to `ptr addrspace(x)`
* For other defined cases, query the target info with a new virtual
method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86
and ARM64 map all known cases to 0. For SPIR-V target here, it maps
`offload_generic` to `addrspace(4)`.
keryell pushed a commit to keryell/clangir that referenced this pull request Oct 19, 2024
…cases for OpenCL with SPIR-V target (llvm#724)

This PR implements `offload_*` cases discussed in [this
thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728).

* Integrate target-specific CIR-to-LLVM address space map into
`TargetLoweringInfo`
* CIRGen: Implement these cases in `getValueFromLangAS`
* Lowering: Extend the state of type converter with `LowerModule`

When frontend provides a new LangAS like `opencl_generic`, it would be
processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as
`offload_generic`.

When we lower CIR to LLVM,

* For pointer types without address space attribute, it's mapped to `ptr
addrspace(0)` directly
* For target cases `target<x>`, it's mapped to `ptr addrspace(x)`
* For other defined cases, query the target info with a new virtual
method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86
and ARM64 map all known cases to 0. For SPIR-V target here, it maps
`offload_generic` to `addrspace(4)`.
lanza pushed a commit that referenced this pull request Nov 5, 2024
…cases for OpenCL with SPIR-V target (#724)

This PR implements `offload_*` cases discussed in [this
thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728).

* Integrate target-specific CIR-to-LLVM address space map into
`TargetLoweringInfo`
* CIRGen: Implement these cases in `getValueFromLangAS`
* Lowering: Extend the state of type converter with `LowerModule`

When frontend provides a new LangAS like `opencl_generic`, it would be
processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as
`offload_generic`.

When we lower CIR to LLVM,

* For pointer types without address space attribute, it's mapped to `ptr
addrspace(0)` directly
* For target cases `target<x>`, it's mapped to `ptr addrspace(x)`
* For other defined cases, query the target info with a new virtual
method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86
and ARM64 map all known cases to 0. For SPIR-V target here, it maps
`offload_generic` to `addrspace(4)`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants