[CIR][CodeGen][LowerToLLVM] End-to-end implementation of `offload_*` cases for OpenCL with SPIR-V target #724

seven-mile · 2024-07-07T18:40:36Z

This PR implements offload_* cases discussed in this thread.

Integrate target-specific CIR-to-LLVM address space map into TargetLoweringInfo
CIRGen: Implement these cases in getValueFromLangAS
Lowering: Extend the state of type converter with LowerModule

When frontend provides a new LangAS like opencl_generic, it would be processed by CIRGenTypes and Builder.getPointerTo() and encoded as offload_generic.

When we lower CIR to LLVM,

For pointer types without address space attribute, it's mapped to ptr addrspace(0) directly
For target cases target<x>, it's mapped to ptr addrspace(x)
For other defined cases, query the target info with a new virtual method getTargetAddrSpaceFromCIRAddrSpace. General targets like X86 and ARM64 map all known cases to 0. For SPIR-V target here, it maps offload_generic to addrspace(4).

seven-mile

@sitio-couto Several ABI-related refactor and new code paths. I'll appreciate it if you can take a look!

seven-mile · 2024-07-07T18:48:11Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/ItaniumCXXABI.cpp

@@ -59,6 +59,15 @@ bool ItaniumCXXABI::classifyReturnType(LowerFunctionInfo &FI) const {
  return false;
 }

+class AppleARM64CXXABI : public ItaniumCXXABI {


AppleARM64CXXABI is used because Nathan introduced macOS host in the test driver.c. When we create LowerModule, the CXXABI is created meanwhile, leading to NYI. It's simple so I just add it here for completeness.

seven-mile · 2024-07-07T18:53:54Z

clang/lib/CIR/Dialect/Transforms/AddrSpaceLowering.cpp

+
+  // FIXME: No actual usage of this rewriter, just to make the ctor happy.
+  mlir::PatternRewriter rewriter{&getContext()};
+  LowerModule lowerModule = createLowerModule(module, rewriter);


All LowerModule stuff is unpolished. @sitio-couto and I reached an agreement to defer the unification of the states like LowerModule. All the info needed here is the CIR AS map in TargetLoweringInfo, which is trivial enough. It's just for future convenience to define rewriteAddrSpace method in LowerModule.

If it's trivial enough why we need a specific pass?

bcardosolopes

This PR is doing different things and moving code while at it.

I'd like to see this in more incremental and well explained pieces (e.g. SPIRV skeleton can be added separately, you also don't provide any description about this address map stuff, a sudden change to struct type which indirectly increases memory size of every single struct allocation, etc).

The approach also feels a bit over engineered, see my questions inline. If you have refactoring to do, please do them separately.

bcardosolopes · 2024-07-09T02:41:27Z

clang/include/clang/CIR/Dialect/IR/CIRAttrs.td

-    address space, which means it's not yet converted by the address space map
-    to carry target-specific semantics.
+    The address space attribute is used in pointer types. It essentially models
+    `clang::LangAS` rather than the LLVM address space.


Perhaps change It essentially models clang::LangAS rather than the LLVM address space to It essentially provides an unified models on top of clang::LangAS, rather than LLVM address spaces

bcardosolopes · 2024-07-09T02:45:14Z

clang/include/clang/CIR/Dialect/IR/CIRTypesDetails.h

@@ -35,11 +35,14 @@ struct StructTypeStorage : public TypeStorage {
    bool packed;
    StructType::RecordKind kind;
    ASTRecordDeclInterface ast;
+    // Extra unique key to differentiate replacements on mutable parts
+    int replaceVersion;


This struct change sounds super adhoc and specific, doesn't fit the overall picture. Please elaborate why we need this and what are you trying to do, so perhaps we can find a different solution?

Sure. The whole AttrTypeSubElementHandler was taken from LLVMStructType, which is necessary for pure C++ types as such to support walk and replace infrastructure in MLIR -- It's necessary for the AS lowering pass: recursivelyReplaceElementsIn to work.

clangir/mlir/lib/Dialect/LLVMIR/IR/TypeDetail.h

Lines 333 to 355 in 80e1a10

/// Allow walking and replacing the subelements of a LLVMStructTypeStorage key.

template <>

struct AttrTypeSubElementHandler<LLVM::detail::LLVMStructTypeStorage::Key> {

static void walk(const LLVM::detail::LLVMStructTypeStorage::Key &param,

AttrTypeImmediateSubElementWalker &walker) {

if (param.isIdentified())

walker.walkRange(param.getIdentifiedStructBody());

else

walker.walkRange(param.getTypeList());

}

static FailureOr<LLVM::detail::LLVMStructTypeStorage::Key>

replace(const LLVM::detail::LLVMStructTypeStorage::Key &param,

AttrSubElementReplacements &attrRepls,

TypeSubElementReplacements &typeRepls) {

// TODO: It's not clear how we support replacing sub-elements of mutable

// types.

if (param.isIdentified())

return failure();

return LLVM::detail::LLVMStructTypeStorage::Key(

typeRepls.take_front(param.getTypeList().size()), param.isPacked());

}

};

But they left a TODO comment for the mutable case. I thought an extra tracing key would work, so gave it a try.

bcardosolopes · 2024-07-09T02:46:11Z

clang/include/clang/CIR/Dialect/Passes.h

@@ -41,6 +41,8 @@ std::unique_ptr<Pass> createGotoSolverPass();
 /// Create a pass to lower ABI-independent function definitions/calls.
 std::unique_ptr<Pass> createCallConvLoweringPass();

+std::unique_ptr<Pass> createAddrSpaceLoweringPass();


Why we need a separate pass to do address space lowering, I don't see any good reason why it shouldn't be done directly in CIRGen.

I don't think it can be done in CIRGen immediately. Did you mean the TypeConverter of DirectToLLVM? I also prefer that way and already implemented it previously. If we use TypeConverter instead, we can also get rid of recursivelyReplaceElementsIn and replaceVersion (the handler should still be fixed though, can be another patch). It just bother a bit ConvertCIRToLLVMPass::runOnOperation to maintain the states required by LowerModule, and a lazy initialization logic for LowerModule as we do not always have triple present. If it's okay, let me update the changes.

It just bother a bit ConvertCIRToLLVMPass::runOnOperation to maintain the states required by LowerModule

Can you elaborate a bit what would be needed specifically? Would it require a hacky solution or just some extra work?

Sure. This is the patch looks like:
https://gist.github.com/seven-mile/c57ddfb96e180af1609d429e1e66f67b

Apart from messing up ConvertCIRToLLVMPass::runOnOperation, it doesn't look too hacky IMO.

Thanks! Thoughts:

The lazy initialization of the lowering state requiring synchronization doesn't look right to me... The address space map is going to be small and immutable, so maybe you could even pass it by value to prepareTypeConverter(...)?

Would it make sense to just use an empty map in case no triple is present? Or asked differently, would we expect offload AS to be used for the default target?

In the gist you mention the triple is not available for .cir test-cases -- could you (or @sitio-couto) address that in a separate PR first? I'd expect that we only need to specify that for OpenCL-related test-cases, assuming there's a well-defined default in case no triple is given.

Yes, perhaps I've been too insistent on using an ideal method to call TargetLoweringInfo. We could opt for an interim solution (passing AS map by value) and refine it as it becomes more accessible.

A feasible approach would be:

// All CIR AS maps have static lifetime, it's safe to ref it auto getCIRASMap = [&]() -> mlir::cir::AddressSpaceAttr::map_t const & { // If the triple is not present, e.g. CIR modules parsed from text, we // cannot init LowerModule properly. assert(!::cir::MissingFeatures::makeTripleAlwaysPresent()); // Here we will use a default AS map to ignore all AS stuff. static const mlir::cir::AddressSpaceAttr::map_t defaultASMap{0}; if (module->hasAttr("cir.triple")) { mlir::PatternRewriter rewriter{&getContext()}; auto lowerModule = mlir::cir::createLowerModule(module, rewriter); return lowerModule.getTargetLoweringInfo().getCIRAddrSpaceMap(); } else { return defaultASMap; } }; prepareTypeConverter(converter, dataLayout, getCIRASMap());

bcardosolopes · 2024-07-09T02:47:55Z

clang/lib/CIR/Dialect/Transforms/AddrSpaceLowering.cpp

+
+  // FIXME: No actual usage of this rewriter, just to make the ctor happy.
+  mlir::PatternRewriter rewriter{&getContext()};
+  LowerModule lowerModule = createLowerModule(module, rewriter);


If it's trivial enough why we need a specific pass?

bcardosolopes · 2024-07-09T02:52:58Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/SPIR.cpp

@@ -0,0 +1,72 @@
+//===- AArch64.cpp --------------------------------------------------------===//


Filename is missing one letter and header is wrong.

Nice catch! For the filename, both clang libBasic and libCodeGen use SPIR.cpp, because there are spir and spirv target in clang. Usually spirv is based on spir with some overrides, e.g. class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo. Here I didn't include spir because it's not used yet.

gotcha, thanks for the clarification

bcardosolopes · 2024-07-10T20:27:03Z

Let me try to expand a bit on what I meant by over design, and ask a few more clarifying questions, since I might have misunderstood part of the approach.

This PR abstracts target specific address map into more generic, unified approach. It does that by keeping at the CIR level a more generic representation for address spaces, example: addrspace(offload_local) instead of a target specific number.

CIR is then lowered as part of target specific ABI lowering to a CIR representation that has a target specific address space information.
Q: Is this right?
Q: I don't see any tests checking what become of those when the ABI lowering is done, do they become addrspace(target<N>) at that point?
Q: What happens when the __attribute__... is speficied with an arbitrary N number, let's say 777?, is that still represented by addrspace(target<N>)?

I find that ABI lowering for address space a bit premature. If I'm writing a pass on top of CIR, now I need to know whether that CIR is pre or post target ABI lowering to try understand what's version of address spaces I should be looking at...
Q: In case other __attribute__s are used within the same code, with arbitrary numbers, how is one supposed to work with the mixed things?
Q: What kind of compiler transformations are you imagining that would require this kind of distiction this early? When users of OpenCL want to write a transformation, do they care whether they are unified or already target specific?

If you have to use a AttrTypeSubElementHandler to replace every pointer inside a struct, this isn't a viable approach IMO. It's adding extra complexity for something that has no use or no intend use - or please specify what you have in mind here for for future transformations and analysis. A better place to do that instead would be during LLVM lowering, such that we keep the unified info all the in CIR and convert it to target specific whenever we have to convert pointers anwyays. ABI info should be designed to allow queries from within LowerToLLVM, not sure if it's possible to do that now, but I brought up this point to @sitio-couto previously.

In a gist, I'd like to keep this out of target lowering, I really not convinced by any trade offs.

GSoC mentor feedback hat here: in the future, please elaborate on workflow and design decisions on the PR description, preferably with examples. Also, before going all the way into design (which I think it's pretty cool and good work overall), consider first introducing the simple and direct approach, once that lands, you can incrementally think about how to redesign on top of existing, reliable testcases. Same is true about use cases, in CIR we try not to raise representations without a concrete use case in mind, that could be a guiding approach to consider. As an example: the RFC was very nice, but probably an improvement that could have come up incrementally, once we have fledged how the pieces connect in practice for more than opencl address spaces.

seven-mile · 2024-07-11T03:52:50Z

CIR is then lowered as part of target specific ABI lowering to a CIR representation that has a target specific address space information.
Q: Is this right?

No. We planned to only touch the CIR AS while or just before LowerToLLVM. Wherever we do the lowering, in the TypeConverter or a separate pass, it just intend to do the lowering directly or prepare for the lowering.

Let me describe the original motive of a separate pass. When I first try introducing target-specific info in the type converter of LLVM lowering, I found that it have to mess up the ConvertCIRToLLVMPass with strange synchronization and an unused rewriter state. I think that's not acceptable. So a direct thought on that would be to use a separate pass to extract and isolate these details. Note that I also added the new pass in populateCIRPreLoweringPasses.

But when a separate pass leads to the misfunctional replacing in struct type, I didn't do a rethink on the approach, just thinking "it's a bug so let's fix it anyway". Now I realized that the ground is not solid enough. I think we can just go for a solution in TypeConverter.

Q: I don't see any tests checking what become of those when the ABI lowering is done, do they become addrspace(target<N>) at that point?

Same no and same reason. Sorry for the confusion.

Q: What happens when the __attribute__... is speficied with an arbitrary N number, let's say 777?, is that still represented by addrspace(target<N>)?

Yes.

I find that ABI lowering for address space a bit premature. If I'm writing a pass on top of CIR, now I need to know whether that CIR is pre or post target ABI lowering to try understand what's version of address spaces I should be looking at...
Q: In case other __attribute__s are used within the same code, with arbitrary numbers, how is one supposed to work with the mixed things?
Q: What kind of compiler transformations are you imagining that would require this kind of distiction this early? When users of OpenCL want to write a transformation, do they care whether they are unified or already target specific?

True. It's totally a different thing from CallConvLowering. I think the unified one always carries more information than the target-specific one, otherwise we wouldn't be able to do the lowering. In other words, it shouldn't provide any extra optimization opportunities if we do it earlier than LowerToLLVM. I think we are in agreement on this topic.

ABI info should be designed to allow queries from within LowerToLLVM, not sure if it's possible to do that now, but I brought up this point to @sitio-couto previously.

Sure. TBH, introducing target-specific information in TypeConverter is not a common use case. I don't expect it to be done.😂

Currently @sitio-couto suggested me that a general method here is to keep an instance of LowerModule for the pass requiring TargetLoweringInfo just like what CallConvLowering does. It's an interim solution and waiting for future refactor. The @jopperm's proposed approach that keeps the reference to the AS map rather than the LowerModule looks more graceful to me (at least no strange lazy initialization logic). We will also need to update it to a more fluent way when one is available. Wdyt?

Many thanks for the feedback and suggestions! I often have a feel that I lack the ability to describe my obscure design choices in a brief and clear manner: I'm always replaying what I did and presenting verbose details, or just providing too less information by a too short description. Understanding this fact would be a good start, and trying providing more examples should helps greatly. I'll try my best to do better next time ; )

seven-mile · 2024-07-12T12:49:56Z

Made some changes:

Use the type converter and remove the codes around AttrTypeSubElementHandler.
Rename map_t to MapTy to match the naming convention of MLIR.
To pass the AS map into the type converter, just like DataLayout, I use a pointer with null semantic corresponding to "absence of cir.triple". When we are lowering non-null AddressSpaceAttr, we assert that the AS map can not be null. It avoids implicit miscomilations.
Extended existing test cases: IR/address-space.cir & Lowering/address-space.cir.

Sorry. Ignore the close and re-open, mistake.🥲

Regarding breaking this PR into dependent pieces, I'm planning:

[NFC] Refactor createLowerModule to be used by libLowerToLLVM
[NFC] enable TargetLoweringInfo for AppleArm64
(This PR) The core offload_* supporting codes with SPIR TargetInfo

Will rebase this PR after that, sorry for the review churn.

bcardosolopes · 2024-07-12T18:30:10Z

Will rebase this PR after that, sorry for the review churn.

Don't worry about that, thanks for breaking this up!

bcardosolopes · 2024-07-12T18:51:51Z

Further PR splitting you could do here:

Add the CIR dialect bits for unified address spaces + clang/test/CIR/IR/address-space.cir.
Add SPIR.cpp skeleton without the address space maps.

This will basically leave the contentios part of this PR isolated and easier to iterate.

seven-mile · 2024-07-18T01:38:55Z

Changes and PR description updated. This PR is ready for another round of review. ; )

seven-mile · 2024-07-18T01:52:04Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

+  if (!module->hasAttr("cir.triple"))
+    return nullptr;


Note that IRASMap const * is a pointer to array. It would be nullptr if triple is not present.

If no address space attributes are present in CIR, we ensure that a null IRASMap remains functional. However, if there is an address space attribute to convert, the guard at line 3619 will reject it.

I thought a bit about using the pointer here, but couldn't come up with a better design. I believe using std::optional<std::reference_wrapper<IRASMap>> would work, but that doesn't really simplify anything.

The underlying question is, do we really need this miscompilation protection here or could we just assume a default mapping similar to always having a DataLayout? In case a target triple is present and the target doesn't supply an AS map (or an incomplete AS map), we'd happily use a default flattening to (target) AS 0. To catch that, we'd need to use an actual map (e.g. SmallDenseMap) to let targets opt-in to the CIR AS they support, but that design point would deviate from OG clang.

To clarify, for this PR I'm suggesting to either (1) keep the "no triple -> null map" logic or (2) drop the check and use a default map when no triple is present. I meant (3) "targets opt-in to supported AS" more as food for thought and potential future improvement that should not hold up this PR which finally reactivates the in-tree OpenCL tests 🎉

jopperm

Neat after all the prep PRs landed 👍

jopperm · 2024-07-18T08:07:26Z

clang/include/clang/CIR/Dialect/IR/CIRAttrs.td

@@ -707,6 +707,7 @@ def AddressSpaceAttr : CIR_Attr<"AddressSpace", "addrspace"> {
  let extraClassDeclaration = [{
    static constexpr char kTargetKeyword[] = "}]#targetASCase.symbol#[{";
    static constexpr int32_t kFirstTargetASValue = }]#targetASCase.value#[{;
+    using MapTy = unsigned[kFirstTargetASValue];


Suggested change

using MapTy = unsigned[kFirstTargetASValue];

using MapTy = std::array<unsigned, kFirstTargetASValue>;

(more idiomatic in C++, and might prevent future headaches if we need to copy the maps etc.)

jopperm · 2024-07-18T08:27:36Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
 }

 namespace {
+
+using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;


Should the short name be IRASMapTy?

I think MeowTy is a common pattern for member types in class. For free type names used in variable/parameter declarations, I prefer a clean CamelCase without suffix, because CamelCase is already the feature of types, no?

Btw, IRASMap copies its name from clang::LangASMap XD

Makes sense!

I prefer a clean CamelCase without suffix

We should use the pattern that's used overall in LLVM despite our personal preferences, otherwise someone will need to fix this when we upstream this chunk of code.

I think I mean the same, by "because CamelCase is already the feature of types, no?"

IRASMap -> IRASMapTy

jopperm · 2024-07-18T08:44:01Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/TargetInfo.cpp

@@ -8,5 +8,10 @@ TargetLoweringInfo::TargetLoweringInfo(std::unique_ptr<ABIInfo> Info)

 TargetLoweringInfo::~TargetLoweringInfo() = default;

+AddressSpaceAttr::MapTy const &TargetLoweringInfo::getCIRAddrSpaceMap() const {
+  static AddressSpaceAttr::MapTy defaultAddrSpaceMap = {0};


Add a comment saying that the default target behavior is to flatten all address spaces to AS 0.

The = {0} is not strictly necessary here, correct? I find it a bit misleading, and the zero-initialization is better explained as a comment I think.

bcardosolopes

Another round of review, thanks for breaking it, this is much simpler and easier to review.

Have you considered adding a AddressMapAttr (wrapping an ArrayAttr perhaps`)? Seems like it would make verifying the length easy, easy to deal with the default value because we can encode in tablegen, and easy for targets to populate and return. Given how often those will be built in a compilation instance, value semantics is probably easier to follow here.

bcardosolopes · 2024-07-18T19:32:31Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/TargetInfo.cpp

@@ -8,5 +8,11 @@ TargetLoweringInfo::TargetLoweringInfo(std::unique_ptr<ABIInfo> Info)

 TargetLoweringInfo::~TargetLoweringInfo() = default;

+AddressSpaceAttr::MapTy const &TargetLoweringInfo::getCIRAddrSpaceMap() const {
+  // Flatten all non-target address spaces to addrspace(0)
+  static AddressSpaceAttr::MapTy defaultAddrSpaceMap = {};


Why does this need to be static? We usually reserve using local static variables only when it's strictly needed, and looks like this function could just be a return {}? Note that this or the target versions are not called much more than a couple times in a pipeline, so probably not worth any early optimization.

Sure, we can replace the reference with just value.

bcardosolopes · 2024-07-18T19:35:25Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/SPIR.cpp

+    1, // offload_global
+    2, // offload_constant
+    4, // offload_generic
+};


What happens if a target define an address space map that has fewer, more or none elements? Why this need to be a static global? Seems like a constant global would be enough.

What happens if a target define an address space map that has fewer, more or none elements?

The target author should use any dummy value they want, and ensure by their own that no unexpected AS would be emitted for normal CodeGen.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

This is the approach used by clang::LangASMap. I think it's reasonable because

It does not make sense to add a target that overrides getCIRAddrSpaceMap without considering all (language or unified) addrspace cases.

It does not make sense to add a (language or unified) addrspace case without considering all maps in all targets.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

My question is a more practical one: if by mistake the array is incomplete, do we get a compiler error or we gonna silently miscompile? A lookup tables are cool, but if you only use it once it might not be worth the potential mistakes other could make.

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

clang/test/CIR/Lowering/address-space.cir

bcardosolopes · 2024-07-18T19:42:08Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
 }

 namespace {
+
+using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;


I prefer a clean CamelCase without suffix

We should use the pattern that's used overall in LLVM despite our personal preferences, otherwise someone will need to fix this when we upstream this chunk of code.

seven-mile

For the attribute approach, I think it's a bit weird for the value returned by TargetLoweringInfo:

The ideal pattern for CIR addrspace map in the future should be as simple as lowerModule.getTargetLoweringInfo().getCIRAddrSpaceMap()[cirAS], align to other target-specific information. We need an extra wrapper getIRAddrSpaceMap for now only because TargetLoweringInfo is premature temporarily.
MLIR objects require a context to live in, which would complicate the construction.
Attribute that never exists in IR is probably abusing the infrastructure.
MLIR attributes are usually harder to work with: the dynamic element type and variable length of ArrayAttr. We do not even need a verification logic if the map has the type std::array<unsigned, kFirstTargetASValue>.

I don't see explicit drawbacks of "no triple -> null map". Note that the approach is relatively self-contained. getIRAddrSpaceMap resides in an anonymous namespace as the implementation details.

So if we don't expect a reference to static storage, we can pass it by value overall. To be more specific, use the following combination:

MapTy = std::array<unsigned, kFirstTargetASValue>: leaves the addrspace map as is
AddressSpaceAttr::MapTy TargetLoweringInfo::getCIRAddrSpaceMap(): copies the std::array and pass it around
std::optional<AddressSpaceAttr::MapTy> getIRAddrSpaceMap(mlir::ModuleOp module): provides the extra semantic for "no triple -> null map"

seven-mile · 2024-07-19T00:49:55Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/SPIR.cpp

+    1, // offload_global
+    2, // offload_constant
+    4, // offload_generic
+};


What happens if a target define an address space map that has fewer, more or none elements?

The target author should use any dummy value they want, and ensure by their own that no unexpected AS would be emitted for normal CodeGen.

In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

This is the approach used by clang::LangASMap. I think it's reasonable because

It does not make sense to add a target that overrides getCIRAddrSpaceMap without considering all (language or unified) addrspace cases.

It does not make sense to add a (language or unified) addrspace case without considering all maps in all targets.

seven-mile · 2024-07-19T00:51:54Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
 }

 namespace {
+
+using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;


I think I mean the same, by "because CamelCase is already the feature of types, no?"

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

seven-mile · 2024-07-19T01:30:17Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/TargetInfo.cpp

@@ -8,5 +8,11 @@ TargetLoweringInfo::TargetLoweringInfo(std::unique_ptr<ABIInfo> Info)

 TargetLoweringInfo::~TargetLoweringInfo() = default;

+AddressSpaceAttr::MapTy const &TargetLoweringInfo::getCIRAddrSpaceMap() const {
+  // Flatten all non-target address spaces to addrspace(0)
+  static AddressSpaceAttr::MapTy defaultAddrSpaceMap = {};


Sure, we can replace the reference with just value.

bcardosolopes

I think this approach is still a too much for what it needs to do. I'm not confortable with the table approach (for reasons already mentioned and didn't get satifying answers) and also creating the lowerModule here is a lot for something that could be a switch. Ideally, in a separate PR, one could go and pass in the lowerModule if possible and only create it if none is available, that might mitigate the cost a bit (e.g. if calling from cir-opt then you create it, if from clang, it's passed down to the LLVM lowering pass).

Additionally, thinking more about this, I don't really see any special reason for this to be more part of TargetLoweringInfo than ABIInfo. That said, I suggest you move SPIRVABIInfo into a header, add a method that translate a unified address space into your target thing. This method should (a) contain a switch that goes over a unified enum and returns a target specific int (b) be unrecheable in the base class.

bcardosolopes · 2024-07-19T20:05:55Z

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/SPIR.cpp

+    1, // offload_global
+    2, // offload_constant
+    4, // offload_generic
+};


In other words, all maps for different targets should encode all cases that is always sync with AddressSpaceAttr.

My question is a more practical one: if by mistake the array is incomplete, do we get a compiler error or we gonna silently miscompile? A lookup tables are cool, but if you only use it once it might not be worth the potential mistakes other could make.

bcardosolopes · 2024-07-19T20:07:12Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

@@ -3583,24 +3585,44 @@ void populateCIRToLLVMConversionPatterns(mlir::RewritePatternSet &patterns,
 }

 namespace {
+
+using IRASMap = mlir::cir::AddressSpaceAttr::MapTy;


IRASMap -> IRASMapTy

seven-mile · 2024-07-24T04:36:12Z

... add a method that translate a unified address space into your target thing. This method should (a) contain a switch that goes over a unified enum and returns a target specific int (b) be unrecheable in the base class.

A faultless switch is very nice to have. Now I only rely on #752 to provide the optional semantic for LowerModule so that I can handle the test cases without target info.

Ideally, in a separate PR, one could go and pass in the lowerModule if possible and only create it if none is available, that might mitigate the cost a bit (e.g. if calling from cir-opt then you create it, if from clang, it's passed down to the LLVM lowering pass).

I'm not sure I understand the "cost" here. But, sure, I believe there will be only one instance in the whole pipeline for LowerModule, LowerContext, LowerFunction ... in the future. Given that Vinicius is actively working on it, I don't want to interfere his plan.

I don't really see any special reason for this to be more part of TargetLoweringInfo than ABIInfo.

I don't think it make many differences regarding functionality. I prefer keeping the method getTargetAddrSpaceFromCIRAddrSpace attached to TargetLoweringInfo because

The description of clang::CodeGen::ABIInfo

ABIInfo - Target specific hooks for defining how a type should be passed or returned from functions.

In OG CodeGen, there's a method related to address space: getASTAllocaAddressSpace. It's designed to be in TargetCodeGenInfo, rather than in ABIInfo.

seven-mile · 2024-07-25T09:26:31Z

Rebased. The description of PR also updated.

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AArch64.cpp

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/X86.cpp

bcardosolopes

Nice, almost there!

Two remaining pieces I'd like to see here before accepting this. I'm open to merge this PR right away if I can get your word on tackling them in two next separate PRs:

Now that LowerModule is a unique ptr, pass it down to LowerToLLVM. If coming from cir-opt, you then instantiate a new one.
Ignoring the triple / requiring both CIR and LLVM makes me very uncomfortable, the receipt for technical debt, we better get this right out of the bat. All CIR tests that needed it, should have the CIR triple. LLVM triple should be created from the CIR one, it's unnecessary to have both in the same file.

Seems like you need this before @sitio-couto does, so probably the right opportunity to implement it.

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AArch64.cpp

seven-mile · 2024-07-26T03:55:42Z

I'm open to merge this PR right away if I can get your word on tackling them in two next separate PRs.

Sure. I would be glad to improve these two aspects right in the next PRs. But the final target triple / data layout may need further considerations:

I think Vinicius want to use a similar approach to FIR because the setMLIRDataLayout function is copied from FIR ; ). So first let's take a rough look at what FIR does for target triple and target data layout.

For both triple and data layout, FIR uses the module attribute name in LLVM. It's consistent so I think it's fine. We have inconsistency in CIR though.

For triple, there are test cases invoking

fir-opt --target-rewrite="target=x86_64-unknown-linux-gnu"
- Yes, TargetRewrite is a pass to constitute the pass pipeline here of fir-opt. This pass and FIRToLLVMLowering pass are both parameterized by target and datalayout stuff. The default value of target is empty string, which means default triple of LLVM: the triple of toolchain.
and tco --target=aarch64-unknown-linux-gnu.
- tco is used for LLVM Lowering therefore similar to cir-translate, but they usually tag it as "for internal test only". This option set (or rewrite) the target attribute in module manually. The default value of target is native.

With the two option serve for FIR-to-FIR and FIR-to-LLVM pipeline. FIR doesn't need any triple set in the IR of test cases. It set the triple in command line like compiler drivers.

For data layout, FIR is taking empty string as the default value. I don't think it's a good design.

Generally speaking, I'm considering the following approach:

Let cir-translate be able to somehow consume the triple from command line. If none, set it to native. mlir-translate also uses llvm::cl, Hopefully it would work flawlessly with a llvm::cl::opt extended.
Get clang::TargetInfo from the triple, then fetch the data layout string defined by the target.

bcardosolopes · 2024-07-26T22:19:16Z

@seven-mile thanks for the detailed comparison with FIR, your plan sounds solid, thanks for looking into this!

bcardosolopes

LGTM

…cases for OpenCL with SPIR-V target (llvm#724) This PR implements `offload_*` cases discussed in [this thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728). * Integrate target-specific CIR-to-LLVM address space map into `TargetLoweringInfo` * CIRGen: Implement these cases in `getValueFromLangAS` * Lowering: Extend the state of type converter with `LowerModule` When frontend provides a new LangAS like `opencl_generic`, it would be processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as `offload_generic`. When we lower CIR to LLVM, * For pointer types without address space attribute, it's mapped to `ptr addrspace(0)` directly * For target cases `target<x>`, it's mapped to `ptr addrspace(x)` * For other defined cases, query the target info with a new virtual method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86 and ARM64 map all known cases to 0. For SPIR-V target here, it maps `offload_generic` to `addrspace(4)`.

…cases for OpenCL with SPIR-V target (#724) This PR implements `offload_*` cases discussed in [this thread](https://discourse.llvm.org/t/rfc-clangir-unified-address-space-design-in-clangir/79728). * Integrate target-specific CIR-to-LLVM address space map into `TargetLoweringInfo` * CIRGen: Implement these cases in `getValueFromLangAS` * Lowering: Extend the state of type converter with `LowerModule` When frontend provides a new LangAS like `opencl_generic`, it would be processed by CIRGenTypes and `Builder.getPointerTo()` and encoded as `offload_generic`. When we lower CIR to LLVM, * For pointer types without address space attribute, it's mapped to `ptr addrspace(0)` directly * For target cases `target<x>`, it's mapped to `ptr addrspace(x)` * For other defined cases, query the target info with a new virtual method `getTargetAddrSpaceFromCIRAddrSpace`. General targets like X86 and ARM64 map all known cases to 0. For SPIR-V target here, it maps `offload_generic` to `addrspace(4)`.

seven-mile requested review from lanza and bcardosolopes as code owners July 7, 2024 18:40

seven-mile requested a review from sitio-couto July 7, 2024 18:40

seven-mile commented Jul 7, 2024

View reviewed changes

seven-mile mentioned this pull request Jul 8, 2024

[GSoC] Add OpenCL support to compile GPU kernels #689

Closed

bcardosolopes requested changes Jul 9, 2024

View reviewed changes

seven-mile force-pushed the offload-as branch from 3800ec2 to 6e07c09 Compare July 11, 2024 15:19

seven-mile closed this Jul 12, 2024

seven-mile reopened this Jul 12, 2024

seven-mile marked this pull request as draft July 12, 2024 13:12

seven-mile mentioned this pull request Jul 13, 2024

[CIR][Dialect] Add minimal definitions of unified address space offload_* cases #738

Merged

seven-mile changed the title ~~[CIR][Dialect] Add offload_* cases in address space attribute~~ [CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases Jul 13, 2024

seven-mile changed the title ~~[CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases~~ [CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases for OpenCL with SPIR-V target Jul 17, 2024

seven-mile mentioned this pull request Jul 18, 2024

[CIR][CodeGen][Lowering] Better handling of alloca address space with unified AS #682

Merged

seven-mile force-pushed the offload-as branch from 9de8899 to 6f1be95 Compare July 18, 2024 01:36

seven-mile marked this pull request as ready for review July 18, 2024 01:37

seven-mile commented Jul 18, 2024

View reviewed changes

jopperm reviewed Jul 18, 2024

View reviewed changes

bcardosolopes requested changes Jul 18, 2024

View reviewed changes

seven-mile commented Jul 19, 2024

View reviewed changes

bcardosolopes reviewed Jul 19, 2024

View reviewed changes

[CIR][Dialect] Add offload_* cases in address space attribute

b776627

seven-mile force-pushed the offload-as branch from 6871490 to b776627 Compare July 25, 2024 09:02

jopperm reviewed Jul 25, 2024

View reviewed changes

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AArch64.cpp Show resolved Hide resolved

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/X86.cpp Show resolved Hide resolved

bcardosolopes reviewed Jul 25, 2024

View reviewed changes

clang/lib/CIR/Dialect/Transforms/TargetLowering/Targets/AArch64.cpp Show resolved Hide resolved

bcardosolopes approved these changes Jul 26, 2024

View reviewed changes

bcardosolopes merged commit 9445acc into llvm:main Jul 26, 2024
6 checks passed

seven-mile mentioned this pull request Jul 27, 2024

Teach CIR pointers about address space #418

Closed

	/// Allow walking and replacing the subelements of a LLVMStructTypeStorage key.
	template <>
	struct AttrTypeSubElementHandler<LLVM::detail::LLVMStructTypeStorage::Key> {
	static void walk(const LLVM::detail::LLVMStructTypeStorage::Key &param,
	AttrTypeImmediateSubElementWalker &walker) {
	if (param.isIdentified())
	walker.walkRange(param.getIdentifiedStructBody());
	else
	walker.walkRange(param.getTypeList());
	}
	static FailureOr<LLVM::detail::LLVMStructTypeStorage::Key>
	replace(const LLVM::detail::LLVMStructTypeStorage::Key &param,
	AttrSubElementReplacements &attrRepls,
	TypeSubElementReplacements &typeRepls) {
	// TODO: It's not clear how we support replacing sub-elements of mutable
	// types.
	if (param.isIdentified())
	return failure();

	return LLVM::detail::LLVMStructTypeStorage::Key(
	typeRepls.take_front(param.getTypeList().size()), param.isPacked());
	}
	};

		@@ -0,0 +1,72 @@
		//===- AArch64.cpp --------------------------------------------------------===//

	using MapTy = unsigned[kFirstTargetASValue];
	using MapTy = std::array<unsigned, kFirstTargetASValue>;

[CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases for OpenCL with SPIR-V target #724

[CIR][CodeGen][LowerToLLVM] End-to-end implementation of offload_* cases for OpenCL with SPIR-V target #724

Conversation

seven-mile commented Jul 7, 2024 • edited Loading

seven-mile left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcardosolopes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seven-mile Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcardosolopes commented Jul 10, 2024

seven-mile commented Jul 11, 2024

seven-mile commented Jul 12, 2024 • edited Loading

bcardosolopes commented Jul 12, 2024

bcardosolopes commented Jul 12, 2024 • edited Loading

seven-mile commented Jul 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jopperm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcardosolopes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seven-mile Jul 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seven-mile left a comment • edited Loading

Choose a reason for hiding this comment

seven-mile Jul 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcardosolopes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seven-mile commented Jul 24, 2024

seven-mile commented Jul 25, 2024

bcardosolopes left a comment

Choose a reason for hiding this comment

seven-mile commented Jul 26, 2024 • edited Loading

bcardosolopes commented Jul 26, 2024

bcardosolopes left a comment

Choose a reason for hiding this comment

[CIR][CodeGen][LowerToLLVM] End-to-end implementation of `offload_*` cases for OpenCL with SPIR-V target #724

[CIR][CodeGen][LowerToLLVM] End-to-end implementation of `offload_*` cases for OpenCL with SPIR-V target #724

seven-mile commented Jul 7, 2024 •

edited

Loading

seven-mile Jul 9, 2024 •

edited

Loading

seven-mile commented Jul 12, 2024 •

edited

Loading

bcardosolopes commented Jul 12, 2024 •

edited

Loading

seven-mile commented Jul 18, 2024 •

edited

Loading

seven-mile Jul 19, 2024 •

edited

Loading

seven-mile left a comment •

edited

Loading

seven-mile Jul 19, 2024 •

edited

Loading

seven-mile commented Jul 26, 2024 •

edited

Loading