Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86][AVX10] Permit AVX512 options/features used together with AVX10 #71318

Merged
merged 3 commits into from
Nov 10, 2023

Conversation

phoebewang
Copy link
Contributor

@phoebewang phoebewang commented Nov 5, 2023

This patch relaxes the driver logic to permit combinations between AVX512 and AVX10 options and makes sure we have a unified behavior between options and features combination.

Here are rules we are following when handle these combinations:

  1. evex512 can only be used for avx512xxx options/features. It will be ignored if used without them;
  2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them together in any case. It will enable a common super set when they are used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512".

Compiler emits warnings when user using combinations like "-mavx512f -mavx10.1-256" in case they won't get unexpected result silently.

Function target feature attribute follows the same rule now. We have to add "no-evex512" feature for intrinsics shared between AVX512 and AVX10. We also add "no-evex512" for early ISAs like AVX etc., because some of them are called by AVX512 intrinsics.

This patch relaxes the driver logic to permit combinations between
AVX512 and AVX10 options and makes sure we have a unified behavior
between options and features combination.

Here are rules we are following when handle these combinations:
1. evex512 can only be used for avx512xxx options/features. It will be
   ignored if used without them;
2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them
   together in any case. It will enable a common super set when they are
   used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512".

Compiler emits warnings when user using combinations like
"-mavx512f -mavx10.1-256" in case they won't get unexpected result silently.
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics labels Nov 5, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2023

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

This patch relaxes the driver logic to permit combinations between AVX512 and AVX10 options and makes sure we have a unified behavior between options and features combination.

Here are rules we are following when handle these combinations:

  1. evex512 can only be used for avx512xxx options/features. It will be ignored if used without them;
  2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them together in any case. It will enable a common super set when they are used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512".

Compiler emits warnings when user using combinations like "-mavx512f -mavx10.1-256" in case they won't get unexpected result silently.


Patch is 42.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71318.diff

33 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticCommonKinds.td (+2)
  • (modified) clang/lib/Basic/Targets/X86.cpp (+38-19)
  • (modified) clang/lib/Driver/ToolChains/Arch/X86.cpp (-7)
  • (modified) clang/lib/Headers/avx2intrin.h (+2-2)
  • (modified) clang/lib/Headers/avx512bf16intrin.h (+2-1)
  • (modified) clang/lib/Headers/avx512bwintrin.h (+3-1)
  • (modified) clang/lib/Headers/avx512dqintrin.h (+3-1)
  • (modified) clang/lib/Headers/avx512fintrin.h (+6-2)
  • (modified) clang/lib/Headers/avx512fp16intrin.h (+4-2)
  • (modified) clang/lib/Headers/avx512ifmavlintrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512pfintrin.h (-5)
  • (modified) clang/lib/Headers/avx512vbmivlintrin.h (+8-3)
  • (modified) clang/lib/Headers/avx512vlbf16intrin.h (+8-6)
  • (modified) clang/lib/Headers/avx512vlbitalgintrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512vlbwintrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512vlcdintrin.h (+8-3)
  • (modified) clang/lib/Headers/avx512vldqintrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512vlfp16intrin.h (+2-2)
  • (modified) clang/lib/Headers/avx512vlintrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512vlvbmi2intrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512vlvnniintrin.h (+8-2)
  • (modified) clang/lib/Headers/avx512vlvp2intersectintrin.h (+6-4)
  • (modified) clang/lib/Headers/avx512vpopcntdqvlintrin.h (+6-2)
  • (modified) clang/lib/Headers/avxintrin.h (+2-2)
  • (modified) clang/lib/Headers/emmintrin.h (+2-2)
  • (modified) clang/lib/Headers/gfniintrin.h (+10-4)
  • (modified) clang/lib/Headers/pmmintrin.h (+1-1)
  • (modified) clang/lib/Headers/smmintrin.h (+1-1)
  • (modified) clang/lib/Headers/tmmintrin.h (+2-2)
  • (modified) clang/lib/Headers/xmmintrin.h (+2-2)
  • (modified) clang/test/CodeGen/X86/avx512-error.c (+13)
  • (modified) clang/test/CodeGen/target-avx-abi-diag.c (+25-3)
  • (modified) clang/test/Driver/x86-target-features.c (+2-4)
diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index 9f0ccd255a32148..8084a4ce0d1751b 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -346,6 +346,8 @@ def err_opt_not_valid_on_target : Error<
   "option '%0' cannot be specified on this target">;
 def err_invalid_feature_combination : Error<
   "invalid feature combination: %0">;
+def warn_invalid_feature_combination : Warning<
+  "invalid feature combination: %0">, InGroup<DiagGroup<"invalid-feature-combination">>;
 def warn_target_unrecognized_env : Warning<
   "mismatch between architecture and environment in target triple '%0'; did you mean '%1'?">,
   InGroup<InvalidCommandLineArgument>;
diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp
index eec3cd558435e2a..9cfda95f385d627 100644
--- a/clang/lib/Basic/Targets/X86.cpp
+++ b/clang/lib/Basic/Targets/X86.cpp
@@ -119,9 +119,13 @@ bool X86TargetInfo::initFeatureMap(
     setFeatureEnabled(Features, F, true);
 
   std::vector<std::string> UpdatedFeaturesVec;
-  bool HasEVEX512 = true;
+  std::vector<std::string> UpdatedAVX10FeaturesVec;
+  int HasEVEX512 = -1;
   bool HasAVX512F = false;
   bool HasAVX10 = false;
+  bool HasAVX10_512 = false;
+  std::string LastAVX10;
+  std::string LastAVX512;
   for (const auto &Feature : FeaturesVec) {
     // Expand general-regs-only to -x86, -mmx and -sse
     if (Feature == "+general-regs-only") {
@@ -131,35 +135,50 @@ bool X86TargetInfo::initFeatureMap(
       continue;
     }
 
-    if (Feature.substr(0, 7) == "+avx10.") {
-      HasAVX10 = true;
-      HasAVX512F = true;
-      if (Feature.substr(Feature.size() - 3, 3) == "512") {
-        HasEVEX512 = true;
-      } else if (Feature.substr(7, 2) == "1-") {
-        HasEVEX512 = false;
+    if (Feature.substr(1, 6) == "avx10.") {
+      if (Feature[0] == '+') {
+        HasAVX10 = true;
+        if (Feature.substr(Feature.size() - 3, 3) == "512")
+          HasAVX10_512 = true;
+        LastAVX10 = Feature;
+      } else if (HasAVX10 && Feature == "-avx10.1-256") {
+        HasAVX10 = false;
+        HasAVX10_512 = false;
+      } else if (HasAVX10_512 && Feature == "-avx10.1-512") {
+        HasAVX10_512 = false;
       }
+      // Postpone AVX10 features handling after AVX512 settled.
+      UpdatedAVX10FeaturesVec.push_back(Feature);
+      continue;
     } else if (!HasAVX512F && Feature.substr(0, 7) == "+avx512") {
       HasAVX512F = true;
+      LastAVX512 = Feature;
     } else if (HasAVX512F && Feature == "-avx512f") {
       HasAVX512F = false;
-    } else if (HasAVX10 && Feature == "-avx10.1-256") {
-      HasAVX10 = false;
-      HasAVX512F = false;
-    } else if (!HasEVEX512 && Feature == "+evex512") {
+    } else if (HasEVEX512 != true && Feature == "+evex512") {
       HasEVEX512 = true;
-    } else if (HasEVEX512 && Feature == "-avx10.1-512") {
-      HasEVEX512 = false;
-    } else if (HasEVEX512 && Feature == "-evex512") {
+      continue;
+    } else if (HasEVEX512 != false && Feature == "-evex512") {
       HasEVEX512 = false;
+      continue;
     }
 
     UpdatedFeaturesVec.push_back(Feature);
   }
-  if (HasAVX512F && HasEVEX512)
-    UpdatedFeaturesVec.push_back("+evex512");
-  else if (HasAVX10)
-    UpdatedFeaturesVec.push_back("-evex512");
+  llvm::append_range(UpdatedFeaturesVec, UpdatedAVX10FeaturesVec);
+  // HasEVEX512 is a three-states flag. We need to turn it into [+-]evex512
+  // according to other features.
+  if (HasAVX512F) {
+    UpdatedFeaturesVec.push_back(HasEVEX512 == false ? "-evex512" : "+evex512");
+    if (HasAVX10 && !HasAVX10_512 && HasEVEX512 != false)
+      Diags.Report(diag::warn_invalid_feature_combination)
+          << LastAVX512 + " " + LastAVX10 + "; will be promoted to avx10.1-512";
+  } else if (HasAVX10) {
+    if (HasEVEX512 != -1)
+      Diags.Report(diag::warn_invalid_feature_combination)
+          << LastAVX10 + (HasEVEX512 ? " +evex512" : " -evex512");
+    UpdatedFeaturesVec.push_back(HasAVX10_512 ? "+evex512" : "-evex512");
+  }
 
   if (!TargetInfo::initFeatureMap(Features, Diags, CPU, UpdatedFeaturesVec))
     return false;
diff --git a/clang/lib/Driver/ToolChains/Arch/X86.cpp b/clang/lib/Driver/ToolChains/Arch/X86.cpp
index 848c26ddb43e4ae..fbe665bdd5c8afb 100644
--- a/clang/lib/Driver/ToolChains/Arch/X86.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/X86.cpp
@@ -229,7 +229,6 @@ void x86::getX86TargetFeatures(const Driver &D, const llvm::Triple &Triple,
         << D.getOpts().getOptionName(LVIOpt);
   }
 
-  bool HasAVX10 = false;
   for (const Arg *A : Args.filtered(options::OPT_m_x86_AVX10_Features_Group)) {
     StringRef Name = A->getOption().getName();
     A->claim();
@@ -251,7 +250,6 @@ void x86::getX86TargetFeatures(const Driver &D, const llvm::Triple &Triple,
 #endif
 
     Features.push_back(Args.MakeArgString((IsNegative ? "-" : "+") + Name));
-    HasAVX10 = true;
   }
 
   // Now add any that the user explicitly requested on the command line,
@@ -271,14 +269,9 @@ void x86::getX86TargetFeatures(const Driver &D, const llvm::Triple &Triple,
       continue;
     }
 
-    StringRef AVX512Name = Name;
     bool IsNegative = Name.startswith("no-");
     if (IsNegative)
       Name = Name.substr(3);
-    if (HasAVX10 && (Name.startswith("avx512") || Name == "evex512")) {
-      D.Diag(diag::warn_drv_unused_argument) << AVX512Name;
-      continue;
-    }
     Features.push_back(Args.MakeArgString((IsNegative ? "-" : "+") + Name));
   }
 
diff --git a/clang/lib/Headers/avx2intrin.h b/clang/lib/Headers/avx2intrin.h
index 9196c8c7d24f7c8..2bb0fa39c465967 100644
--- a/clang/lib/Headers/avx2intrin.h
+++ b/clang/lib/Headers/avx2intrin.h
@@ -15,8 +15,8 @@
 #define __AVX2INTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx2"), __min_vector_width__(256)))
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx2"), __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx2,no-evex512"), __min_vector_width__(256)))
+#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx2,no-evex512"), __min_vector_width__(128)))
 
 /* SSE4 Multiple Packed Sums of Absolute Difference.  */
 /// Computes sixteen sum of absolute difference (SAD) operations on sets of
diff --git a/clang/lib/Headers/avx512bf16intrin.h b/clang/lib/Headers/avx512bf16intrin.h
index ce1dd2ee5bdfe0e..b28d2e243f2cb80 100644
--- a/clang/lib/Headers/avx512bf16intrin.h
+++ b/clang/lib/Headers/avx512bf16intrin.h
@@ -23,7 +23,8 @@ typedef __bf16 __bfloat16 __attribute__((deprecated("use __bf16 instead")));
   __attribute__((__always_inline__, __nodebug__, __target__("avx512bf16,evex512"), \
                  __min_vector_width__(512)))
 #define __DEFAULT_FN_ATTRS                                                     \
-  __attribute__((__always_inline__, __nodebug__, __target__("avx512bf16")))
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512bf16,no-evex512")))
 
 /// Convert One BF16 Data to One Single Float Data.
 ///
diff --git a/clang/lib/Headers/avx512bwintrin.h b/clang/lib/Headers/avx512bwintrin.h
index df3c7294fba7a08..51dba5427b0fc0a 100644
--- a/clang/lib/Headers/avx512bwintrin.h
+++ b/clang/lib/Headers/avx512bwintrin.h
@@ -20,7 +20,9 @@ typedef unsigned long long __mmask64;
 /* Define the default attributes for the functions in this file. */
 #define __DEFAULT_FN_ATTRS512 __attribute__((__always_inline__, __nodebug__, __target__("avx512bw,evex512"), __min_vector_width__(512)))
 #define __DEFAULT_FN_ATTRS64 __attribute__((__always_inline__, __nodebug__, __target__("avx512bw,evex512")))
-#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512bw")))
+#define __DEFAULT_FN_ATTRS                                                     \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512bw,no-evex512")))
 
 static __inline __mmask32 __DEFAULT_FN_ATTRS
 _knot_mask32(__mmask32 __M)
diff --git a/clang/lib/Headers/avx512dqintrin.h b/clang/lib/Headers/avx512dqintrin.h
index 225d3eaf57faea4..88b48e3a32070b6 100644
--- a/clang/lib/Headers/avx512dqintrin.h
+++ b/clang/lib/Headers/avx512dqintrin.h
@@ -16,7 +16,9 @@
 
 /* Define the default attributes for the functions in this file. */
 #define __DEFAULT_FN_ATTRS512 __attribute__((__always_inline__, __nodebug__, __target__("avx512dq,evex512"), __min_vector_width__(512)))
-#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512dq")))
+#define __DEFAULT_FN_ATTRS                                                     \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512dq,no-evex512")))
 
 static __inline __mmask8 __DEFAULT_FN_ATTRS
 _knot_mask8(__mmask8 __M)
diff --git a/clang/lib/Headers/avx512fintrin.h b/clang/lib/Headers/avx512fintrin.h
index 5823728f22252b2..4f172c74b31cbb2 100644
--- a/clang/lib/Headers/avx512fintrin.h
+++ b/clang/lib/Headers/avx512fintrin.h
@@ -168,8 +168,12 @@ typedef enum
 
 /* Define the default attributes for the functions in this file. */
 #define __DEFAULT_FN_ATTRS512 __attribute__((__always_inline__, __nodebug__, __target__("avx512f,evex512"), __min_vector_width__(512)))
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx512f"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512f")))
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512f,no-evex512"), __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS                                                     \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512f,no-evex512")))
 
 /* Create vectors with repeated elements */
 
diff --git a/clang/lib/Headers/avx512fp16intrin.h b/clang/lib/Headers/avx512fp16intrin.h
index a9428c6feba2e91..4123f10c3951312 100644
--- a/clang/lib/Headers/avx512fp16intrin.h
+++ b/clang/lib/Headers/avx512fp16intrin.h
@@ -25,10 +25,12 @@ typedef _Float16 __m512h_u __attribute__((__vector_size__(64), __aligned__(1)));
   __attribute__((__always_inline__, __nodebug__,                               \
                  __target__("avx512fp16,evex512"), __min_vector_width__(512)))
 #define __DEFAULT_FN_ATTRS256                                                  \
-  __attribute__((__always_inline__, __nodebug__, __target__("avx512fp16"),     \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512fp16,no-evex512"),                          \
                  __min_vector_width__(256)))
 #define __DEFAULT_FN_ATTRS128                                                  \
-  __attribute__((__always_inline__, __nodebug__, __target__("avx512fp16"),     \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512fp16,no-evex512"),                          \
                  __min_vector_width__(128)))
 
 static __inline__ _Float16 __DEFAULT_FN_ATTRS512 _mm512_cvtsh_h(__m512h __a) {
diff --git a/clang/lib/Headers/avx512ifmavlintrin.h b/clang/lib/Headers/avx512ifmavlintrin.h
index 3284ee182004b86..8787cd471d42396 100644
--- a/clang/lib/Headers/avx512ifmavlintrin.h
+++ b/clang/lib/Headers/avx512ifmavlintrin.h
@@ -15,8 +15,14 @@
 #define __IFMAVLINTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx512ifma,avx512vl"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx512ifma,avx512vl"), __min_vector_width__(256)))
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512ifma,avx512vl,no-evex512"),                 \
+                 __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512ifma,avx512vl,no-evex512"),                 \
+                 __min_vector_width__(256)))
 
 #define _mm_madd52hi_epu64(X, Y, Z)                                            \
   ((__m128i)__builtin_ia32_vpmadd52huq128((__v2di)(X), (__v2di)(Y),            \
diff --git a/clang/lib/Headers/avx512pfintrin.h b/clang/lib/Headers/avx512pfintrin.h
index b8bcf49c6b249c3..f853be021a2dd37 100644
--- a/clang/lib/Headers/avx512pfintrin.h
+++ b/clang/lib/Headers/avx512pfintrin.h
@@ -14,9 +14,6 @@
 #ifndef __AVX512PFINTRIN_H
 #define __AVX512PFINTRIN_H
 
-/* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx512pf")))
-
 #define _mm512_mask_prefetch_i32gather_pd(index, mask, addr, scale, hint) \
   __builtin_ia32_gatherpfdpd((__mmask8)(mask), (__v8si)(__m256i)(index), \
                              (void const *)(addr), (int)(scale), \
@@ -92,6 +89,4 @@
   __builtin_ia32_scatterpfqps((__mmask8)(mask), (__v8di)(__m512i)(index), \
                               (void *)(addr), (int)(scale), (int)(hint))
 
-#undef __DEFAULT_FN_ATTRS
-
 #endif
diff --git a/clang/lib/Headers/avx512vbmivlintrin.h b/clang/lib/Headers/avx512vbmivlintrin.h
index c5b96ae8ada70a9..848ca2d18c3cea5 100644
--- a/clang/lib/Headers/avx512vbmivlintrin.h
+++ b/clang/lib/Headers/avx512vbmivlintrin.h
@@ -15,9 +15,14 @@
 #define __VBMIVLINTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx512vbmi,avx512vl"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx512vbmi,avx512vl"), __min_vector_width__(256)))
-
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vbmi,avx512vl,no-evex512"),                 \
+                 __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vbmi,avx512vl,no-evex512"),                 \
+                 __min_vector_width__(256)))
 
 static __inline__ __m128i __DEFAULT_FN_ATTRS128
 _mm_permutex2var_epi8(__m128i __A, __m128i __I, __m128i __B)
diff --git a/clang/lib/Headers/avx512vlbf16intrin.h b/clang/lib/Headers/avx512vlbf16intrin.h
index f5b8911fac2aeb8..89c9f49c7aed0fd 100644
--- a/clang/lib/Headers/avx512vlbf16intrin.h
+++ b/clang/lib/Headers/avx512vlbf16intrin.h
@@ -15,12 +15,14 @@
 #ifndef __AVX512VLBF16INTRIN_H
 #define __AVX512VLBF16INTRIN_H
 
-#define __DEFAULT_FN_ATTRS128 \
-  __attribute__((__always_inline__, __nodebug__, \
-                 __target__("avx512vl, avx512bf16"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS256 \
-  __attribute__((__always_inline__, __nodebug__, \
-                 __target__("avx512vl, avx512bf16"), __min_vector_width__(256)))
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512bf16,no-evex512"),                 \
+                 __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512bf16,no-evex512"),                 \
+                 __min_vector_width__(256)))
 
 /// Convert Two Packed Single Data to One Packed BF16 Data.
 ///
diff --git a/clang/lib/Headers/avx512vlbitalgintrin.h b/clang/lib/Headers/avx512vlbitalgintrin.h
index 5154eae14cbb3c9..377e3a5ea571327 100644
--- a/clang/lib/Headers/avx512vlbitalgintrin.h
+++ b/clang/lib/Headers/avx512vlbitalgintrin.h
@@ -15,8 +15,14 @@
 #define __AVX512VLBITALGINTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512bitalg"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512bitalg"), __min_vector_width__(256)))
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512bitalg,no-evex512"),               \
+                 __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512bitalg,no-evex512"),               \
+                 __min_vector_width__(256)))
 
 static __inline__ __m256i __DEFAULT_FN_ATTRS256
 _mm256_popcnt_epi16(__m256i __A)
diff --git a/clang/lib/Headers/avx512vlbwintrin.h b/clang/lib/Headers/avx512vlbwintrin.h
index 148af5ab9a34d87..9aedba0669991a2 100644
--- a/clang/lib/Headers/avx512vlbwintrin.h
+++ b/clang/lib/Headers/avx512vlbwintrin.h
@@ -15,8 +15,14 @@
 #define __AVX512VLBWINTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512bw"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512bw"), __min_vector_width__(256)))
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512bw,no-evex512"),                   \
+                 __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512bw,no-evex512"),                   \
+                 __min_vector_width__(256)))
 
 /* Integer compare */
 
diff --git a/clang/lib/Headers/avx512vlcdintrin.h b/clang/lib/Headers/avx512vlcdintrin.h
index cc8b72528d01269..923e2c551a97a86 100644
--- a/clang/lib/Headers/avx512vlcdintrin.h
+++ b/clang/lib/Headers/avx512vlcdintrin.h
@@ -14,9 +14,14 @@
 #define __AVX512VLCDINTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512cd"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx512vl,avx512cd"), __min_vector_width__(256)))
-
+#define __DEFAULT_FN_ATTRS128                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512cd,no-evex512"),                   \
+                 __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256                                                  \
+  __attribute__((__always_inline__, __nodebug__,                               \
+                 __target__("avx512vl,avx512cd,no-evex512"),                   \
+                 __min_vector_width__(256)))
 
 static __inline__ __m128i __DEFAULT_FN_ATTRS128
 _mm_broadcastmb_epi64 (__mmask8 __A)
diff --git a/clang/lib/Headers/avx512vldqintrin.h b/clang/lib/Headers/avx...
[truncated]

Copy link

github-actions bot commented Nov 5, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@phoebewang
Copy link
Contributor Author

Ping~

@@ -119,9 +119,13 @@ bool X86TargetInfo::initFeatureMap(
setFeatureEnabled(Features, F, true);

std::vector<std::string> UpdatedFeaturesVec;
bool HasEVEX512 = true;
std::vector<std::string> UpdatedAVX10FeaturesVec;
int HasEVEX512 = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::optional ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use enum. It's a 3-status flag. std::optional isn't much useful here.

#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx2"), __min_vector_width__(128)))
#define __DEFAULT_FN_ATTRS256 \
__attribute__((__always_inline__, __nodebug__, \
__target__("avx2,no-evex512"), __min_vector_width__(256)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the function targeted at avx2 need no-evex512?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have defined parts AVX512 intrinsics with no-evex512 and some of them will call into these AVX2 intrinsics.
Then we are facing a problem that we cannot call them in some cases because we didn't specify no-evex512 for them.

#define __DEFAULT_FN_ATTRS_MMX \
__attribute__((__always_inline__, __nodebug__, __target__("mmx,sse2"), \
__min_vector_width__(64)))
__attribute__((__always_inline__, __nodebug__, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the function targeted at sse2 need no-evex512?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same reason as above.

HasAVX10 = false;
HasAVX512F = false;
} else if (!HasEVEX512 && Feature == "+evex512") {
} else if (HasEVEX512 != true && Feature == "+evex512") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing a int value with true and false may be confusing. I suggest "std::optional"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "std::optional" doesn't help here because we need to distinguish the uninitialized status and false too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check has_value for uninitialized value. But enum looks good to me, too.

UpdatedFeaturesVec.push_back("+evex512");
else if (HasAVX10)
UpdatedFeaturesVec.push_back("-evex512");
llvm::append_range(UpdatedFeaturesVec, UpdatedAVX10FeaturesVec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean the flags for AVX10 Features will be in the vector UpdatedFeaturesVec 2 times?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, there is a continue in handling avx10* features

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

@e-kud
Copy link
Contributor

e-kud commented Nov 9, 2023

I'm a little bit confused, What's the expected behavior of +avx10.1-512 -avx10.1-256 in codegen aspect? Should we generate only instructions in the difference of sets? Or do we consider avx10.1-256 as a base of avx10.1-512 and if it is disabled avx10.1-512 can't be enabled?

@phoebewang
Copy link
Contributor Author

I'm a little bit confused, What's the expected behavior of +avx10.1-512 -avx10.1-256 in codegen aspect? Should we generate only instructions in the difference of sets? Or do we consider avx10.1-256 as a base of avx10.1-512 and if it is disabled avx10.1-512 can't be enabled?

-avx10.1-256 works like -avx512f, that says, they are special as a fundamental feature, which will turn off all derivative features for AVX10 and AVX512 respectively.
OTOH, derivative features will only turn off the difference set, e.g., +avx10.3-256 -avx10.2-256 equals to +avx10.1-256.

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe we can add some explanations about why we add attribute no-evex512 for intrinsics in the description of the PR. It's a little tricky.

Copy link
Contributor

@e-kud e-kud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@phoebewang phoebewang merged commit f229ba4 into llvm:main Nov 10, 2023
@phoebewang phoebewang deleted the avx10 branch November 10, 2023 07:21
@phoebewang
Copy link
Contributor Author

Thanks @KanRobert @e-kud

@ronlieb
Copy link
Contributor

ronlieb commented Nov 10, 2023

Hi Phoebe, starting seeing this error on rather old codes after this patch landed .
is there a particular flag you recommend i should compile with to get previous behavior ?

error: always_inline function '_mm_setzero_pd' requires target feature 'evex512', but would be inlined into function '_mm_getexp_pd' that is compiled without support for 'evex512'

@phoebewang
Copy link
Contributor Author

@ronlieb Do you have a reproducer for this problem? I just checked the definition of both intrinsics have no-evex512 already, so shouldn't have such problem.
You can use -mno-evex512 as workaround for the problem anyway.

@ronlieb
Copy link
Contributor

ronlieb commented Nov 10, 2023

thanks, will try workaround for now ...

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 11, 2023
reverts: [X86][AVX10] Permit AVX512 options/features used together with AVX10 (llvm#71318)
 breaks pgmath build

Change-Id: I14334b22129122d0aa04076dc992d45cf344f6c7
@ronlieb
Copy link
Contributor

ronlieb commented Nov 12, 2023

@ronlieb Do you have a reproducer for this problem? I just checked the definition of both intrinsics have no-evex512 already, so shouldn't have such problem. You can use -mno-evex512 as workaround for the problem anyway.

here is a small reproducer , compile with
clang++ -c fd_log_scalar.cpp -mtune=skylake-avx512 -march=skylake-avx512 -D_CPU=avx512

produces error :
lib/clang/18/include/avx512fintrin.h:5493:41: error: always_inline function '_mm_setzero_pd' requires target feature 'evex512', but would be inlined into function '_mm_getexp_sd' that is compiled without support for 'evex512'
5493 | (__v2df) __B, (__v2df) _mm_setzero_pd(), (__mmask8) -1, _MM_FROUND_CUR_DIRECTION);
| ^
1 error generated.

#include <immintrin.h>

#if !(defined _CPU)
#error: please define _CPU - specific suffix to a function name
#endif

extern "C" double log_d_scalar(double);

double attribute ((noinline)) log_d_scalar(double a_input)
{
__m128d va, vm, ve, vb;
double a, m, e, b, t;
long long mu, eu;

#ifdef AVX512F
va = _mm_set_sd(a_input);
vm = _mm_getmant_sd(va, va, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_nan);
ve = _mm_getexp_sd(va, va);
vb = _mm_getexp_sd(vm, vm);
ve = _mm_sub_sd(ve, vb);
m = _mm_cvtsd_f64(vm);
e = _mm_cvtsd_f64(ve);
#endif
return m + e;
}

@phoebewang
Copy link
Contributor Author

@ronlieb The reproducer can compile successfully in trunk: https://godbolt.org/z/hvKhGq9bq
Are you using a downstream compiler? You can check if the "emmintrin.h" has the same change as main trunk.
You can also check it through pre-compile the code:

$ clang++ -E fd_log_scalar.cpp -D_CPU | grep '_mm_setzero_pd.*{'
static __inline__ __m128d __attribute__((__always_inline__, __nodebug__, __target__("sse2,no-evex512"), __min_vector_width__(128))) _mm_setzero_pd(void) {

Make sure no-evex512 is in the attribute too.

@ronlieb
Copy link
Contributor

ronlieb commented Nov 13, 2023

smallest example and used latest upstream llvm build from this morning
#include <immintrin.h>
void log_d_scalar(double a_input)
{
__m128d va, ve;
ve = _mm_getexp_sd(va, va);
}

clang++ -march=haswell t.cpp

t.cpp:8:10: error: always_inline function '_mm_getexp_sd' requires target feature 'avx512f', but would be inlined into function 'log_d_scalar' that is compiled without support for 'avx512f'

8 |     ve = _mm_getexp_sd(va, va);

  |          ^

1 error generated.

@mstorsjo
Copy link
Member

Hi Phoebe, starting seeing this error on rather old codes after this patch landed . is there a particular flag you recommend i should compile with to get previous behavior ?

error: always_inline function '_mm_setzero_pd' requires target feature 'evex512', but would be inlined into function '_mm_getexp_pd' that is compiled without support for 'evex512'

I also ran into something similar, when compiling Qt; I filed #72106 with a different reproducer.

@phoebewang
Copy link
Contributor Author

Thanks @ronlieb and @mstorsjo. Created #72126 for it.

phoebewang added a commit that referenced this pull request Nov 14, 2023
…72126)

#71318 failed to clear EVEX512 feature for intended intrinsics.

Fixes #72106
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 14, 2023
…lvm#71318)

This patch relaxes the driver logic to permit combinations between
AVX512 and AVX10 options and makes sure we have a unified behavior
between options and features combination.

Here are rules we are following when handle these combinations:
1. evex512 can only be used for avx512xxx options/features. It will be
ignored if used without them;
2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them
together in any case. It will enable a common super set when they are
used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512".

Compiler emits warnings when user using combinations like "-mavx512f
-mavx10.1-256" in case they won't get unexpected result silently.

Function target feature attribute follows the same rule now. We have to
add "no-evex512" feature for intrinsics shared between AVX512 and AVX10.
We also add "no-evex512" for early ISAs like AVX etc., because some of
them are called by AVX512 intrinsics.

Change-Id: I3585e8fed47bfca6fedf94b86a72816ae2ee466d
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…lvm#71318)

This patch relaxes the driver logic to permit combinations between
AVX512 and AVX10 options and makes sure we have a unified behavior
between options and features combination.

Here are rules we are following when handle these combinations:
1. evex512 can only be used for avx512xxx options/features. It will be
ignored if used without them;
2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them
together in any case. It will enable a common super set when they are
used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512".

Compiler emits warnings when user using combinations like "-mavx512f
-mavx10.1-256" in case they won't get unexpected result silently.

Function target feature attribute follows the same rule now. We have to
add "no-evex512" feature for intrinsics shared between AVX512 and AVX10.
We also add "no-evex512" for early ISAs like AVX etc., because some of
them are called by AVX512 intrinsics.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants