
Rewrite of ATen code generator #42629

Closed · wants to merge 33 commits

@ezyang ezyang (Contributor) commented Aug 5, 2020

Stack from ghstack:

How to approach reviewing this diff:

  • The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of each file describe what is going on. The CLI of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and we now error if you do so), and (2) the default settings for the source and install directories are much better, so much so that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
  • The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. The copying logic for `common_with_cwrap.py` is removed.
  • All of the inputs to the old codegen are deleted.
  • Build rules now have to be adjusted to not refer to files that no longer exist and to abide by the (slightly modified) CLI.
  • LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as the remaining functions get ported to ATen; the deletion process is straightforward, just delete the functions you have ported. There are 39 more functions left to port. I kept the CUDA header in `ATen/` to avoid having to fix a bunch of headers.

This diff cannot currently be landed, as it does not yet reimplement static dispatch.

How do we know that this diff is right? We aimed for byte-for-byte compatibility (modulo whitespace) with the old generated code. Apply the following patch (which removes static dispatch) to the base version of PyTorch:

```
diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py
index e26bb3941b..334475212b 100644
--- a/aten/src/ATen/function_wrapper.py
+++ b/aten/src/ATen/function_wrapper.py
@@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} Tensor::${api_name}(${method_formals}) const {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_method_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
@@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} ${api_name}(${formals}) {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_function_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
```

and then we generate the old and new versions and diff them:

```
 {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp                                 |    0
 {build-old => build}/aten/src/ATen/CPUType.cpp                                               |    0
 {build-old => build}/aten/src/ATen/CUDAType.cpp                                              |    0
 {build-old => build}/aten/src/ATen/CUDAType.h                                                |    0
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null                                | 1712 -------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null                                  |   67 -
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null                               | 4176 ---------------------------------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null                                 |  111 --
 {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/NativeFunctions.h                                         |   20 +-
 {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp                                      |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp                                     |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.h                                       |    0
 {build-old => build}/aten/src/ATen/SparseCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp                                        |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.h                                          |    0
 {build-old => build}/aten/src/ATen/TypeDefault.cpp                                           |    0
 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp                                       |    0
```

The only diff is this:

```
 diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h
index a0463dc80d..3808d27824 100644
--- a/build-old/aten/src/ATen/NativeFunctions.h
+++ b/build-new/aten/src/ATen/NativeFunctions.h
@@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr
 CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi
 CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul(const Tensor & self, Scalar other);
 CAFFE2_API Tensor & mul_(Tensor & self, Scalar other);
 CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec);
@@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self);
 CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self);
 CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self);
 CAFFE2_API Tensor sigmoid(const Tensor & self);
-CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self);
+CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor & sigmoid_(Tensor & self);
 CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self);
 CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self);
@@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee
 CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
+CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
-CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & zero_(Tensor & self);
-CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & zero_sparse_(Tensor & self);
+CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar
 CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim);
 CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask);
 CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask);
-CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor sparse_to_dense(const Tensor & self);
+CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
```

These are just wobbles in the order of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: D23183978

There is a single literate Python file.  Start at the top
and start reading from there.

[ci skip]

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 5, 2020
ghstack-source-id: 504b1581e235302c7d9ac172dd34adee4bc0c29e
Pull Request resolved: #42629
@ezyang ezyang requested a review from zdevito August 5, 2020 20:35
There is a single literate Python file.  Start at the top
and start reading from there.

This file currently produces a 100% compatible CPUType.h
definition.  You can tell by running:

```
python aten/src/ATen/new_gen.py | git diff --word-diff --no-index - build/aten/src/ATen/CPUType.h
```

[ci skip]

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 5, 2020
ghstack-source-id: 33ef10f23cacbfa299304e18c8ff132910950a6c
Pull Request resolved: #42629
@dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit 01afc1b (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

@smessmer smessmer (Contributor) left a comment

This is much more structured and easier to understand than the old codegen. Thanks a lot for doing this.

# You can see some of the overall design patterns for how we setup
# dataclasses in this class, but we will defer a complete discussion
# of this at FunctionSchema.
@dataclass(frozen=True)
Contributor

Can't give enough heart emojis for making this immutable
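
A minimal sketch of the frozen-dataclass pattern being praised here (the class and fields are illustrative, not the PR's actual model): frozen instances are immutable and hashable, so model objects can be shared freely and never mutated behind the codegen's back.

```
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatorName:
    # Illustrative fields; the real model classes in tools/codegen/model.py may differ.
    name: str
    overload_name: str

op = OperatorName(name='add', overload_name='Tensor')
# op.name = 'mul'  # would raise dataclasses.FrozenInstanceError
print(op)  # OperatorName(name='add', overload_name='Tensor')
```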

# Corresponds to the 'use_c10_dispatcher' field. Historically,
# this field could take several possible strings, but right
# now you can have it in any color you like, as long as it's 'full'
use_c10_dispatcher_full: bool
Contributor

nit: Maybe make it an enum with two values instead? Mypy should make that type-safe and it would be closer to the native_functions.yaml representation.


# Distinguish between a missing dispatch dict (historically, this
# means to register a catch-all kernel) and a present but empty
# dispatch dict (this means register nothing; arguably, this should
Contributor

Oh, I didn't know an empty dispatch dict was a thing. Yes, this should absolutely subsume manual_kernel_registration once @ailzhang's change has landed and manual_kernel_registration actually does what you describe above.
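
To make the distinction concrete, a small hypothetical sketch (not the PR's parsing code): the absence of the 'dispatch' key historically means "register a catch-all kernel", while a present-but-empty dict means "register nothing".

```
# Hypothetical native_functions.yaml entries, already parsed into dicts.
entry_catchall = {'func': 'foo(Tensor self) -> Tensor'}                   # no dispatch key
entry_nothing = {'func': 'bar(Tensor self) -> Tensor', 'dispatch': {}}    # empty dispatch dict

def registers_catchall(e: dict) -> bool:
    # Only the *absence* of the key implies a catch-all registration.
    return 'dispatch' not in e

assert registers_catchall(entry_catchall)
assert not registers_catchall(entry_nothing)
```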

func = FunctionSchema.parse(funcs)

use_c10_dispatcher = e.get('use_c10_dispatcher')
assert use_c10_dispatcher is None or use_c10_dispatcher == 'full', \
Contributor

Technically, use_c10_dispatcher had two possible values, ['full', 'with_codegenerated_unboxing_wrapper'], and the second one was the default. I think there might be a phase where we can change the default to 'full' but still need to opt out a few operators using 'with_codegenerated_unboxing_wrapper' by specifying that key manually in native_functions.yaml, so I'd keep that representation and not just make it a boolean.
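
A minimal sketch of what that enum might look like (hypothetical names; the PR as posted uses a bool), keeping both historical values so the model stays close to the native_functions.yaml representation:

```
from enum import Enum
from typing import Optional

class UseC10Dispatcher(Enum):
    full = 'full'
    with_codegenerated_unboxing_wrapper = 'with_codegenerated_unboxing_wrapper'

    @staticmethod
    def parse(value: Optional[str]) -> 'UseC10Dispatcher':
        # An absent key defaults to the legacy unboxing wrapper, matching the old behavior.
        if value is None:
            return UseC10Dispatcher.with_codegenerated_unboxing_wrapper
        return UseC10Dispatcher(value)  # raises ValueError on anything unexpected
```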

variants_s = e.get('variants', 'function')
assert isinstance(variants_s, str)
variants: Set[Variant] = set()
for v in variants_s.split(', '):
Contributor

Write a Variant.parse to factor this out?
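
For illustration, a hedged sketch of the suggested factoring (hypothetical; the PR inlines this parsing as shown above):

```
from enum import Enum
from typing import Set

class Variant(Enum):
    function = 'function'
    method = 'method'

    @staticmethod
    def parse(variants_s: str) -> Set['Variant']:
        # 'function, method' -> {Variant.function, Variant.method}
        return {Variant(v.strip()) for v in variants_s.split(',')}
```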

is_write: bool

@staticmethod
def parse(ann: str) -> 'Annotation':
Contributor

I think I remember reading something about mypy not requiring string-wrapping for forward declared types anymore...but I might be mistaken.

Contributor Author

mypy doesn't require it, but Python does D:
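
For context, a hedged sketch of the alternative being alluded to: under PEP 563, annotations are not evaluated at runtime, so the string wrapping becomes unnecessary even for plain Python (this assumes Python 3.7+ and the `from __future__` import; the parse body below is illustrative, not the PR's actual logic):

```
from __future__ import annotations  # PEP 563: postpone evaluation of annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    is_write: bool

    @staticmethod
    def parse(ann: str) -> Annotation:  # no quotes needed around the forward reference
        # Illustrative only: treat a trailing '!' as marking a mutable (write) annotation.
        return Annotation(is_write=ann.endswith('!'))
```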

return r

@property
def is_write(self) -> bool:
Contributor

is_mutable ?

def parse_arguments(args: str) -> Tuple[Sequence[Argument], Sequence[Argument], Sequence[Argument]]:
"""
Input: 'int x, int y, int z'
Output: positional args, kwarg only args
Contributor

you're missing docs for the third output

out_arguments: List[Argument] = []
arguments_acc = arguments

# TODO: Use a real parser here; this will get bamboozled
Contributor

I think a lot of the code in here could benefit from a real parser. There seem to be a few libraries for Python parser combinators; they would allow a concise syntax and might be useful.
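
To make the "bamboozled" case concrete, a hypothetical helper (not from this PR) that splits an argument list only on top-level commas, so defaults containing commas or brackets survive where a plain `str.split(', ')` would not:

```
from typing import List

def split_top_level(args: str) -> List[str]:
    # Split on commas that are not nested inside () or [], so an argument like
    # "int[2] stride=[1, 1]" stays in one piece.
    out: List[str] = []
    depth = 0
    cur: List[str] = []
    for ch in args:
        if ch in '([':
            depth += 1
        elif ch in ')]':
            depth -= 1
        if ch == ',' and depth == 0:
            out.append(''.join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if cur:
        out.append(''.join(cur).strip())
    return out

# split_top_level('Tensor self, int[2] stride=[1, 1]')
#   -> ['Tensor self', 'int[2] stride=[1, 1]']
```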

# TODO: TensorOptions argument detection
# TODO: Extra enforcement of inplace functions having mutable self

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
Contributor

I know this is a prototype, but for the final version, I would propose splitting generation into a separate file from the data structure, probably a separate file for each generated artifact.

Contributor Author

I agree, having multiple files will be good.

# dispatch dict (this means register nothing; arguably, this should
# subsume manual_kernel_registration).
#
# TODO: str key could be replaced with more explicit enum

This is an interesting one - it would be great to drive both codegen and c++ defs off the same source of truth, especially if we want to use the more complicated relationships we're contemplating, like aliases. (And unless we do unify the source of truth, a more semantic datatype will add change-tracking overhead in any case.)

# TODO: str key could be replaced with more explicit enum
dispatch: Optional[Dict[str, str]]

# The location in the YAML file were this native function entry was
@bhosmer bhosmer Aug 7, 2020

nit: were

I was parsing subjunctive until I hit "defined" 😬

ezyang added a commit that referenced this pull request Aug 7, 2020
ghstack-source-id: 43b460f7e16482c4143fdf16b0324874fe816dc6
Pull Request resolved: #42629
# I'm not really sure how to structure this logic yet, but here is a
# sketch. This function is ONLY correct for CPUType.h at the moment;
# I bet I am going to need another parameter before I'm done
def cpp_type(t: Type, *, mutable: bool, argument: bool, legacy_optional: bool) -> str:
Contributor Author

@bhosmer suggests making this an lvalue versus rvalue distinction (instead of argument)

Contributor Author

@bhosmer On further reflection, I'm not sure the lvalue vs. rvalue distinction really makes sense here. lvalue and rvalue refer to expressions, but the computation here involves types!

assert False, f"unsupported type: {t}"
elif isinstance(t, OptionalType):
# TODO: these arguments are smoothed over by the hacky wrapper
if argument and legacy_optional and str(t.elem) == 'Tensor':
Contributor Author

bhosmer suggests factoring out the legacy_optional logic, perhaps into a wrapper

# should have a signature equivalent to its pure variant,
# but just with extra kwargs for the output elements. This
# is difficult to actually check for and historically
# we only do this check in tools/

This is the kind of entropy reduction we'd harvest by merging (or sharing utilities across) ATen and tools/ codegen: there's a bunch of logic over there that does semantic checking over the same interpretations of these definitions. Ideally we'd eventually merge that into what you're doing here and use that canonical model everywhere.


cc @ljk53 per our recent convos around codegen


# A custom loader for YAML to let us also keep track of line numbers
# of each entry in the YAML file
class LineLoader(Loader):
@bhosmer bhosmer Aug 8, 2020

This is gold. If the final version of this could save location info into Declarations.yaml it could be used in the tools/ codegen for the same usability bump (although at least some of that code is pretty careful to hand-introduce some context - op names mostly IIRC - into its errors, so it's not awful. But this would be better).
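
A minimal sketch of the line-number-tracking idea (the details here are assumptions, not necessarily what this PR's LineLoader does): record the source line of every mapping node so later errors can point back into native_functions.yaml.

```
import yaml

class LineLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = super().construct_mapping(node, deep=deep)
        # Record the 1-based line of this entry under a synthetic key.
        mapping['__line__'] = node.start_mark.line + 1
        return mapping

entries = yaml.load('- func: add\n- func: mul\n', Loader=LineLoader)
# entries[0]['__line__'] == 1, entries[1]['__line__'] == 2
```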

cpp_args.extend(map(format_arg, f.func.out_arguments))
cpp_args.extend(map(format_arg, f.func.arguments))

# Discover TensorOptions

Should we be either more permissive or more proscriptive here? I.e. it might be more user friendly to either accept topt args in any order, or error if they appear out of order, instead of just letting out-of-order args escape the dragnet and cause unexpected behavior downstream.

Aside: I think this is actually a good example of the kind of situation where a codegen pipeline needs a similar attitude to input handling as a compiler frontend - dumb/strict is fine, smart/lenient is fine, but dumb/lenient (i.e. fishing around for patterns and letting misses go by) is an easy way to introduce bad user experience, and IMO a major source of codegen's bad usability rep in general.

Contributor Author

The old codegen is much more prescriptive, requiring specific defaults and optionality before it matches; it is really wordy as a result. I could see this code being a bit too relaxed. I think fundamentally whether or not we grab arguments comes down to a combination of type and name. However, the type check is a relaxed one: I don't care whether the type is optional or not in the schema. If it is defaulted, I need to generate defaulting code accordingly for TensorOptions.

Defo gonna change this.

i += 1

rs.append(f"CAFFE2_API {cpp_return} {name}({', '.join(cpp_args)});")
return rs

ok I know this is way more than you were looking for but per our chat this afternoon here's a version of compute_function_declarations() that uses some moves that kept me sane when I rewrote gen_python_functions way back when.

  • ruthlessly pulling functions up to the top level and plumbing all their dependencies into them as explicit params. The original compute_function_declarations() is still pretty small and clear, but reading the dependencies between local functions and closed-over locals is already getting nontrivial... whereas in the flattened version it's super obvious what e.g. formatting an arg depends on, what formatting a tensor options arg depends on (just the function name!) etc.

  • analogously, yanking any data constants up into global space and throwing all available info into their names and comments. Helps visibility and avoids drift-prone repetition of special cases buried inside functions... and for the many, many of these that are pattern data for workarounds, it demarcates them more clearly as the eruptions of irregularity that they are 😁

  • being super finicky about missed patterns (here yelling about out-of-sequence tensor options args)

I went ahead and did the whole exercise here just to show how the end result helps (me, anyway) distinguish between the general case and the special cases, as well as what input is being fed into each decision. Well, plus I did it bc I got carried away :P

One last pitch - it might seem like overkill to be so hardcore, now while the code is pretty tight and tractable - but by the time I got to gen_python_functions some of the analogs to compute_function_declarations() were like multi-hundred line behemoths with nested and doubly-nested inner functions that just closed over everything, big dictionaries getting tweaked miles and miles away, the whole deal. I think the flatter and more explicitly plumbed we make things now, the easier it'll be to fight that kind of entropy.

# format an argument for c++
def format_arg(a: Argument, legacy_optional: bool) -> str:
    # DEFAULTING IS NEW
    default = f"={cpp_default(a.default, a.type)}" if a.default is not None else ""
    # TODO: Always NO legacy optional
    return f"{cpp_type(a.type, mutable=a.is_write, argument=True, legacy_optional=legacy_optional)} {a.name}{default}"

# tensor options attributes come in scattered, we recognize and convert to a single TensorOptions arg
TOPT_NAMES = ['dtype', 'layout', 'device', 'pin_memory']
TOPT_LEN = len(TOPT_NAMES)

# true if i begins the scattered tensor options arguments in args.
# note: it's an error for these to be out of order. (TODO what about noncontiguous?)
def at_topt_args(i: int, args: List[Argument]) -> bool:
    in_seq = i <= len(args) - TOPT_LEN and all(args[i+j].name == TOPT_NAMES[j] for j in range(TOPT_LEN))
    perm = i <= len(args) - TOPT_LEN and set([a.name for a in args[i:i + TOPT_LEN]]) == set(TOPT_NAMES)
    if in_seq != perm:
        raise ValueError(f"TensorOptions arguments must be specified in the following order: {', '.join(TOPT_NAMES)}")
    return in_seq

# TODO these need TensorOptions without a default for some reason
TENSOR_OPTIONS_NO_DEFAULT = [
    "_cudnn_init_dropout_state",
    "sparse_coo_tensor.size",
    "_sparse_coo_tensor_with_dims",
    "_sparse_coo_tensor_with_dims_and_tensors"]

# TODO these need TensorOptions defaulted to (dtype) long
TENSOR_OPTIONS_DEFAULT_LONG = ["tril_indices", "triu_indices"]

# format TensorOptions arg. Handle some special cases by name
def format_topt_arg(f: NativeFunction):
    if str(f.func.name) in TENSOR_OPTIONS_NO_DEFAULT:
        # I think this is a bug in the original
        return 'const TensorOptions & options'
    elif str(f.func.name) in TENSOR_OPTIONS_DEFAULT_LONG:
        return 'const TensorOptions & options=at::kLong'
    else:
        return 'const TensorOptions & options={}'  # MODIFIED

# compute all c++ function declarations 
def compute_function_declarations() -> List[str]:
    rs: List[str] = []
    for f in native_functions:
        with context(f'in {f.loc}:\n  {f.func}'):
            if f.manual_kernel_registration:
                continue
            if Variant.function not in f.variants:
                continue

            # TODO: clear up naming
            cpp_return = cpp_type_return(f.func.returns)
            name = str(f.func.name.name)
            if f.func.is_out_fn():
                name += '_out'

            cpp_args: List[str] = []
            cpp_args.extend(map(lambda a: format_arg(a, not f.use_c10_dispatcher_full), f.func.out_arguments))
            cpp_args.extend(map(lambda a: format_arg(a, not f.use_c10_dispatcher_full), f.func.arguments))

            # Discover TensorOptions
            kwargs = list(f.func.kwarg_only_arguments)  # short name
            i = 0
            while i < len(kwargs):
                if at_topt_args(i, kwargs):
                    cpp_args.append(format_topt_arg(f))
                    i += len(TOPT_NAMES)
                else:
                    cpp_args.append(format_arg(kwargs[i], not f.use_c10_dispatcher_full))
                    i += 1

            rs.append(f"CAFFE2_API {cpp_return} {name}({', '.join(cpp_args)});")
    return rs

Contributor Author

Yes, pulling things up seems appropriate. One challenge is that I need some sort of top-level organizational principle (you toss things in as local definitions so you don't have to worry about this).


For ordering I generally try to stick to in-order, with comment headers to replicate the visual effect of nesting where it helps. So e.g. the code block above might have a big

# --------- compute_function_declarations and helpers -----------
#
# ...
#

at the top, and maybe a smaller one above the stuff about TensorOptions args.

You're right that it forgoes the natural structuring effect of nested functions, which is a bummer. But the readability payoff of explicit dependencies is definitely worth it, I think.

ezyang added a commit that referenced this pull request Aug 12, 2020
ghstack-source-id: 9142ae6993096d49493be2f296c8dcd95d87b8b4
Pull Request resolved: #42629
Check coverage progress with:

```
python -m tools.codegen.gen && git diff --no-index --compact-summary build/aten/src/ATen{_new,}/
```

[ci skip]

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 13, 2020
ghstack-source-id: 934530d599066d93f6844aded8e13a13e997ea69
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 14, 2020
ghstack-source-id: 6e60e097ee2e94c57ccc626555f30726ea583fab
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 20, 2020
ghstack-source-id: 0c4c53b1264bff372300febe4bb9b79f5c83a212
Pull Request resolved: #42629
How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers.

This diff cannot be currently landed as it doesn't reimplement static dispatch.

How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch:

```
diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py
index e26bb3941b..334475212b 100644
--- a/aten/src/ATen/function_wrapper.py
+++ b/aten/src/ATen/function_wrapper.py
@@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} Tensor::${api_name}(${method_formals}) const {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_method_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
@@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} ${api_name}(${formals}) {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_function_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
```

and then we generate the old and new versions and diff them:

```
 {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp                                 |    0
 {build-old => build}/aten/src/ATen/CPUType.cpp                                               |    0
 {build-old => build}/aten/src/ATen/CUDAType.cpp                                              |    0
 {build-old => build}/aten/src/ATen/CUDAType.h                                                |    0
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null                                | 1712 -------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null                                  |   67 -
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null                               | 4176 ---------------------------------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null                                 |  111 --
 {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/NativeFunctions.h                                         |   20 +-
 {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp                                      |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp                                     |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.h                                       |    0
 {build-old => build}/aten/src/ATen/SparseCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp                                        |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.h                                          |    0
 {build-old => build}/aten/src/ATen/TypeDefault.cpp                                           |    0
 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp                                       |    0
```

The only diff is this:

```
 diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h
index a0463dc80d..3808d27824 100644
--- a/build-old/aten/src/ATen/NativeFunctions.h
+++ b/build-new/aten/src/ATen/NativeFunctions.h
@@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr
 CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi
 CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul(const Tensor & self, Scalar other);
 CAFFE2_API Tensor & mul_(Tensor & self, Scalar other);
 CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec);
@@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self);
 CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self);
 CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self);
 CAFFE2_API Tensor sigmoid(const Tensor & self);
-CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self);
+CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor & sigmoid_(Tensor & self);
 CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self);
 CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self);
@@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee
 CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
+CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
-CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & zero_(Tensor & self);
-CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & zero_sparse_(Tensor & self);
+CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar
 CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim);
 CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask);
 CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask);
-CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor sparse_to_dense(const Tensor & self);
+CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
```

These are just wobbles in the order of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering.
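
If you want to double-check that the remaining diff really is order-only, one quick way is to compare the two headers as multisets of lines. The sketch below is a hypothetical helper for that check, not part of this PR; the paths in the usage comment are just the `build-old`/`build-new` directories from the comparison above.

```
# Hypothetical helper (not part of this PR): verify that two generated
# headers contain exactly the same declarations, ignoring their order.
from collections import Counter
from pathlib import Path

def same_decls_ignoring_order(old_header: str, new_header: str) -> bool:
    old = Counter(Path(old_header).read_text().splitlines())
    new = Counter(Path(new_header).read_text().splitlines())
    if old != new:
        # Report lines that are genuinely added or removed, not merely moved.
        for line, n in (new - old).items():
            print(f"only in new ({n}x): {line}")
        for line, n in (old - new).items():
            print(f"only in old ({n}x): {line}")
    return old == new

# e.g.:
# same_decls_ignoring_order(
#     "build-old/aten/src/ATen/NativeFunctions.h",
#     "build-new/aten/src/ATen/NativeFunctions.h")
```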

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978)

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 25, 2020

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: ce505828e3425dc5e162d94b4a6f0ebeb897c352
Pull Request resolved: #42629
ezyang commented Aug 25, 2020

Updated for the `int[2]?` optional support added in #43262

bhosmer left a comment


Really nice piece of work. A few ideas inline but nothing really pressing.

```
from contextlib import contextmanager
from typing import Optional, Iterator

# Simple dynamic scoping implementation. The name "parametrize" comes
```

I'm gonna go ahead and be no fun and say I wish we could do without this. It's cute and avoids some plumbing but a) it's way trickier than ideally I'd want a codegen script to be and b) anyway I like plumbing, it makes code look more like what it does and reduces the indirection needed to figure out what's going on.

Imagining what this could turn into over time is probably coloring my reaction; it feels like it makes innocuously adding complexity to the global state uncomfortably easy.

You have a better feel for what it's saved on the writing side, so if you feel like the code-writer-vs-reader cost/benefit works, I wouldn't push too hard on it. But if so, if you can think of a way to make it harder for a bunch of hack_* flags to quietly accumulate in here, even maybe just a finger-wagging comment, I think that'd be worth doing.
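
For readers who haven't opened the file, the pattern under discussion is roughly the following. This is a minimal sketch of contextmanager-based dynamic scoping, not the actual `tools/codegen` code; `example_flag` is a made-up setting used purely for illustration.

```
# Minimal sketch of a "parametrize"-style dynamic scoping helper.
# NOT the actual tools/codegen implementation; `example_flag` is made up.
import threading
from contextlib import contextmanager
from typing import Iterator, Optional

class _Locals(threading.local):
    example_flag: Optional[bool] = None

_locals = _Locals()

def example_flag() -> bool:
    # Read the current dynamically scoped value; fail loudly if no
    # enclosing parametrize() block has set it.
    assert _locals.example_flag is not None, "not inside parametrize()"
    return _locals.example_flag

@contextmanager
def parametrize(*, example_flag: bool) -> Iterator[None]:
    # Install the new value for the dynamic extent of the `with` block,
    # restoring the previous value on exit (even on exception).
    old = _locals.example_flag
    try:
        _locals.example_flag = example_flag
        yield
    finally:
        _locals.example_flag = old

# Any code called inside the block sees the value without it being
# threaded through as an explicit argument:
with parametrize(example_flag=True):
    assert example_flag() is True
```

The convenience is exactly what the comment above describes: settings reach deeply nested code without plumbing. The cost is that every field added to the thread-local object is another piece of effectively global state, which is where the worry about `hack_*` flags quietly accumulating comes from.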

ezyang added a commit that referenced this pull request Aug 27, 2020

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: bddc3c8107ce200cca339806c6eea6fd368744ed
Pull Request resolved: #42629
ezyang added 2 commits August 27, 2020 14:22
ezyang added a commit that referenced this pull request Aug 28, 2020

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 5203f12d2cd98c740e57a5c479e0014121351c65
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 30, 2020

ghstack-source-id: 85e52d4957c407adbd2b48bfe78ad00ce152c701
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 30, 2020

ghstack-source-id: d3efb3e8bee0f07cb26d6fe2f71144d7ef488af8
Pull Request resolved: #42629
@codecov

codecov bot commented Aug 30, 2020

Codecov Report

❗ No coverage uploaded for pull request base (gh/ezyang/819/base@3c1714c).
The diff coverage is n/a.

@@                  Coverage Diff                  @@
##             gh/ezyang/819/base   #42629   +/-   ##
=====================================================
  Coverage                      ?   69.32%           
=====================================================
  Files                         ?      378           
  Lines                         ?    46761           
  Branches                      ?        0           
=====================================================
  Hits                          ?    32417           
  Misses                        ?    14344           
  Partials                      ?        0           


ezyang added a commit that referenced this pull request Aug 31, 2020

ghstack-source-id: 1caf086ea28fcd2a556896fb162e35578f52c336
Pull Request resolved: #42629
@facebook-github-bot
Contributor

@ezyang merged this pull request in 6ea8916.
