
Rewrite of ATen code generator #42629

Closed · wants to merge 33 commits

@ezyang ezyang (Contributor) commented Aug 5, 2020

Stack from ghstack:

How to approach reviewing this diff:

  • The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of each file describe what is going on. The CLI of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and we now error if you do so), and (2) the default settings for the source and install directories are much better, so much so that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
  • The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. The copying logic for `common_with_cwrap.py` is removed.
  • All of the inputs to the old codegen are deleted.
  • Build rules now have to be adjusted to not refer to files that no longer exist and to abide by the (slightly modified) CLI.
  • LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as the remaining functions get ported to ATen; the deletion process is straightforward, just delete the functions you have ported. There are 39 more functions left to port. I kept the CUDA header in `ATen/` to avoid having to fix a bunch of headers.

This diff cannot currently be landed, as it does not yet reimplement static dispatch.

How do we know that this diff is right? We aimed for byte-for-byte compatibility (modulo whitespace) with the old generated code. Apply the following patch (which removes static dispatch) to the base version of PyTorch:

```
diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py
index e26bb3941b..334475212b 100644
--- a/aten/src/ATen/function_wrapper.py
+++ b/aten/src/ATen/function_wrapper.py
@@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} Tensor::${api_name}(${method_formals}) const {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_method_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
@@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} ${api_name}(${formals}) {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_function_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
```

and then we generate the old and new versions and diff them:

```
 {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp                                 |    0
 {build-old => build}/aten/src/ATen/CPUType.cpp                                               |    0
 {build-old => build}/aten/src/ATen/CUDAType.cpp                                              |    0
 {build-old => build}/aten/src/ATen/CUDAType.h                                                |    0
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null                                | 1712 -------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null                                  |   67 -
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null                               | 4176 ---------------------------------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null                                 |  111 --
 {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/NativeFunctions.h                                         |   20 +-
 {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp                                      |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp                                     |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.h                                       |    0
 {build-old => build}/aten/src/ATen/SparseCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp                                        |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.h                                          |    0
 {build-old => build}/aten/src/ATen/TypeDefault.cpp                                           |    0
 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp                                       |    0
```

The only diff is this:

```
 diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h
index a0463dc80d..3808d27824 100644
--- a/build-old/aten/src/ATen/NativeFunctions.h
+++ b/build-new/aten/src/ATen/NativeFunctions.h
@@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr
 CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi
 CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul(const Tensor & self, Scalar other);
 CAFFE2_API Tensor & mul_(Tensor & self, Scalar other);
 CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec);
@@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self);
 CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self);
 CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self);
 CAFFE2_API Tensor sigmoid(const Tensor & self);
-CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self);
+CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor & sigmoid_(Tensor & self);
 CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self);
 CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self);
@@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee
 CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
+CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
-CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & zero_(Tensor & self);
-CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & zero_sparse_(Tensor & self);
+CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar
 CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim);
 CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask);
 CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask);
-CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor sparse_to_dense(const Tensor & self);
+CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
```

These are just wobbles in the order of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: D23183978

There is a single literate Python file.  Start at the top
and start reading from there.

[ci skip]

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 5, 2020
ghstack-source-id: 504b1581e235302c7d9ac172dd34adee4bc0c29e
Pull Request resolved: #42629
@ezyang ezyang requested a review from zdevito August 5, 2020 20:35
There is a single literate Python file.  Start at the top
and start reading from there.

This file currently produces a 100% compatible CPUType.h
definition.  You can tell by running:

```
python aten/src/ATen/new_gen.py | git diff --word-diff --no-index - build/aten/src/ATen/CPUType.h
```

[ci skip]

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 5, 2020
ghstack-source-id: 33ef10f23cacbfa299304e18c8ff132910950a6c
Pull Request resolved: #42629
@dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit 01afc1b (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

@smessmer smessmer (Contributor) left a comment

This is much more structured and easier to understand than the old codegen. Thanks a lot for doing this.

# You can see some of the overall design patterns for how we setup
# dataclasses in this class, but we will defer a complete discussion
# of this at FunctionSchema.
@dataclass(frozen=True)
Contributor

Can't give enough heart emojis for making this immutable
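
A minimal sketch of the frozen-dataclass pattern being praised here (the class and fields are illustrative, not the PR's actual model): frozen instances are immutable and hashable, so model objects can be shared freely and never mutated behind the codegen's back.

```
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatorName:
    # Illustrative fields; the real model classes in tools/codegen/model.py may differ.
    name: str
    overload_name: str

op = OperatorName(name='add', overload_name='Tensor')
# op.name = 'mul'  # would raise dataclasses.FrozenInstanceError
print(op)  # OperatorName(name='add', overload_name='Tensor')
```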

# Corresponds to the 'use_c10_dispatcher' field. Historically,
# this field could take several possible strings, but right
# now you can have it in any color you like, as long as it's 'full'
use_c10_dispatcher_full: bool
Contributor

nit: Maybe make it an enum with two values instead? Mypy should make that type-safe and it would be closer to the native_functions.yaml representation.


# Distinguish between a missing dispatch dict (historically, this
# means to register a catch-all kernel) and a present but empty
# dispatch dict (this means register nothing; arguably, this should
Contributor

Oh, I didn't know an empty dispatch dict was a thing. Yes, this should absolutely subsume manual_kernel_registration once @ailzhang's change has landed and manual_kernel_registration actually does what you describe above.
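
To make the distinction concrete, a small hypothetical sketch (not the PR's parsing code): the absence of the 'dispatch' key historically means "register a catch-all kernel", while a present-but-empty dict means "register nothing".

```
# Hypothetical native_functions.yaml entries, already parsed into dicts.
entry_catchall = {'func': 'foo(Tensor self) -> Tensor'}                   # no dispatch key
entry_nothing = {'func': 'bar(Tensor self) -> Tensor', 'dispatch': {}}    # empty dispatch dict

def registers_catchall(e: dict) -> bool:
    # Only the *absence* of the key implies a catch-all registration.
    return 'dispatch' not in e

assert registers_catchall(entry_catchall)
assert not registers_catchall(entry_nothing)
```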

func = FunctionSchema.parse(funcs)

use_c10_dispatcher = e.get('use_c10_dispatcher')
assert use_c10_dispatcher is None or use_c10_dispatcher == 'full', \
Contributor

Technically, use_c10_dispatcher had two possible values, ['full', 'with_codegenerated_unboxing_wrapper'], and the second one was the default. I think there might be a phase where we can change the default to 'full' but still need to opt out a few operators using 'with_codegenerated_unboxing_wrapper' by specifying that key manually in native_functions.yaml, so I'd keep that representation and not just make it a boolean.
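
A minimal sketch of what that enum might look like (hypothetical names; the PR as posted uses a bool), keeping both historical values so the model stays close to the native_functions.yaml representation:

```
from enum import Enum
from typing import Optional

class UseC10Dispatcher(Enum):
    full = 'full'
    with_codegenerated_unboxing_wrapper = 'with_codegenerated_unboxing_wrapper'

    @staticmethod
    def parse(value: Optional[str]) -> 'UseC10Dispatcher':
        # An absent key defaults to the legacy unboxing wrapper, matching the old behavior.
        if value is None:
            return UseC10Dispatcher.with_codegenerated_unboxing_wrapper
        return UseC10Dispatcher(value)  # raises ValueError on anything unexpected
```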

variants_s = e.get('variants', 'function')
assert isinstance(variants_s, str)
variants: Set[Variant] = set()
for v in variants_s.split(', '):
Contributor

Write a Variant.parse to factor this out?
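
For illustration, a hedged sketch of the suggested factoring (hypothetical; the PR inlines this parsing as shown above):

```
from enum import Enum
from typing import Set

class Variant(Enum):
    function = 'function'
    method = 'method'

    @staticmethod
    def parse(variants_s: str) -> Set['Variant']:
        # 'function, method' -> {Variant.function, Variant.method}
        return {Variant(v.strip()) for v in variants_s.split(',')}
```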

is_write: bool

@staticmethod
def parse(ann: str) -> 'Annotation':
Contributor

I think I remember reading something about mypy not requiring string-wrapping for forward declared types anymore...but I might be mistaken.

Contributor Author

mypy doesn't require it, but Python does D:
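
For context, a hedged sketch of the alternative being alluded to: under PEP 563, annotations are not evaluated at runtime, so the string wrapping becomes unnecessary even for plain Python (this assumes Python 3.7+ and the `from __future__` import; the parse body below is illustrative, not the PR's actual logic):

```
from __future__ import annotations  # PEP 563: postpone evaluation of annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    is_write: bool

    @staticmethod
    def parse(ann: str) -> Annotation:  # no quotes needed around the forward reference
        # Illustrative only: treat a trailing '!' as marking a mutable (write) annotation.
        return Annotation(is_write=ann.endswith('!'))
```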

return r

@property
def is_write(self) -> bool:
Contributor

is_mutable ?

def parse_arguments(args: str) -> Tuple[Sequence[Argument], Sequence[Argument], Sequence[Argument]]:
"""
Input: 'int x, int y, int z'
Output: positional args, kwarg only args
Contributor

you're missing docs for the third output

out_arguments: List[Argument] = []
arguments_acc = arguments

# TODO: Use a real parser here; this will get bamboozled
Contributor

I think a lot of the code in here could benefit from a real parser. There seem to be a few libraries for Python parser combinators; they would allow a concise syntax and might be useful.
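
To make the "bamboozled" case concrete, a hypothetical helper (not from this PR) that splits an argument list only on top-level commas, so defaults containing commas or brackets survive where a plain `str.split(', ')` would not:

```
from typing import List

def split_top_level(args: str) -> List[str]:
    # Split on commas that are not nested inside () or [], so an argument like
    # "int[2] stride=[1, 1]" stays in one piece.
    out: List[str] = []
    depth = 0
    cur: List[str] = []
    for ch in args:
        if ch in '([':
            depth += 1
        elif ch in ')]':
            depth -= 1
        if ch == ',' and depth == 0:
            out.append(''.join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if cur:
        out.append(''.join(cur).strip())
    return out

# split_top_level('Tensor self, int[2] stride=[1, 1]')
#   -> ['Tensor self', 'int[2] stride=[1, 1]']
```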

# TODO: TensorOptions argument detection
# TODO: Extra enforcement of inplace functions having mutable self

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
Contributor

I know this is a prototype, but for the final version, I would propose splitting generation into a separate file from the data structure, probably a separate file for each generated artifact.

Contributor Author

I agree, having multiple files will be good.

# dispatch dict (this means register nothing; arguably, this should
# subsume manual_kernel_registration).
#
# TODO: str key could be replaced with more explicit enum

This is an interesting one - it would be great to drive both codegen and c++ defs off the same source of truth, especially if we want to use the more complicated relationships we're contemplating, like aliases. (And unless we do unify the source of truth, a more semantic datatype will add change-tracking overhead in any case.)

# TODO: str key could be replaced with more explicit enum
dispatch: Optional[Dict[str, str]]

# The location in the YAML file were this native function entry was
@bhosmer bhosmer Aug 7, 2020

nit: were

I was parsing subjunctive until I hit "defined" 😬

ezyang added a commit that referenced this pull request Aug 7, 2020
ghstack-source-id: 43b460f7e16482c4143fdf16b0324874fe816dc6
Pull Request resolved: #42629
# I'm not really sure how to structure this logic yet, but here is a
# sketch. This function is ONLY correct for CPUType.h at the moment;
# I bet I am going to need another parameter before I'm done
def cpp_type(t: Type, *, mutable: bool, argument: bool, legacy_optional: bool) -> str:
Contributor Author

@bhosmer suggests making this an lvalue versus rvalue distinction (instead of argument)

Contributor Author

@bhosmer On further reflection, I'm not sure the lvalue vs. rvalue distinction really makes sense here. lvalue and rvalue refer to expressions, but the computation here involves types!

assert False, f"unsupported type: {t}"
elif isinstance(t, OptionalType):
# TODO: these arguments are smoothed over by the hacky wrapper
if argument and legacy_optional and str(t.elem) == 'Tensor':
Contributor Author

bhosmer suggests factoring out the legacy_optional logic, perhaps into a wrapper

# should have a signature equivalent to its pure variant,
# but just with extra kwargs for the output elements. This
# is difficult to actually check for and historically
# we only do this check in tools/

This is the kind of entropy reduction we'd harvest by merging (or sharing utilities across) ATen and tools/ codegen: there's a bunch of logic over there that does semantic checking over the same interpretations of these definitions. Ideally we'd eventually merge that into what you're doing here and use that canonical model everywhere.


cc @ljk53 per our recent convos around codegen


# A custom loader for YAML to let us also keep track of line numbers
# of each entry in the YAML file
class LineLoader(Loader):
@bhosmer bhosmer Aug 8, 2020

This is gold. If the final version of this could save location info into Declarations.yaml it could be used in the tools/ codegen for the same usability bump (although at least some of that code is pretty careful to hand-introduce some context - op names mostly IIRC - into its errors, so it's not awful. But this would be better).
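
A minimal sketch of the line-number-tracking idea (the details here are assumptions, not necessarily what this PR's LineLoader does): record the source line of every mapping node so later errors can point back into native_functions.yaml.

```
import yaml

class LineLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = super().construct_mapping(node, deep=deep)
        # Record the 1-based line of this entry under a synthetic key.
        mapping['__line__'] = node.start_mark.line + 1
        return mapping

entries = yaml.load('- func: add\n- func: mul\n', Loader=LineLoader)
# entries[0]['__line__'] == 1, entries[1]['__line__'] == 2
```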

cpp_args.extend(map(format_arg, f.func.out_arguments))
cpp_args.extend(map(format_arg, f.func.arguments))

# Discover TensorOptions

Should we be either more permissive or more proscriptive here? I.e. it might be more user friendly to either accept topt args in any order, or error if they appear out of order, instead of just letting out-of-order args escape the dragnet and cause unexpected behavior downstream.

Aside: I think this is actually a good example of the kind of situation where a codegen pipeline needs a similar attitude to input handling as a compiler frontend - dumb/strict is fine, smart/lenient is fine, but dumb/lenient (i.e. fishing around for patterns and letting misses go by) is an easy way to introduce bad user experience, and IMO a major source of codegen's bad usability rep in general.

Contributor Author

The old codegen is much more prescriptive, requiring specific defaults and optionality before it matches; it is really wordy as a result. I could see this code being a bit too relaxed. I think fundamentally whether or not we grab arguments comes down to a combination of type and name. However, the type check is a relaxed one: I don't care whether the type is optional or not in the schema. If it is defaulted, I need to generate defaulting code accordingly for TensorOptions.

Defo gonna change this.

i += 1

rs.append(f"CAFFE2_API {cpp_return} {name}({', '.join(cpp_args)});")
return rs

ok I know this is way more than you were looking for but per our chat this afternoon here's a version of compute_function_declarations() that uses some moves that kept me sane when I rewrote gen_python_functions way back when.

  • ruthlessly pulling functions up to the top level and plumbing all their dependencies into them as explicit params. The original compute_function_declarations() is still pretty small and clear, but reading the dependencies between local functions and closed-over locals is already getting nontrivial... whereas in the flattened version it's super obvious what e.g. formatting an arg depends on, what formatting a tensor options arg depends on (just the function name!) etc.

  • analogously, yanking any data constants up into global space and throwing all available info into their names and comments. Helps visibility and avoids drift-prone repetition of special cases buried inside functions... and for the many, many of these that are pattern data for workarounds, it demarcates them more clearly as the eruptions of irregularity that they are 😁

  • being super finicky about missed patterns (here yelling about out-of-sequence tensor options args)

I went ahead and did the whole exercise here just to show how the end result helps (me, anyway) distinguish between the general case and the special cases, as well as what input is being fed into each decision. Well, plus I did it bc I got carried away :P

One last pitch - it might seem like overkill to be so hardcore, now while the code is pretty tight and tractable - but by the time I got to gen_python_functions some of the analogs to compute_function_declarations() were like multi-hundred line behemoths with nested and doubly-nested inner functions that just closed over everything, big dictionaries getting tweaked miles and miles away, the whole deal. I think the flatter and more explicitly plumbed we make things now, the easier it'll be to fight that kind of entropy.

# format an argument for c++
def format_arg(a: Argument, legacy_optional: bool) -> str:
    # DEFAULTING IS NEW
    default = f"={cpp_default(a.default, a.type)}" if a.default is not None else ""
    # TODO: Always NO legacy optional
    return f"{cpp_type(a.type, mutable=a.is_write, argument=True, legacy_optional=legacy_optional)} {a.name}{default}"

# tensor options attributes come in scattered, we recognize and convert to a single TensorOptions arg
TOPT_NAMES = ['dtype', 'layout', 'device', 'pin_memory']
TOPT_LEN = len(TOPT_NAMES)

# true if i begins the scattered tensor options arguments in args.
# note: it's an error for these to be out of order. (TODO what about noncontiguous?)
def at_topt_args(i: int, args: List[Argument]) -> bool:
    in_seq = i <= len(args) - TOPT_LEN and all(args[i+j].name == TOPT_NAMES[j] for j in range(TOPT_LEN))
    perm = i <= len(args) - TOPT_LEN and set([a.name for a in args[i:i + TOPT_LEN]]) == set(TOPT_NAMES)
    if in_seq != perm:
        raise ValueError(f"TensorOptions arguments must be specified in the following order: {', '.join(TOPT_NAMES)}")
    return in_seq

# TODO these need TensorOptions without a default for some reason
TENSOR_OPTIONS_NO_DEFAULT = [
    "_cudnn_init_dropout_state",
    "sparse_coo_tensor.size",
    "_sparse_coo_tensor_with_dims",
    "_sparse_coo_tensor_with_dims_and_tensors"]

# TODO these need TensorOptions defaulted to (dtype) long
TENSOR_OPTIONS_DEFAULT_LONG = ["tril_indices", "triu_indices"]

# format TensorOptions arg. Handle some special cases by name
def format_topt_arg(f: NativeFunction):
    if str(f.func.name) in TENSOR_OPTIONS_NO_DEFAULT:
        # I think this is a bug in the original
        return 'const TensorOptions & options'
    elif str(f.func.name) in TENSOR_OPTIONS_DEFAULT_LONG:
        return 'const TensorOptions & options=at::kLong'
    else:
        return 'const TensorOptions & options={}'  # MODIFIED

# compute all c++ function declarations 
def compute_function_declarations() -> List[str]:
    rs: List[str] = []
    for f in native_functions:
        with context(f'in {f.loc}:\n  {f.func}'):
            if f.manual_kernel_registration:
                continue
            if Variant.function not in f.variants:
                continue

            # TODO: clear up naming
            cpp_return = cpp_type_return(f.func.returns)
            name = str(f.func.name.name)
            if f.func.is_out_fn():
                name += '_out'

            cpp_args: List[str] = []
            cpp_args.extend(map(lambda a: format_arg(a, not f.use_c10_dispatcher_full), f.func.out_arguments))
            cpp_args.extend(map(lambda a: format_arg(a, not f.use_c10_dispatcher_full), f.func.arguments))

            # Discover TensorOptions
            kwargs = list(f.func.kwarg_only_arguments)  # short name
            i = 0
            while i < len(kwargs):
                if at_topt_args(i, kwargs):
                    cpp_args.append(format_topt_arg(f))
                    i += len(TOPT_NAMES)
                else:
                    cpp_args.append(format_arg(kwargs[i], not f.use_c10_dispatcher_full))
                    i += 1

            rs.append(f"CAFFE2_API {cpp_return} {name}({', '.join(cpp_args)});")
    return rs

Contributor Author

Yes, pulling things up seems appropriate. One challenge is that I need some sort of top-level organizational principle (you toss things in as local definitions so you don't have to worry about this).


For ordering I generally try to stick to in-order, with comment headers to replicate the visual effect of nesting where it helps. So e.g. the code block above might have a big

# --------- compute_function_declarations and helpers -----------
#
# ...
#

at the top, and maybe a smaller one above the stuff about TensorOptions args.

You're right that it forgoes the natural structuring effect of nested functions, which is a bummer. But the readability payoff of explicit dependencies is definitely worth it, I think.

ezyang added a commit that referenced this pull request Aug 12, 2020
ghstack-source-id: 9142ae6993096d49493be2f296c8dcd95d87b8b4
Pull Request resolved: #42629
Check coverage progress with:

```
python -m tools.codegen.gen && git diff --no-index --compact-summary build/aten/src/ATen{_new,}/
```

[ci skip]

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 13, 2020
ghstack-source-id: 934530d599066d93f6844aded8e13a13e997ea69
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 14, 2020
ghstack-source-id: 6e60e097ee2e94c57ccc626555f30726ea583fab
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 20, 2020
ghstack-source-id: 0c4c53b1264bff372300febe4bb9b79f5c83a212
Pull Request resolved: #42629
How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers.

This diff cannot be currently landed as it doesn't reimplement static dispatch.

How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch:

```
diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py
index e26bb3941b..334475212b 100644
--- a/aten/src/ATen/function_wrapper.py
+++ b/aten/src/ATen/function_wrapper.py
@@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} Tensor::${api_name}(${method_formals}) const {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_method_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
@@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\
 // ${schema_string}
 ${return_type} ${api_name}(${formals}) {
 #ifdef USE_STATIC_DISPATCH
-    ${static_dispatch_function_body}
 #else
     static auto op = c10::Dispatcher::singleton()
         .findSchemaOrThrow("aten::${operator_name}", "${overload_name}")
```

and then we generate the old and new versions and diff them:

```
 {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp                                 |    0
 {build-old => build}/aten/src/ATen/CPUType.cpp                                               |    0
 {build-old => build}/aten/src/ATen/CUDAType.cpp                                              |    0
 {build-old => build}/aten/src/ATen/CUDAType.h                                                |    0
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null                                | 1712 -------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null                                  |   67 -
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null                               | 4176 ---------------------------------------------
 build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null                                 |  111 --
 {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/NativeFunctions.h                                         |   20 +-
 {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp                                      |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp                                     |    0
 {build-old => build}/aten/src/ATen/QuantizedCUDAType.h                                       |    0
 {build-old => build}/aten/src/ATen/SparseCPUType.cpp                                         |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp                                        |    0
 {build-old => build}/aten/src/ATen/SparseCUDAType.h                                          |    0
 {build-old => build}/aten/src/ATen/TypeDefault.cpp                                           |    0
 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp                                       |    0
```

The only diff is this:

```
 diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h
index a0463dc80d..3808d27824 100644
--- a/build-old/aten/src/ATen/NativeFunctions.h
+++ b/build-new/aten/src/ATen/NativeFunctions.h
@@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr
 CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi
 CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul(const Tensor & self, Scalar other);
 CAFFE2_API Tensor & mul_(Tensor & self, Scalar other);
 CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec);
@@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self);
 CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self);
 CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self);
 CAFFE2_API Tensor sigmoid(const Tensor & self);
-CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self);
+CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor & sigmoid_(Tensor & self);
 CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self);
 CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self);
@@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee
 CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
+CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
-CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & zero_(Tensor & self);
-CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & zero_sparse_(Tensor & self);
+CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar
 CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim);
 CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask);
 CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask);
-CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor sparse_to_dense(const Tensor & self);
+CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
```

These are just wobbles in the order of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering.
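
If you want to double-check that the remaining diff really is order-only, one quick way is to compare the two headers as multisets of lines. The sketch below is a hypothetical helper for that check, not part of this PR; the paths in the usage comment are just the `build-old`/`build-new` directories from the comparison above.

```
# Hypothetical helper (not part of this PR): verify that two generated
# headers contain exactly the same declarations, ignoring their order.
from collections import Counter
from pathlib import Path

def same_decls_ignoring_order(old_header: str, new_header: str) -> bool:
    old = Counter(Path(old_header).read_text().splitlines())
    new = Counter(Path(new_header).read_text().splitlines())
    if old != new:
        # Report lines that are genuinely added or removed, not merely moved.
        for line, n in (new - old).items():
            print(f"only in new ({n}x): {line}")
        for line, n in (old - new).items():
            print(f"only in old ({n}x): {line}")
    return old == new

# e.g.:
# same_decls_ignoring_order(
#     "build-old/aten/src/ATen/NativeFunctions.h",
#     "build-new/aten/src/ATen/NativeFunctions.h")
```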

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978)

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 25, 2020

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: ce505828e3425dc5e162d94b4a6f0ebeb897c352
Pull Request resolved: #42629
ezyang commented Aug 25, 2020

Updated for the `int[2]?` optional support added in #43262

bhosmer left a comment


Really nice piece of work. A few ideas inline but nothing really pressing.

```
from contextlib import contextmanager
from typing import Optional, Iterator

# Simple dynamic scoping implementation. The name "parametrize" comes
```

I'm gonna go ahead and be no fun and say I wish we could do without this. It's cute and avoids some plumbing but a) it's way trickier than ideally I'd want a codegen script to be and b) anyway I like plumbing, it makes code look more like what it does and reduces the indirection needed to figure out what's going on.

Imagining what this could turn into over time is probably coloring my reaction; it feels like it makes innocuously adding complexity to the global state uncomfortably easy.

You have a better feel for what it's saved on the writing side, so if you feel like the code-writer-vs-reader cost/benefit works, I wouldn't push too hard on it. But if so, if you can think of a way to make it harder for a bunch of hack_* flags to quietly accumulate in here, even maybe just a finger-wagging comment, I think that'd be worth doing.
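
For readers who haven't opened the file, the pattern under discussion is roughly the following. This is a minimal sketch of contextmanager-based dynamic scoping, not the actual `tools/codegen` code; `example_flag` is a made-up setting used purely for illustration.

```
# Minimal sketch of a "parametrize"-style dynamic scoping helper.
# NOT the actual tools/codegen implementation; `example_flag` is made up.
import threading
from contextlib import contextmanager
from typing import Iterator, Optional

class _Locals(threading.local):
    example_flag: Optional[bool] = None

_locals = _Locals()

def example_flag() -> bool:
    # Read the current dynamically scoped value; fail loudly if no
    # enclosing parametrize() block has set it.
    assert _locals.example_flag is not None, "not inside parametrize()"
    return _locals.example_flag

@contextmanager
def parametrize(*, example_flag: bool) -> Iterator[None]:
    # Install the new value for the dynamic extent of the `with` block,
    # restoring the previous value on exit (even on exception).
    old = _locals.example_flag
    try:
        _locals.example_flag = example_flag
        yield
    finally:
        _locals.example_flag = old

# Any code called inside the block sees the value without it being
# threaded through as an explicit argument:
with parametrize(example_flag=True):
    assert example_flag() is True
```

The convenience is exactly what the comment above describes: settings reach deeply nested code without plumbing. The cost is that every field added to the thread-local object is another piece of effectively global state, which is where the worry about `hack_*` flags quietly accumulating comes from.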

ezyang added a commit that referenced this pull request Aug 27, 2020

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: bddc3c8107ce200cca339806c6eea6fd368744ed
Pull Request resolved: #42629
ezyang added 2 commits August 27, 2020 14:22
ezyang added a commit that referenced this pull request Aug 28, 2020

Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 5203f12d2cd98c740e57a5c479e0014121351c65
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 30, 2020

ghstack-source-id: 85e52d4957c407adbd2b48bfe78ad00ce152c701
Pull Request resolved: #42629
ezyang added a commit that referenced this pull request Aug 30, 2020

ghstack-source-id: d3efb3e8bee0f07cb26d6fe2f71144d7ef488af8
Pull Request resolved: #42629
@codecov

codecov bot commented Aug 30, 2020

Codecov Report

❗ No coverage uploaded for pull request base (gh/ezyang/819/base@3c1714c).
The diff coverage is n/a.

@@                  Coverage Diff                  @@
##             gh/ezyang/819/base   #42629   +/-   ##
=====================================================
  Coverage                      ?   69.32%           
=====================================================
  Files                         ?      378           
  Lines                         ?    46761           
  Branches                      ?        0           
=====================================================
  Hits                          ?    32417           
  Misses                        ?    14344           
  Partials                      ?        0           


ezyang added a commit that referenced this pull request Aug 31, 2020

ghstack-source-id: 1caf086ea28fcd2a556896fb162e35578f52c336
Pull Request resolved: #42629
@facebook-github-bot
Contributor

@ezyang merged this pull request in 6ea8916.
