Rewrite of ATen code generator #42629
Conversation
There is a single literate Python file. Start at the top and start reading from there. [ci skip] Signed-off-by: Edward Z. Yang <[email protected]>
There is a single literate Python file. Start at the top and start reading from there. This file currently produces a 100% compatible CPUType.h definition. You can tell by running: ``` python aten/src/ATen/new_gen.py | git diff --word-diff --no-index - build/aten/src/ATen/CPUType.h ``` [ci skip] Signed-off-by: Edward Z. Yang <[email protected]>
💊 CI status (Dr. CI, as of commit 01afc1b): 💚 Looks good so far! There are no failures yet.
This is much more structured and easier to understand than the old codegen. Thanks a lot for doing this.
aten/src/ATen/new_gen.py
Outdated
# You can see some of the overall design patterns for how we setup
# dataclasses in this class, but we will defer a complete discussion
# of this at FunctionSchema.
@dataclass(frozen=True)
Can't give enough heart emojis for making this immutable
aten/src/ATen/new_gen.py
Outdated
# Corresponds to the 'use_c10_dispatcher' field. Historically,
# this field could take several possible strings, but right
# now you can have it in any color you like, as long as it's 'full'
use_c10_dispatcher_full: bool
nit: Maybe make it an enum with two values instead? Mypy should make that type-safe and it would be closer to the native_functions.yaml representation.
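For illustration, a minimal sketch of what such an enum could look like (the class and member names here are assumptions, not the PR's actual representation):

```python
# Hypothetical sketch: an enum that mirrors the native_functions.yaml
# strings more closely than a bare bool would.
from enum import Enum

class UseC10Dispatcher(Enum):
    full = 0
    with_codegenerated_unboxing_wrapper = 1
```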
aten/src/ATen/new_gen.py
Outdated
# Distinguish between a missing dispatch dict (historically, this
# means to register a catch-all kernel) and a present but empty
# dispatch dict (this means register nothing; arguably, this should
oh I didn't know empty dispatch dict was a thing. Yes, this should absolutely subsume manual_kernel_registration after @ailzhang's change landed, and manual_kernel_registration actually does what you describe above.
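As an aside, a hedged sketch of the missing-vs-empty distinction described above; `e` stands in for one parsed native_functions.yaml entry and the helper name is made up:

```python
from typing import Dict, Optional

def parse_dispatch(e: Dict[str, object]) -> Optional[Dict[str, str]]:
    raw = e.get('dispatch')
    if raw is None:
        # key absent: historically this meant "register a catch-all kernel"
        return None
    assert isinstance(raw, dict)
    # key present but possibly {}: an empty dict means "register nothing"
    return {str(k): str(v) for k, v in raw.items()}
```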
aten/src/ATen/new_gen.py
Outdated
func = FunctionSchema.parse(funcs)

use_c10_dispatcher = e.get('use_c10_dispatcher')
assert use_c10_dispatcher is None or use_c10_dispatcher == 'full', \
technically, use_c10_dispatcher had two possible values, 'full' and 'with_codegenerated_unboxing_wrapper', and the second one was the default. I think there might be a phase where we can change the default to 'full' but still need to opt out a few using 'with_codegenerated_unboxing_wrapper' by specifying that key manually in native_functions.yaml, so I'd keep that representation and not just make it a boolean.
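A sketch of what keeping that two-valued representation could look like at parse time, reusing the hypothetical UseC10Dispatcher enum from the earlier sketch (the default below is the historical one named in this comment):

```python
def parse_use_c10_dispatcher(s: object) -> UseC10Dispatcher:
    # absent key falls back to the historical default
    if s is None:
        return UseC10Dispatcher.with_codegenerated_unboxing_wrapper
    if s == 'full':
        return UseC10Dispatcher.full
    if s == 'with_codegenerated_unboxing_wrapper':
        return UseC10Dispatcher.with_codegenerated_unboxing_wrapper
    raise AssertionError(f"invalid use_c10_dispatcher: {s}")
```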
aten/src/ATen/new_gen.py
Outdated
variants_s = e.get('variants', 'function')
assert isinstance(variants_s, str)
variants: Set[Variant] = set()
for v in variants_s.split(', '):
Write a Variant.parse to factor this out?
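A hedged sketch of that factoring; the 'function' and 'method' members follow the variants values used in native_functions.yaml, but the exact shape of the class is an assumption:

```python
from enum import Enum
from typing import Set

class Variant(Enum):
    function = 0
    method = 1

    @staticmethod
    def parse(variants_s: str) -> Set['Variant']:
        rs: Set['Variant'] = set()
        for v in variants_s.split(', '):
            rs.add(Variant[v])  # KeyError on an unrecognized variant name
        return rs

# e.g. Variant.parse('function, method') -> {Variant.function, Variant.method}
```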
aten/src/ATen/new_gen.py
Outdated
is_write: bool

@staticmethod
def parse(ann: str) -> 'Annotation':
I think I remember reading something about mypy not requiring string-wrapping for forward-declared types anymore... but I might be mistaken.
mypy doesn't require it, but Python does D:
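For context, the workaround being alluded to is PEP 563: with `from __future__ import annotations` (Python 3.7+), annotations are not evaluated at definition time, so the quotes around the forward reference can be dropped. A simplified sketch (the fields and parse logic here are illustrative, not the PR's real Annotation class):

```python
from __future__ import annotations  # must precede all other imports

from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    alias: str       # illustrative stand-in for the real fields
    is_write: bool

    @staticmethod
    def parse(ann: str) -> Annotation:  # no string-wrapping needed
        # 'a!'-style annotations mark mutation; this is a simplification
        return Annotation(alias=ann.rstrip('!'), is_write=ann.endswith('!'))
```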
aten/src/ATen/new_gen.py
Outdated
return r

@property
def is_write(self) -> bool:
is_mutable?
aten/src/ATen/new_gen.py
Outdated
def parse_arguments(args: str) -> Tuple[Sequence[Argument], Sequence[Argument], Sequence[Argument]]:
    """
    Input: 'int x, int y, int z'
    Output: positional args, kwarg only args
you're missing docs for the third output
aten/src/ATen/new_gen.py
Outdated
out_arguments: List[Argument] = []
arguments_acc = arguments

# TODO: Use a real parser here; this will get bamboozled
I think a lot of the code in here could benefit from a real parser. There seem to be a few libraries for python parser combinators, they would allow a concise syntax and might be useful.
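To make the concern concrete without committing to any particular library, here is a hedged sketch of the kind of helper a real parser would subsume: splitting an argument list only at top-level commas, so commas nested inside brackets (e.g. a default like `int[2] stride=[1,1]`) don't bamboozle a naive `split(', ')`:

```python
from typing import List

def split_toplevel_commas(s: str) -> List[str]:
    parts: List[str] = []
    depth = 0
    buf = ''
    for ch in s:
        if ch in '([':
            depth += 1
        elif ch in ')]':
            depth -= 1
        if ch == ',' and depth == 0:
            parts.append(buf.strip())
            buf = ''
        else:
            buf += ch
    if buf.strip():
        parts.append(buf.strip())
    return parts

# split_toplevel_commas('Tensor self, int[2] stride=[1,1]')
#   -> ['Tensor self', 'int[2] stride=[1,1]']
```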
aten/src/ATen/new_gen.py
Outdated
# TODO: TensorOptions argument detection
# TODO: Extra enforcement of inplace functions having mutable self

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
I know this is a prototype, but for the final version, I would propose splitting generation into a separate file from the data structure, probably a separate file for each generated artifact.
I agree, having multiple files will be good.
aten/src/ATen/new_gen.py
Outdated
# dispatch dict (this means register nothing; arguably, this should
# subsume manual_kernel_registration).
#
# TODO: str key could be replaced with more explicit enum
This is an interesting one - it would be great to drive both codegen and c++ defs off the same source of truth, especially if we want to use the more complicated relationships we're contemplating, like aliases. (And unless we do unify the source of truth, a more semantic datatype will add change-tracking overhead in any case.)
aten/src/ATen/new_gen.py
Outdated
# TODO: str key could be replaced with more explicit enum
dispatch: Optional[Dict[str, str]]

# The location in the YAML file were this native function entry was
nit: "were" (I was parsing subjunctive until I hit "defined" 😬)
aten/src/ATen/new_gen.py
Outdated
# I'm not really sure how to structure this logic yet, but here is a
# sketch. This function is ONLY correct for CPUType.h at the moment;
# I bet I am going to need another parameter before I'm done
def cpp_type(t: Type, *, mutable: bool, argument: bool, legacy_optional: bool) -> str:
@bhosmer suggests making this an lvalue versus rvalue distinction (instead of argument)
@bhosmer On further reflection, I'm not sure the lvalue vs rvalue distinction really makes sense here. lvalue and rvalue refer to expressions, but the computation here involves types!
aten/src/ATen/new_gen.py
Outdated
assert False, f"unsupported type: {t}"
elif isinstance(t, OptionalType):
# TODO: these arguments are smoothed over by the hacky wrapper
if argument and legacy_optional and str(t.elem) == 'Tensor':
bhosmer suggests factoring out the legacy_optional logic, perhaps into a wrapper
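A hedged sketch of one way that factoring could go, assuming cpp_type itself is refactored to drop the legacy_optional parameter and a thin wrapper handles the special case (the wrapper's name and signature are assumptions; cpp_type and OptionalType are the definitions from new_gen.py):

```python
def cpp_type_legacy(t, *, mutable: bool, argument: bool, legacy_optional: bool) -> str:
    # Historically, optional Tensor arguments were smoothed over by the
    # hacky wrapper, so they surface as plain Tensor in the C++ signature.
    if argument and legacy_optional and isinstance(t, OptionalType) and str(t.elem) == 'Tensor':
        return cpp_type(t.elem, mutable=mutable, argument=argument)
    return cpp_type(t, mutable=mutable, argument=argument)
```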
aten/src/ATen/new_gen.py
Outdated
# should have a signature equivalent to its pure variant,
# but just with extra kwargs for the output elements. This
# is difficult to actually check for and historically
# we only do this check in tools/
This is the kind of entropy reduction we'd harvest by merging (or sharing utilities across) ATen and tools/ codegen: there's a bunch of logic over there that does semantic checking over the same interpretations of these definitions. Ideally we'd eventually merge that into what you're doing here and use that canonical model everywhere.
cc @ljk53 per our recent convos around codegen
aten/src/ATen/new_gen.py
Outdated
# A custom loader for YAML to let us also keep track of line numbers
# of each entry in the YAML file
class LineLoader(Loader):
This is gold. If the final version of this could save location info into Declarations.yaml it could be used in the tools/ codegen for the same usability bump (although at least some of that code is pretty careful to hand-introduce some context - op names mostly IIRC - into its errors, so it's not awful. But this would be better).
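For reference, a common PyYAML pattern for this kind of loader (a sketch, not necessarily the exact implementation in the PR): override construct_mapping and stash the node's starting line into each parsed mapping.

```python
import yaml

class LineLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = super().construct_mapping(node, deep=deep)
        mapping['__line__'] = node.start_mark.line + 1  # 1-based line number
        return mapping

# usage:
#   with open('aten/src/ATen/native/native_functions.yaml') as f:
#       entries = yaml.load(f, Loader=LineLoader)
```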
aten/src/ATen/new_gen.py
Outdated
cpp_args.extend(map(format_arg, f.func.out_arguments))
cpp_args.extend(map(format_arg, f.func.arguments))

# Discover TensorOptions
Should we be either more permissive or more proscriptive here? I.e. it might be more user friendly to either accept topt args in any order, or error if they appear out of order, instead of just letting out-of-order args escape the dragnet and cause unexpected behavior downstream.
Aside: I think this is actually a good example of the kind of situation where a codegen pipeline needs a similar attitude to input handling as a compiler frontend - dumb/strict is fine, smart/lenient is fine, but dumb/lenient (i.e. fishing around for patterns and letting misses go by) is an easy way to introduce bad user experience, and IMO a major source of codegen's bad usability rep in general.
The old codegen is much more proscriptive, requiring specific defaults and optionality before it matches. It is really wordy as a result. I could see this code being a bit too relaxed. I think fundamentally whether or not we grab arguments is a combination of type and name. However, the type is a relaxed test: I don't care if the type is optional or not in schema. If it is defaulted I need to generate defaulting code accordingly for tensor options.
Defo gonna change this.
aten/src/ATen/new_gen.py
Outdated
i += 1

rs.append(f"CAFFE2_API {cpp_return} {name}({', '.join(cpp_args)});")
return rs
ok I know this is way more than you were looking for, but per our chat this afternoon here's a version of compute_function_declarations() that uses some moves that kept me sane when I rewrote gen_python_functions way back when:
- ruthlessly pulling functions up to the top level and plumbing all their dependencies into them as explicit params. The original compute_function_declarations() is still pretty small and clear, but reading the dependencies between local functions and closed-over locals is already getting nontrivial... whereas in the flattened version it's super obvious what e.g. formatting an arg depends on, what formatting a tensor options arg depends on (just the function name!) etc.
- analogously, yanking any data constants up into global space and throwing all available info into their names and comments. Helps visibility and avoids drift-prone repetition of special cases buried inside functions... and for the many, many of these that are pattern data for workarounds, it demarcates them more clearly as the eruptions of irregularity that they are 😁
- being super finicky about missed patterns (here, yelling about out-of-sequence tensor options args)
I went ahead and did the whole exercise here just to show how the end result helps (me, anyway) distinguish between the general case and the special cases, as well as what input is being fed into each decision. Well, plus I did it bc I got carried away :P
One last pitch - it might seem like overkill to be so hardcore, now while the code is pretty tight and tractable - but by the time I got to gen_python_functions, some of the analogs to compute_function_declarations() were like multi-hundred line behemoths with nested and doubly-nested inner functions that just closed over everything, big dictionaries getting tweaked miles and miles away, the whole deal. I think the flatter and more explicitly plumbed we make things now, the easier it'll be to fight that kind of entropy.
```python
# format an argument for c++
def format_arg(a: Argument, legacy_optional: bool) -> str:
    # DEFAULTING IS NEW
    default = f"={cpp_default(a.default, a.type)}" if a.default is not None else ""
    # TODO: Always NO legacy optional
    return f"{cpp_type(a.type, mutable=a.is_write, argument=True, legacy_optional=legacy_optional)} {a.name}{default}"

# tensor options attributes come in scattered, we recognize and convert to a single TensorOptions arg
TOPT_NAMES = ['dtype', 'layout', 'device', 'pin_memory']
TOPT_LEN = len(TOPT_NAMES)

# true if i begins the scattered tensor options arguments in args.
# note: it's an error for these to be out of order. (TODO what about noncontiguous?)
def at_topt_args(i: int, args: List[Argument]) -> bool:
    in_seq = i <= len(args) - TOPT_LEN and all(args[i+j].name == TOPT_NAMES[j] for j in range(TOPT_LEN))
    perm = i <= len(args) - TOPT_LEN and set([a.name for a in args[i:i + TOPT_LEN]]) == set(TOPT_NAMES)
    if in_seq != perm:
        raise ValueError(f"TensorOptions arguments must be specified in the following order: {', '.join(TOPT_NAMES)}")
    return in_seq

# TODO these need TensorOptions without a default for some reason
TENSOR_OPTIONS_NO_DEFAULT = [
    "_cudnn_init_dropout_state",
    "sparse_coo_tensor.size",
    "_sparse_coo_tensor_with_dims",
    "_sparse_coo_tensor_with_dims_and_tensors"]

# TODO these need TensorOptions defaulted to (dtype) long
TENSOR_OPTIONS_DEFAULT_LONG = ["tril_indices", "triu_indices"]

# format TensorOptions arg. Handle some special cases by name
def format_topt_arg(f: NativeFunction):
    if str(f.func.name) in TENSOR_OPTIONS_NO_DEFAULT:
        # I think this is a bug in the original
        return 'const TensorOptions & options'
    elif str(f.func.name) in TENSOR_OPTIONS_DEFAULT_LONG:
        return 'const TensorOptions & options=at::kLong'
    else:
        return 'const TensorOptions & options={}'  # MODIFIED

# compute all c++ function declarations
def compute_function_declarations() -> List[str]:
    rs: List[str] = []
    for f in native_functions:
        with context(f'in {f.loc}:\n {f.func}'):
            if f.manual_kernel_registration:
                continue
            if Variant.function not in f.variants:
                continue
            # TODO: clear up naming
            cpp_return = cpp_type_return(f.func.returns)
            name = str(f.func.name.name)
            if f.func.is_out_fn():
                name += '_out'
            cpp_args: List[str] = []
            cpp_args.extend(map(lambda a: format_arg(a, not f.use_c10_dispatcher_full), f.func.out_arguments))
            cpp_args.extend(map(lambda a: format_arg(a, not f.use_c10_dispatcher_full), f.func.arguments))
            # Discover TensorOptions
            kwargs = list(f.func.kwarg_only_arguments)  # short name
            i = 0
            while i < len(kwargs):
                if at_topt_args(i, kwargs):
                    cpp_args.append(format_topt_arg(f))
                    i += len(TOPT_NAMES)
                else:
                    cpp_args.append(format_arg(kwargs[i], not f.use_c10_dispatcher_full))
                    i += 1
            rs.append(f"CAFFE2_API {cpp_return} {name}({', '.join(cpp_args)});")
    return rs
```
Yes, pulling things up seems appropriate. One challenge is that I need some sort of top-level organizational principle (when you toss things in as local definitions you don't have to worry about this).
For ordering I generally try to stick to in-order, with comment headers to replicate the visual effect of nesting where it helps. So e.g. the code block above might have a big
# --------- compute_function_declarations and helpers -----------
#
# ...
#
at the top, and maybe a smaller one above the stuff about TensorOptions args.
You're right that it forgoes the natural structuring effect of nested functions, which is a bummer. But the readability payoff of explicit dependencies is definitely worth it, I think.
Check coverage progress with: ``` python -m tools.codegen.gen && git diff --no-index --compact-summary build/aten/src/ATen{_new,}/ ``` [ci skip] Signed-off-by: Edward Z. Yang <[email protected]>
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0 ``` The only diff is this: ``` diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h index a0463dc80d..3808d27824 100644 --- a/build-old/aten/src/ATen/NativeFunctions.h +++ b/build-new/aten/src/ATen/NativeFunctions.h @@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other); -CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other); +CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & 
self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul(const Tensor & self, Scalar other); CAFFE2_API Tensor & mul_(Tensor & self, Scalar other); CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec); @@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self); CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self); CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self); CAFFE2_API Tensor sigmoid(const Tensor & self); -CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self); +CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor & sigmoid_(Tensor & self); CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self); CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self); @@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); +CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); -CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent); CAFFE2_API Tensor & zero_(Tensor & self); -CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & zero_sparse_(Tensor & self); +CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim); CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask); CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask); -CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor sparse_to_dense(const Tensor & self); +CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); ``` These are just wobbles in the order 
of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978) [ghstack-poisoned]
Really nice piece of work. A few ideas inline but nothing really pressing.
from contextlib import contextmanager
from typing import Optional, Iterator

# Simple dynamic scoping implementation. The name "parametrize" comes
I'm gonna go ahead and be no fun and say I wish we could do without this. It's cute and avoids some plumbing but a) it's way trickier than ideally I'd want a codegen script to be and b) anyway I like plumbing, it makes code look more like what it does and reduces the indirection needed to figure out what's going on.
Imagining what this could turn into over time is probably coloring my reaction, it feels like it makes innocuously adding complexity to the global state uncomfortably easy.
You have a better feel for what it's saved on the writing side, so if you feel like the code-writer-vs-reader cost/benefit works, I wouldn't push too hard on it. But if so, if you can think of a way to make it harder for a bunch of hack_* flags to quietly accumulate in here, even maybe just a finger-wagging comment, I think that'd be worth doing.
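For readers following along, a hedged sketch of what a contextmanager-based "parametrize" helper of this kind might look like (not necessarily the PR's exact implementation): a module-level variable is temporarily rebound for the duration of a with-block, which is what makes it dynamic scoping.

```python
from contextlib import contextmanager
from typing import Iterator, Optional

_dispatch_key: Optional[str] = None  # hypothetical dynamically scoped parameter

@contextmanager
def parametrize(*, dispatch_key: str) -> Iterator[None]:
    global _dispatch_key
    old = _dispatch_key
    try:
        _dispatch_key = dispatch_key
        yield
    finally:
        _dispatch_key = old  # always restore, even on exceptions

# usage:
#   with parametrize(dispatch_key='CPU'):
#       ...  # code here observes _dispatch_key == 'CPU'
```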
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0
```

The only diff is this:

```
diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h
index a0463dc80d..3808d27824 100644
--- a/build-old/aten/src/ATen/NativeFunctions.h
+++ b/build-new/aten/src/ATen/NativeFunctions.h
@@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr
 CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size);
 CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
-CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
+CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi
 CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false);
 CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other);
-CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other);
+CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other);
 CAFFE2_API Tensor mul(const Tensor & self, Scalar other);
 CAFFE2_API Tensor & mul_(Tensor & self, Scalar other);
 CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec);
@@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self);
 CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self);
 CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self);
 CAFFE2_API Tensor sigmoid(const Tensor & self);
-CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self);
+CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self);
 CAFFE2_API Tensor & sigmoid_(Tensor & self);
 CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self);
 CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self);
@@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee
 CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false);
 CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
+CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
-CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt);
 CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent);
 CAFFE2_API Tensor & zero_(Tensor & self);
-CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & zero_sparse_(Tensor & self);
+CAFFE2_API Tensor & mkldnn_zero_(Tensor & self);
 CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
 CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1);
@@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar
 CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim);
 CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask);
 CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask);
-CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor sparse_to_dense(const Tensor & self);
+CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self);
 CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
 CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self);
```

These are just wobbles in the order of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering.

Signed-off-by: Edward Z. Yang <[email protected]>

Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978)

[ghstack-poisoned]
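As an aside for anyone who wants to reproduce the comparison described above, a minimal sketch of the workflow might look like the following. It assumes the generated sources land under `build/aten/src/ATen`, as the listing suggests, and that you have already generated the old output on the base commit with the static-dispatch patch applied; the `build-old` snapshot step and the sorted spot check at the end are illustrative assumptions, not the exact script used for this PR.

```
# Sketch only: paths and the snapshot step are assumptions, not the PR's script.
cp -r build build-old                      # keep a copy of the old codegen's output

# Regenerate with the new codegen from the source root (the defaults are sensible).
python -m tools.codegen.gen

# Compare the two trees; --no-index lets git diff run outside the index.
git diff --stat --no-index build-old/aten/src/ATen build/aten/src/ATen

# Order-insensitive spot check of one header, to confirm the remaining
# differences really are just declaration-ordering wobbles.
diff <(sort build-old/aten/src/ATen/NativeFunctions.h) \
     <(sort build/aten/src/ATen/NativeFunctions.h)
```

If the sorted diff comes back empty, whatever `git diff` still reports is purely a matter of declaration order.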
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: bddc3c8107ce200cca339806c6eea6fd368744ed Pull Request resolved: #42629
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0 ``` The only diff is this: ``` diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h index a0463dc80d..3808d27824 100644 --- a/build-old/aten/src/ATen/NativeFunctions.h +++ b/build-new/aten/src/ATen/NativeFunctions.h @@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other); -CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other); +CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & 
self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul(const Tensor & self, Scalar other); CAFFE2_API Tensor & mul_(Tensor & self, Scalar other); CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec); @@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self); CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self); CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self); CAFFE2_API Tensor sigmoid(const Tensor & self); -CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self); +CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor & sigmoid_(Tensor & self); CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self); CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self); @@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); +CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); -CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent); CAFFE2_API Tensor & zero_(Tensor & self); -CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & zero_sparse_(Tensor & self); +CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim); CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask); CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask); -CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor sparse_to_dense(const Tensor & self); +CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); ``` These are just wobbles in the order 
of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978) [ghstack-poisoned]
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0 ``` The only diff is this: ``` diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h index a0463dc80d..3808d27824 100644 --- a/build-old/aten/src/ATen/NativeFunctions.h +++ b/build-new/aten/src/ATen/NativeFunctions.h @@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other); -CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other); +CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & 
self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul(const Tensor & self, Scalar other); CAFFE2_API Tensor & mul_(Tensor & self, Scalar other); CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec); @@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self); CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self); CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self); CAFFE2_API Tensor sigmoid(const Tensor & self); -CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self); +CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor & sigmoid_(Tensor & self); CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self); CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self); @@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); +CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); -CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent); CAFFE2_API Tensor & zero_(Tensor & self); -CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & zero_sparse_(Tensor & self); +CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim); CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask); CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask); -CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor sparse_to_dense(const Tensor & self); +CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); ``` These are just wobbles in the order 
of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978) [ghstack-poisoned]
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0 ``` The only diff is this: ``` diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h index a0463dc80d..3808d27824 100644 --- a/build-old/aten/src/ATen/NativeFunctions.h +++ b/build-new/aten/src/ATen/NativeFunctions.h @@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other); -CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other); +CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & 
self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul(const Tensor & self, Scalar other); CAFFE2_API Tensor & mul_(Tensor & self, Scalar other); CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec); @@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self); CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self); CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self); CAFFE2_API Tensor sigmoid(const Tensor & self); -CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self); +CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor & sigmoid_(Tensor & self); CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self); CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self); @@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); +CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); -CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent); CAFFE2_API Tensor & zero_(Tensor & self); -CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & zero_sparse_(Tensor & self); +CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim); CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask); CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask); -CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor sparse_to_dense(const Tensor & self); +CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); ``` These are just wobbles in the order 
of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978) [ghstack-poisoned]
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. Signed-off-by: Edward Z. Yang <[email protected]> ghstack-source-id: 5203f12d2cd98c740e57a5c479e0014121351c65 Pull Request resolved: #42629
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0 ``` The only diff is this: ``` diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h index a0463dc80d..3808d27824 100644 --- a/build-old/aten/src/ATen/NativeFunctions.h +++ b/build-new/aten/src/ATen/NativeFunctions.h @@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other); -CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other); +CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & 
self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul(const Tensor & self, Scalar other); CAFFE2_API Tensor & mul_(Tensor & self, Scalar other); CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec); @@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self); CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self); CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self); CAFFE2_API Tensor sigmoid(const Tensor & self); -CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self); +CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor & sigmoid_(Tensor & self); CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self); CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self); @@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); +CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); -CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent); CAFFE2_API Tensor & zero_(Tensor & self); -CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & zero_sparse_(Tensor & self); +CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim); CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask); CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask); -CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor sparse_to_dense(const Tensor & self); +CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); ``` These are just wobbles in the order 
of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978) [ghstack-poisoned]
How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers. This diff cannot be currently landed as it doesn't reimplement static dispatch. How do we know that this diff is right? We aimed for byte-for-byte modulo whitespace compatibility with the old generated code. Apply the following patch (to remove static dispatch) to the base version of PyTorch: ``` diff --git a/aten/src/ATen/function_wrapper.py b/aten/src/ATen/function_wrapper.py index e26bb3941b..334475212b 100644 --- a/aten/src/ATen/function_wrapper.py +++ b/aten/src/ATen/function_wrapper.py @@ -147,7 +147,6 @@ TENSOR_METHOD_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} Tensor::${api_name}(${method_formals}) const { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_method_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") @@ -173,7 +172,6 @@ FUNCTION_DEFINITION = CodeTemplate("""\ // ${schema_string} ${return_type} ${api_name}(${formals}) { #ifdef USE_STATIC_DISPATCH - ${static_dispatch_function_body} #else static auto op = c10::Dispatcher::singleton() .findSchemaOrThrow("aten::${operator_name}", "${overload_name}") ``` and then we generate the old and new versions and diff them: ``` {build-old => build}/aten/src/ATen/BackendSelectRegister.cpp | 0 {build-old => build}/aten/src/ATen/CPUType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.cpp | 0 {build-old => build}/aten/src/ATen/CUDAType.h | 0 build-old/aten/src/ATen/LegacyTHFunctionsCPU.cpp => /dev/null | 1712 ------------------- build-old/aten/src/ATen/LegacyTHFunctionsCPU.h => /dev/null | 67 - build-old/aten/src/ATen/LegacyTHFunctionsCUDA.cpp => /dev/null | 4176 --------------------------------------------- build-old/aten/src/ATen/LegacyTHFunctionsCUDA.h => /dev/null | 111 -- {build-old => build}/aten/src/ATen/MkldnnCPUType.cpp | 0 {build-old => build}/aten/src/ATen/NativeFunctions.h | 20 +- {build-old => build}/aten/src/ATen/QuantizedCPUType.cpp | 0 {build-old => build}/aten/src/ATen/QuantizedCUDAType.cpp | 0 {build-old => 
build}/aten/src/ATen/QuantizedCUDAType.h | 0 {build-old => build}/aten/src/ATen/SparseCPUType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.cpp | 0 {build-old => build}/aten/src/ATen/SparseCUDAType.h | 0 {build-old => build}/aten/src/ATen/TypeDefault.cpp | 0 {build-old => build}/aten/src/ATen/core/ATenOpList.cpp | 0 ``` The only diff is this: ``` diff --git a/build-old/aten/src/ATen/NativeFunctions.h b/build-new/aten/src/ATen/NativeFunctions.h index a0463dc80d..3808d27824 100644 --- a/build-old/aten/src/ATen/NativeFunctions.h +++ b/build-new/aten/src/ATen/NativeFunctions.h @@ -116,15 +116,15 @@ CAFFE2_API Tensor avg_pool1d(const Tensor & self, IntArrayRef kernel_size, IntAr CAFFE2_API Tensor adaptive_avg_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API std::tuple<Tensor,Tensor> adaptive_max_pool1d(const Tensor & self, IntArrayRef output_size); CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_sparse(const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor mkldnn_add(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_sparse_(Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); -CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cpu(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); +CAFFE2_API Tensor & mkldnn_add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor add_relu(const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_(Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & add_relu_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -639,15 +639,15 @@ CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indi CAFFE2_API std::tuple<Tensor,Tensor> mode(const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API std::tuple<Tensor &,Tensor &> mode_out(Tensor & values, Tensor & indices, const Tensor & self, Dimname dim, bool keepdim=false); CAFFE2_API Tensor mul(const Tensor & self, const Tensor & other); -CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul_sparse(const Tensor & self, const Tensor & other); +CAFFE2_API Tensor mkldnn_mul(const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_(Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_sparse_(Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_(Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out(Tensor & out, const Tensor & self, const Tensor & other); -CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cpu(Tensor & out, const Tensor & 
self, const Tensor & other); CAFFE2_API Tensor & mul_out_sparse_cuda(Tensor & out, const Tensor & self, const Tensor & other); +CAFFE2_API Tensor & mkldnn_mul_out(Tensor & out, const Tensor & self, const Tensor & other); CAFFE2_API Tensor mul(const Tensor & self, Scalar other); CAFFE2_API Tensor & mul_(Tensor & self, Scalar other); CAFFE2_API Tensor mv(const Tensor & self, const Tensor & vec); @@ -793,8 +793,8 @@ CAFFE2_API Tensor & silu_(Tensor & self); CAFFE2_API Tensor & silu_out(Tensor & out, const Tensor & self); CAFFE2_API Tensor silu_backward(const Tensor & grad_output, const Tensor & self); CAFFE2_API Tensor sigmoid(const Tensor & self); -CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor sigmoid_quantized_cpu(const Tensor & self); +CAFFE2_API Tensor mkldnn_sigmoid(const Tensor & self); CAFFE2_API Tensor & sigmoid_(Tensor & self); CAFFE2_API Tensor & mkldnn_sigmoid_(Tensor & self); CAFFE2_API Tensor & sigmoid_out(Tensor & out, const Tensor & self); @@ -1008,17 +1008,17 @@ CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, bool kee CAFFE2_API Tensor nuclear_norm(const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor & nuclear_norm_out(Tensor & out, const Tensor & self, IntArrayRef dim, bool keepdim=false); CAFFE2_API Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); +CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor mkldnn_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor quantized_clone(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); -CAFFE2_API Tensor clone_sparse(const Tensor & self, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & resize_as_(Tensor & self, const Tensor & the_template, c10::optional<MemoryFormat> memory_format=c10::nullopt); CAFFE2_API Tensor & pow_out(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor & pow_out_sparse_scalar(Tensor & out, const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow(const Tensor & self, Scalar exponent); CAFFE2_API Tensor pow_sparse_scalar(const Tensor & self, Scalar exponent); CAFFE2_API Tensor & zero_(Tensor & self); -CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & zero_sparse_(Tensor & self); +CAFFE2_API Tensor & mkldnn_zero_(Tensor & self); CAFFE2_API Tensor & sub_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor & sub_out_sparse(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1); CAFFE2_API Tensor sub(const Tensor & self, const Tensor & other, Scalar alpha=1); @@ -1053,8 +1053,8 @@ CAFFE2_API Tensor & sparse_resize_(Tensor & self, IntArrayRef size, int64_t spar CAFFE2_API Tensor & sparse_resize_and_clear_(Tensor & self, IntArrayRef size, int64_t sparse_dim, int64_t dense_dim); CAFFE2_API Tensor sparse_mask_cpu(const Tensor & self, const Tensor & mask); CAFFE2_API Tensor sparse_mask_cuda(const Tensor & self, const Tensor & mask); -CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor sparse_to_dense(const Tensor & self); +CAFFE2_API Tensor mkldnn_to_dense(const Tensor & self); CAFFE2_API Tensor to_dense_backward(const Tensor & grad, const Tensor & input); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); CAFFE2_API int64_t sparse_dim_sparse(const Tensor & self); ``` These are just wobbles in the order 
of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering. Signed-off-by: Edward Z. Yang <[email protected]> Differential Revision: [D23183978](https://our.internmc.facebook.com/intern/diff/D23183978) [ghstack-poisoned]
How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.

Signed-off-by: Edward Z. Yang <[email protected]>
ghstack-source-id: 85e52d4957c407adbd2b48bfe78ad00ce152c701
Pull Request resolved: #42629
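As a rough illustration of that workflow (not part of the PR), the old-versus-new comparison might be driven like the sketch below; the `build-old-aten` staging directory is a made-up name, and `build/aten/src/ATen` is assumed to be where the generated files land by default.

```
# On the base commit, run a build (or the old codegen) so that
# build/aten/src/ATen holds the old generated files, then stash a copy
# under an illustrative staging directory.
cp -r build/aten/src/ATen build-old-aten

# On this PR's branch, run the new codegen from the root of the source
# tree with its default source and install directories.
python -m tools.codegen.gen

# Compare the two trees file by file; identical files produce no output.
git diff --no-index build-old-aten build/aten/src/ATen
```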
How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.

Signed-off-by: Edward Z. Yang <[email protected]>
ghstack-source-id: d3efb3e8bee0f07cb26d6fe2f71144d7ef488af8
Pull Request resolved: #42629
Codecov Report
@@ Coverage Diff @@
## gh/ezyang/819/base #42629 +/- ##
=====================================================
Coverage ? 69.32%
=====================================================
Files ? 378
Lines ? 46761
Branches ? 0
=====================================================
Hits ? 32417
Misses ? 14344
Partials ? 0

Continue to review full report at Codecov.
How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.

Signed-off-by: Edward Z. Yang <[email protected]>
ghstack-source-id: 1caf086ea28fcd2a556896fb162e35578f52c336
Pull Request resolved: #42629
Stack from ghstack:
How to approach reviewing this diff:
- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.

I kept the CUDA header in ATen/ to avoid having to fix a bunch of headers.

This diff cannot be currently landed as it doesn't reimplement static dispatch.
How do we know that this diff is right? We aimed for byte-for-byte (modulo whitespace) compatibility with the old generated code. Apply the patch that removes the static dispatch bodies (the `${static_dispatch_method_body}` and `${static_dispatch_function_body}` lines in `aten/src/ATen/function_wrapper.py`) to the base version of PyTorch, then generate the old and new versions and diff them. Every generated file comes out identical except `NativeFunctions.h` (and the LegacyTHFunctions files, which the new codegen no longer emits because they are now checked in). The remaining `NativeFunctions.h` differences are just wobbles in the order of the declarations; I couldn't be bothered to figure out exactly how the old codegen did the ordering.
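A quick, illustrative way to double-check the ordering-only claim is to sort both headers before diffing. The paths below are assumptions carried over from the earlier comparison sketch, not part of this PR.

```
# Sort the declarations in each generated header; if the two sorted
# files are identical, the only differences were in declaration order.
diff <(sort build-old-aten/NativeFunctions.h) \
     <(sort build/aten/src/ATen/NativeFunctions.h)
```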
Signed-off-by: Edward Z. Yang [email protected]
Differential Revision: D23183978