
Migrate to config for Int8DynamicActivationIntxWeightConfig #1836

Merged: 9 commits into main on Mar 7, 2025

Conversation

metascroy (Contributor):

This PR:

  • Migrates to Int8DynamicActivationIntxWeightConfig (a before/after usage sketch follows below)
  • Merges PackedLinearInt8DynamicActivationIntxWeightLayout to use the same quantizer, and merges the tests
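
For readers new to the two APIs, a hedged before/after sketch of what the migration looks like at a call site follows. It is not the PR's literal diff: the old function-style call mirrors the test code quoted later in this conversation, and the import path and config fields shown are assumptions.

# Hedged sketch of old vs. new call sites; the import path and exact config
# fields are assumptions (weight_dtype/granularity appear in the config quoted
# further down in this conversation).
import torch
import torch.nn as nn
from torchao.quantization import quantize_
from torchao.quantization.granularity import PerGroup

model = nn.Sequential(nn.Linear(256, 64))

# Old function-style API (as used in the tests quoted below):
#   quantize_(model, int8_dynamic_activation_intx_weight(
#       weight_dtype=torch.int4,
#       granularity=PerGroup(128),
#       has_weight_zeros=True,
#   ))

# New config-style API introduced by this PR:
from torchao.experimental.quant_api import Int8DynamicActivationIntxWeightConfig

quantize_(
    model,
    Int8DynamicActivationIntxWeightConfig(
        weight_dtype=torch.int4,
        granularity=PerGroup(128),
    ),
)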


pytorch-bot bot commented Mar 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1836

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 11 Pending

As of commit 3f51d82 with merge base ada4c02:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@metascroy requested a review from digantdesai on Mar 5, 2025 at 01:21
@facebook-github-bot added the CLA Signed label on Mar 5, 2025
metascroy (Contributor Author):

@drisspg @jerryzh168 are we OK with adding tensor_impl_ctr_kwargs to from_hp_to_intx?

It can be used to propagate a bias when constructing the weight tensor subclass via from_plain.
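
For concreteness, a toy sketch of the proposed mechanism (not torchao's actual classes): the caller passes tensor_impl_ctr_kwargs, from_hp_to_intx forwards it to the layout's tensor-impl constructor, and that is how a bias can ride along into from_plain. PackedTensorImpl and the tensor shapes below are illustrative assumptions.

# Toy illustration of the forwarding; PackedTensorImpl is a stand-in, not a
# torchao class.
from typing import Optional

import torch


class PackedTensorImpl:
    """Stand-in for a layout-specific tensor impl that can pack a bias."""

    def __init__(self, data, scale, zero_point, layout, bias=None):
        self.data, self.scale, self.zero_point = data, scale, zero_point
        self.layout, self.bias = layout, bias

    @classmethod
    def from_plain(cls, data, scale, zero_point, layout, **kwargs):
        return cls(data, scale, zero_point, layout, **kwargs)


def from_hp_to_intx_sketch(data, scale, zero_point, layout,
                           tensor_impl_ctr_kwargs: Optional[dict] = None):
    # Defaulting to None keeps existing call sites unchanged.
    if tensor_impl_ctr_kwargs is None:
        tensor_impl_ctr_kwargs = {}
    return PackedTensorImpl.from_plain(
        data, scale, zero_point, layout, **tensor_impl_ctr_kwargs
    )


impl = from_hp_to_intx_sketch(
    torch.zeros(4, 8, dtype=torch.int8),  # quantized data
    torch.ones(4),                        # scales
    torch.zeros(4),                       # zero points
    layout="packed",                      # placeholder for a Layout object
    tensor_impl_ctr_kwargs={"bias": torch.zeros(4)},
)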

Contributor:

not super familiar with this code, but as long as this doesn't change the BC surface sgtm

if this is controversial, can we separate this from the config migration? I'd love to see that piece land asap.

metascroy (Contributor Author):

The new tensor_impl_ctr_kwargs has a default value of None, so it shouldn't change how any existing call sites work. The CI also passes.

Contributor:

sgtm, let's land if no other concerns?

metascroy (Contributor Author):

Ok I can land after CI passes.

I don't have concerns about the changes in torchao/experimental/*. I mostly wanted feedback from someone in torchao on the change to torchao/dtypes/affine_quantized_tensor.py.

Contributor:

I mean,

tensor_impl_ctr_kwargs: Optional[dict] = None,

is pretty hard to follow. IMO it would be better to refactor the code to just pass bias directly instead of adding a layer of indirection.

However, this is nitty, not a part of BC, and I want to see the config part land, so how about we chat about ^ in parallel and if it needs fixing someone can do that in a future PR?

metascroy (Contributor Author):

The reason for using tensor_impl_ctr_kwargs instead of "bias" is that I thought it would be more extensible in the future. Currently, if you use to_affine_quantized_intx to do quantization, there is no way to forward other args to your tensor subclass's "from_plain(data, scale, zero_point, _layout)" method. Here we want to forward bias, but in the future someone might want to forward something else.

Right now, the code in torchao/experimental has its own copy of to_affine_quantized_intx that supports bias. The downside is that I fear the two copies might drift apart going forward.

With all that said, I can refactor this PR to only contain the config change and put up the tensor_impl_ctr_kwargs change in another PR.

Contributor:

I don't get it, why can't we just add bias as an argument and pass it?

metascroy (Contributor Author):

Let's move the discussion to a future PR. I split out the change.

@drisspg added the topic: improvement label on Mar 5, 2025
Jack-Khuu (Contributor) left a comment:

Mostly nits

Not terribly familiar with this code, but it passes the gut test.

Comment on lines 280 to 284
if tensor_impl_ctr_kwargs is None:
    tensor_impl_ctr_kwargs = {}
tensor_impl = tensor_impl_ctr(
    data, scale, zero_point, _layout, **tensor_impl_ctr_kwargs
)
Contributor:

Don't know which style AO uses, no strong pref

Suggested change:
-if tensor_impl_ctr_kwargs is None:
-    tensor_impl_ctr_kwargs = {}
-tensor_impl = tensor_impl_ctr(
-    data, scale, zero_point, _layout, **tensor_impl_ctr_kwargs
-)
+tensor_impl = tensor_impl_ctr(
+    data, scale, zero_point, _layout, **(tensor_impl_ctr_kwargs or {})
+)

metascroy (Contributor Author):

I'd like to hear from @drisspg or someone from torchao on this change.

Not so much on the style preference, but more on whether they're OK with adding tensor_impl_ctr_kwargs to the to_affine_quantized_intx signature.

Comment on lines 126 to 139
quantized_model_reference = copy.deepcopy(model)
quantize_(
    quantized_model_reference,
    int8_dynamic_activation_intx_weight(
        weight_dtype=weight_dtype,
        granularity=granularity,
        has_weight_zeros=has_weight_zeros,
        layout=reference_layout,
    ),
)

with torch.no_grad():
    result = quantized_model(activations)
    expected_result = quantized_model_reference(activations)
Contributor:

nit: We can factor out the creation of expected_results since it's just PlainLayout in both cases (different models)
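
One way to do that factoring, sketched with a hypothetical helper (only quantize_, copy.deepcopy, and torch.no_grad come from the quoted test; the helper name and argument list are assumptions):

# Hypothetical helper: quantize a deep copy with the reference (PlainLayout)
# config once, so both test branches only differ in the layout under test.
import copy

import torch
from torchao.quantization import quantize_


def reference_result(model, activations, reference_config):
    reference_model = copy.deepcopy(model)
    quantize_(reference_model, reference_config)
    with torch.no_grad():
        return reference_model(activations)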

and layout.target == Target.ATEN
)
weight_dtype: torch.dtype = torch.int4
granularity: Union[PerRow, PerGroup] = PerRow()
Contributor:

Why not granularity: Union[PerRow, PerGroup] = PerGroup(128), like in int8_dynamic_activation_intx_weight?

metascroy (Contributor Author):

PerRow is a safer default because it doesn't depend on the input data size. I expect users to always specify this parameter anyway.
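
To make that trade-off concrete, a minimal sketch (the divisibility constraint is an assumption about how group quantization is typically applied; the granularity types are the ones used elsewhere in this PR):

from torchao.quantization.granularity import PerGroup, PerRow

safe_default = PerRow()    # one scale per output row; independent of weight shape
groupwise = PerGroup(128)  # caller must know 128 divides the weight's inner dim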

)

@register_quantize_module_handler(Int8DynamicActivationIntxWeightConfig)
def _int8_dynamic_activation_intx_weigh_transform(
Contributor:

Suggested change:
-def _int8_dynamic_activation_intx_weigh_transform(
+def _int8_dynamic_activation_intx_weight_transform(

tensor_impl_ctr_kwargs = None
if isinstance(layout, PackedLinearInt8DynamicActivationIntxWeightLayout):
    # We need to create a new layout object for each module because when
    # granulairty is PerRow, the layout objects cannot share the group_size
Contributor:

Suggested change:
-    # granulairty is PerRow, the layout objects cannot share the group_size
+    # granularity is PerRow, the layout objects cannot share the group_size

Comment on lines +317 to +320
if weight_tensor.tensor_impl.get_layout().has_bias:
    assert (
        bias is None
    ), "bias should be None because it is already packed with the weights (has_bias=True)"
Contributor:

nit: the if + assert could be collapsed into a single assert; also fine with leaving it as-is for legibility.

Suggested change:
-if weight_tensor.tensor_impl.get_layout().has_bias:
-    assert (
-        bias is None
-    ), "bias should be None because it is already packed with the weights (has_bias=True)"
+assert (
+    not weight_tensor.tensor_impl.get_layout().has_bias or bias is None
+), "bias should be None because it is already packed with the weights (has_bias=True)"

Comment on lines -631 to -635
if torch.backends.kleidiai.is_available():
    if isinstance(granularity, PerGroup):
        scale_dtype = (
            torch.bfloat16
        )  # KleidiAI kernel requires bfloat16 scale_dtype
Contributor:

Seems like we always use float32 in to_affine_quantized_intx. Is this intentional?

metascroy (Contributor Author):

KleidiAI tests pass with this. The scale_dtype was only used by the Python-based quantization that computes qvals, scales, and zeros, not by what was passed to the kernel itself.

nikhil-arm (Contributor) commented Mar 6, 2025:

The ATen KleidiAI groupwise kernel requires scale_dtype to be torch.bfloat16; otherwise it falls back to the reference implementation. Also, the input to the ATen kernel needs to be float32.

metascroy (Contributor Author):

I will update this then.

Is it only for groupwise that it falls back to the reference kernel? For channelwise, is FP32 fine, or should it still be bfloat16?

Contributor:

bfloat16 only for groupwise; float32 for channelwise.
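
A small sketch of the scale_dtype rule this implies (an assumption drawn from this thread, not the PR's final code):

import torch
from torchao.quantization.granularity import PerGroup


def pick_scale_dtype(granularity, using_aten_kleidiai: bool) -> torch.dtype:
    # The ATen KleidiAI groupwise kernel needs bfloat16 scales, otherwise it
    # falls back to the reference implementation; channelwise stays float32.
    if using_aten_kleidiai and isinstance(granularity, PerGroup):
        return torch.bfloat16
    return torch.float32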

vkuzo (Contributor) left a comment:

didn't read the code in detail, but it would be great to migrate this to config soon so we can disable the old path

please feel free to wait for a proper review if needed

@metascroy merged commit 25377e0 into main on Mar 7, 2025
16 of 18 checks passed
Labels: CLA Signed, topic: improvement
6 participants