Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC:topology:fix the oops bug during topology free. #80

Merged
merged 2 commits into from Aug 24, 2018
Merged

ASoC:topology:fix the oops bug during topology free. #80

merged 2 commits into from Aug 24, 2018

Conversation

ghost
Copy link

@ghost ghost commented Aug 14, 2018

the topology free will cause oops:

  1. becasue the snd_soc_dapm_route{} is allocated in stack, which cause
    oops duirng topology free stage. allocate it on the heap and add
    function snd_soc_tplg_route_remove() free function.
  2. the string name is allocated in topology's short life buffer,
    which cause oops during topology free stage. re-allocate it on the
    heap can avoid the issue.

Signed-off-by: Wu Zhigang [email protected]

@ghost
Copy link
Author

ghost commented Aug 14, 2018

the snd_soc_dapm_route{} instance is allocated in stack in soc_tplg_dapm_graph_elems_load() function. it will not be existed when jump out of this function. but it is linked into the list.
when the access this instance in the free stage. it would hit panic.

@ghost ghost changed the title asoc:topology:avoid error log and oops during topology free. ASoC:topology:avoid error log and oops during topology free. Aug 14, 2018
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 can you please comment on this.

I wonder if this is the right fix. Definitively the practice of using a stack-allocated structure in a linked list is bad, but I don't like the fact that all users of snd_soc_dapm_add_routes() now need to add a free.
It might be a better idea to copy the route structure into a heap-allocated structure in sof_route_load, and free it in sof_route_unload. In other words make the changes local to SOF and not to sound/soc/soc-topology.c

@ghost
Copy link
Author

ghost commented Aug 14, 2018

@plbossart
I know your concern.

can we put the whole part below in the sof_route_load() function?
the snd_soc_dapm_route{} instance needs the parameters parsed from tplg.
sof_route_load() needs these parameters to do more searching and configuration.
if yes, we can put the free in the sof_route_unload(), including the snd_sof_route{} and sof_ipc_pipe_comp_connect{} instance. they are allocated in the sof_route_load().

	elem = (struct snd_soc_tplg_dapm_graph_elem *)tplg->pos;
	tplg->pos += sizeof(struct snd_soc_tplg_dapm_graph_elem);

	/* validate routes */
	if (strnlen(elem->source, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) ==
		SNDRV_CTL_ELEM_ID_NAME_MAXLEN)
		return -EINVAL;
	if (strnlen(elem->sink, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) ==
		SNDRV_CTL_ELEM_ID_NAME_MAXLEN)
		return -EINVAL;
	if (strnlen(elem->control, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) ==
		SNDRV_CTL_ELEM_ID_NAME_MAXLEN)
		return -EINVAL;

	route->source = elem->source;
	route->sink = elem->sink;
	route->connected = NULL; /* set to NULL atm for tplg users */
	if (strnlen(elem->control, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) == 0)
		route->control = NULL;
	else
		route->control = elem->control;

@ghost
Copy link
Author

ghost commented Aug 14, 2018

@plbossart @lgirdwood @ranj063
there are two panic issues in these code:

  1. the variables allocated in the stack, the topology free process will hit the panic for the variables is not flushed after the stack change.
  2. the string name should be allocated again. because the memory of the name string will be freed after the topology parsing finished.

I will think about Pierre's concern. Do you have any other suggestion?

@@ -2163,6 +2163,11 @@ void snd_sof_free_topology(struct snd_sof_dev *sdev)
snd_soc_dapm_del_routes(dapm, sroute->route, 1);

/* free sroute and its private data */
kfree(sroute->route->source);
kfree(sroute->route->sink);
if (sroute->route->control)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check if (ptr) here as kfree() does this already.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood
ok, I will update it after back to office.
thanks

@lgirdwood
Copy link
Member

@zhigang-wu this looks fine to me and fix is at correct level. Please fix above comment.

@plbossart
Copy link
Member

@lgirdwood I am not ok with the fix, it requires the addition of a free outside of soc-topology.c, so it'd mean taking a risk of breaking other implementations.

@ghost
Copy link
Author

ghost commented Aug 15, 2018

@plbossart
Sorry, I have not got your point.
Do you means I should move the kfree() operation in the sof_route_unload() function?

@ghost
Copy link
Author

ghost commented Aug 15, 2018

copy the route structure into a heap-allocated structure in sof_route_load

I think it a good way. but if we do this, we have to send "tplg->pos" into the sof_route_load() function.
because in sof_route_load() function, we have to use the route->source and route->sink to search the widget list. then this function's API has to be changed.
is it ok?
thanks

@lgirdwood
Copy link
Member

@plbossart ah yes, I misread sof/topology.c for soc/topology.c. The free needs to be in soc/topology.c

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record this needs to be freed in soc/topology.c NOT sof/topology.c

@@ -2163,6 +2163,10 @@ void snd_sof_free_topology(struct snd_sof_dev *sdev)
snd_soc_dapm_del_routes(dapm, sroute->route, 1);

/* free sroute and its private data */
kfree(sroute->route->source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done is soc/topology.c

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood
I am sorry, you mean the "soc/soc-topology.c" ? I did not find the "soc/topology.c" file.
and adding the soc_tplg_remove_route() in the "soc/soc-topology.c" file?
do the kfree() in the soc_tplg_remove_route().
is it right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yES, add snd_soc_tplg_remove_route() to soc-topology.c

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood
Sorry for the late response for other stuff thing.
I will update it later after finish the test.
Thanks!

@ghost
Copy link
Author

ghost commented Aug 22, 2018

@plbossart @lgirdwood
I update the PR. but I do not have the confidence with it. could you give me more suggestions?

  1. I add one function snd_soc_tplg_route_remove() in soc-topology.c to free dynamic allocated route.
  2. I am not sure this function's operation is suitable, maybe i have to utilize the soc_tplg{}->ops->dapm_route_unload() call-back function? but the input parameters is not suitable for this target.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit subject should include the word "fix" to indicate this fixes an existing bug.

The commit message can be change to

Topology free will cause an oops because xxx is on the stack. Fix this by allocating xx on the heap and add a proper xx remove function.

SoB.

@@ -2600,3 +2604,17 @@ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index)
return !list_empty(&comp->dobj_list);
}
EXPORT_SYMBOL_GPL(snd_soc_tplg_component_remove);

int snd_soc_tplg_route_remove(struct snd_soc_dapm_route *route)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return int here, use void instead.

@ghost
Copy link
Author

ghost commented Aug 22, 2018

@lgirdwood
OK, I will update it.

@ghost ghost changed the title ASoC:topology:avoid error log and oops during topology free. ASoC:topology:fix the oops bug during topology free. Aug 22, 2018
@ghost
Copy link
Author

ghost commented Aug 22, 2018

@lgirdwood
updated the PR's title, updated the commit subject and message already.
thanks

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your patches are done backwards and need to be repartitioned. You first need to have a patch that only touches sound/soc/soc-topology.c, and then another one that will make use of the new function in sound/soc/sof/topology.c

I am still not clear if this solution is acceptable btw, if there are other users of the function soc_tplg_dapm_graph_elems_load() they would need to call the soc_tplg_remove_route() which doesn't seem quite right

Also fix typos such as "becasue" or "duirng". if you don't have a spell checker in your editor please configure it.

@ghost
Copy link
Author

ghost commented Aug 23, 2018

@plbossart
I will update the PR as your suggestion.
Let me explain it first:

  1. the function is called in soc_tplg_dapm_graph_elems_load().
    the call-stack is like snd_sof_load_topology() -> .... -> soc_tplg_dapm_graph_elems_load().
    in snd_sof_load_topology(), it allocates a big buffer to hold the whole topology. then do tplg parsing.
    after parsing, the big buffer will be freed when jump out of the snd_sof_load_topology().
    you can check request_firmware and release_firmware to confirm this.
    the route->sink, route->source, route->control points to the content in that buffer(old code's logical).
    In topology free stage, these info will be used. but actual content is gone with the big buffer free.
    the route->sink points to nothing. try to access it will have problem.
  2. the route is allocated in stack (the old code logical). the bad things is we call the soc_tplg_add_route() function to add this stack instance into the list. the worse thing is we will use this element in the list during the topology free stage.

Summary:
So from this description, we can see, if other users call the soc_tplg_dapm_graph_elems_load(),
he can not allocate the route in the stack. he must use the heap to malloc it. then our new function
soc_tplg_remove_route() must be called.

I also attention some code use the static table for the route. but they did not call the soc_tplg_dapm_graph_elems_load. and they did not add this route into the list.

@ghost
Copy link
Author

ghost commented Aug 23, 2018

@plbossart
I updated the PR already based on your comments.

route.connected = NULL; /* set to NULL atm for tplg users */
route = kzalloc(sizeof(*route), GFP_KERNEL);
if (!route)
return -ENOMEM;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we unwind this if it fails. i.e say count is 10 and route 8 fails kzalloc. Where do you kfree routes 0 - 7 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good question.
each time when the route is allocated successfully, it will be linked into the list.
in the topology free stage, the route elem will be got from the list and do the free.
from your question, I think maybe this PR is not enough.
so from your question, I propose a concern of design:
if one of the routes allocated failure, should we free all of them and block this topology parsing or just skip this one and continue for next?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to stop parsing on any error (whether it comes from the topology file or memory allocation). There's no point in trying to conceal and compensate for errors, it's going to lead to more issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should only free the routes allocated by this invocation of this function. In this case routes 0 - 7 and the strdup names for routes 0 - 7 too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. no problem

if (!route)
return -ENOMEM;

route->source = kstrdup(elem->source, GFP_KERNEL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check return value here and deal with allocation failures.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will do this.

@@ -2604,3 +2604,15 @@ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index)
return !list_empty(&comp->dobj_list);
}
EXPORT_SYMBOL_GPL(snd_soc_tplg_component_remove);

void snd_soc_tplg_route_remove(struct snd_soc_dapm_route *route)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make sure this is called by other drivers that use topology. Probably used in the driver remove()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood
I have to do more debug on this to confirm this.
But actually I am not clear about your question. the topology parsing only start from the entry point: snd_soc_register_card() at the machine driver register stage.
is there any other driver will do this? the snd_soc_unregister_card()'s call-stack I will confirm whether it will call
snd_sof_free_topology() function or not.

the topology parsing's call stack is:
snd_soc_register_card()
snd_soc_instantiate_card()
soc_probe_link_components()
......
sof_pcm_probe()
snd_sof_load_topology()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking you to check if any one else will leak memory as a result of this update and if so, they will have call route_free() when they are removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Zhigang,
you can simple do change as below, that is enough:

/* ASoC SOF DAPM route */
struct snd_sof_route {
struct snd_sof_dev *sdev;

  • struct snd_soc_dapm_route *route;
  • struct snd_soc_dapm_route route;
    struct list_head list; /* list in sdev route list */

    void *private;
    };

Copy link

@keyonjie keyonjie Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- struct snd_soc_dapm_route *route;

++ struct snd_soc_dapm_route route;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keyonjie
That is the cool solution to solve this issue.
I will rewrite this PR!
Thanks, Keyon.

@ghost
Copy link
Author

ghost commented Aug 24, 2018

@plbossart @lgirdwood
the code I updated already based on your comments.
I will check the code for Liam's question next step.

@@ -1999,7 +1999,15 @@ static int sof_route_load(struct snd_soc_component *scomp, int index,
goto err;
}

sroute->route = route;
sroute->route.source = devm_kstrdup(sdev->dev,
Copy link
Member

@lgirdwood lgirdwood Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devm_ should only be used for allocating resources that will be automatically freed at remove(). This cant be used here as topology can be unloaded and then reloaded before remove(). i.e. topology is a runtime resource that will need to be manually managed.
This code still does not check the result of the allocations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood
Sorry for missing the check.
If we have to free it manually, it should be in the snd_sof_free_topology().
I think other project will not do it like us.
In some code, the route is defined in static table, they did not need this kind of free,
because they did not do it like us to attach the route in the list.

If we allocate the buffer in sof_route_load() and need to free manually in future.
I do not think we could use soc_tplg{}->ops->dapm_route_unload() to free,
although we allocate this in soc_tplg{}->ops->dapm_route_load() function.
because the input parameter of this function we can not used in free stage.
the soc_tplg{} instance is free already after the topology is finished parsing.

Do you have any idea about this?

1. because the snd_soc_dapm_route{} is allocated in stack, which cause
oops during topology free stage. allocated in the heap instead of
allocated in stack by changing its definition in snd_sof_route{} can
avoid oops in free stage.

2. the string name is allocated in topology's short life buffer,
which cause oops during topology free stage. allocated the string
buffer with devm_kstrdup() can avoid oops in free stage.

Signed-off-by: Wu Zhigang <[email protected]>
Suggested-by: Keyon Jie <[email protected]>
@ghost
Copy link
Author

ghost commented Aug 24, 2018

This PR miss the part of free operation in the topology free stage.
I am thinking about how to find a better way to realize it.

@ghost
Copy link
Author

ghost commented Aug 24, 2018

@lgirdwood
I add static function sof_route_remove() in the topology.c to free the allocated buffer in the route.
now we allocate and free in the same level.
I think maybe this change will not affect other project's driver.
the snd_sof_free_topology() is used by us, even other project use this,
if they did not attach the route in the route_list. it will not have problem.

One suggestion, could I revert back the definition of snd_sof_route{}?
and allocate the snd_soc_dapm_route{} in the sof_route_load() function.
and free the route in sof_route_remove() function.

add the remove function to free route in sof level.

Signed-off-by: Wu Zhigang <[email protected]>
@lgirdwood lgirdwood merged commit b7555dc into thesofproject:topic/sof-dev Aug 24, 2018
@lgirdwood
Copy link
Member

@zhigang-wu can you send new PR that fixes spcm_bind errors that don't free source or connect in sof_route_load()

@ghost
Copy link
Author

ghost commented Aug 24, 2018

@lgirdwood
I will check it and do this. thank you!

@ghost ghost mentioned this pull request Aug 28, 2018
plbossart pushed a commit that referenced this pull request Jul 17, 2020
syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
just copies all the memory, the allocated pointer will be copied as
well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
If now the socket will be destroyed before the congestion-control
has properly been initialized (through a call to tcp_init_transfer), we
will end up freeing memory that does not belong to that particular
socket, opening the door to a double-free:

[   11.413102] ==================================================================
[   11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
[   11.415329]
[   11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
[   11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   11.418148] Call Trace:
[   11.418534]  <IRQ>
[   11.418834]  dump_stack+0x7d/0xb0
[   11.419297]  print_address_description.constprop.0+0x1a/0x210
[   11.422079]  kasan_report_invalid_free+0x51/0x80
[   11.423433]  __kasan_slab_free+0x15e/0x170
[   11.424761]  kfree+0x8c/0x230
[   11.425157]  tcp_cleanup_congestion_control+0x58/0xd0
[   11.425872]  tcp_v4_destroy_sock+0x57/0x5a0
[   11.426493]  inet_csk_destroy_sock+0x153/0x2c0
[   11.427093]  tcp_v4_syn_recv_sock+0xb29/0x1100
[   11.427731]  tcp_get_cookie_sock+0xc3/0x4a0
[   11.429457]  cookie_v4_check+0x13d0/0x2500
[   11.433189]  tcp_v4_do_rcv+0x60e/0x780
[   11.433727]  tcp_v4_rcv+0x2869/0x2e10
[   11.437143]  ip_protocol_deliver_rcu+0x23/0x190
[   11.437810]  ip_local_deliver+0x294/0x350
[   11.439566]  __netif_receive_skb_one_core+0x15d/0x1a0
[   11.441995]  process_backlog+0x1b1/0x6b0
[   11.443148]  net_rx_action+0x37e/0xc40
[   11.445361]  __do_softirq+0x18c/0x61a
[   11.445881]  asm_call_on_stack+0x12/0x20
[   11.446409]  </IRQ>
[   11.446716]  do_softirq_own_stack+0x34/0x40
[   11.447259]  do_softirq.part.0+0x26/0x30
[   11.447827]  __local_bh_enable_ip+0x46/0x50
[   11.448406]  ip_finish_output2+0x60f/0x1bc0
[   11.450109]  __ip_queue_xmit+0x71c/0x1b60
[   11.451861]  __tcp_transmit_skb+0x1727/0x3bb0
[   11.453789]  tcp_rcv_state_process+0x3070/0x4d3a
[   11.456810]  tcp_v4_do_rcv+0x2ad/0x780
[   11.457995]  __release_sock+0x14b/0x2c0
[   11.458529]  release_sock+0x4a/0x170
[   11.459005]  __inet_stream_connect+0x467/0xc80
[   11.461435]  inet_stream_connect+0x4e/0xa0
[   11.462043]  __sys_connect+0x204/0x270
[   11.465515]  __x64_sys_connect+0x6a/0xb0
[   11.466088]  do_syscall_64+0x3e/0x70
[   11.466617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   11.467341] RIP: 0033:0x7f56046dc469
[   11.467844] Code: Bad RIP value.
[   11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[   11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
[   11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
[   11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[   11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
[   11.474321]
[   11.474527] Allocated by task 4884:
[   11.475031]  save_stack+0x1b/0x40
[   11.475548]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   11.476182]  tcp_cdg_init+0xf0/0x150
[   11.476744]  tcp_init_congestion_control+0x9b/0x3a0
[   11.477435]  tcp_set_congestion_control+0x270/0x32f
[   11.478088]  do_tcp_setsockopt.isra.0+0x521/0x1a00
[   11.478744]  __sys_setsockopt+0xff/0x1e0
[   11.479259]  __x64_sys_setsockopt+0xb5/0x150
[   11.479895]  do_syscall_64+0x3e/0x70
[   11.480395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   11.481097]
[   11.481321] Freed by task 4872:
[   11.481783]  save_stack+0x1b/0x40
[   11.482230]  __kasan_slab_free+0x12c/0x170
[   11.482839]  kfree+0x8c/0x230
[   11.483240]  tcp_cleanup_congestion_control+0x58/0xd0
[   11.483948]  tcp_v4_destroy_sock+0x57/0x5a0
[   11.484502]  inet_csk_destroy_sock+0x153/0x2c0
[   11.485144]  tcp_close+0x932/0xfe0
[   11.485642]  inet_release+0xc1/0x1c0
[   11.486131]  __sock_release+0xc0/0x270
[   11.486697]  sock_close+0xc/0x10
[   11.487145]  __fput+0x277/0x780
[   11.487632]  task_work_run+0xeb/0x180
[   11.488118]  __prepare_exit_to_usermode+0x15a/0x160
[   11.488834]  do_syscall_64+0x4a/0x70
[   11.489326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Wei Wang fixed a part of these CDG-malloc issues with commit c120144
("tcp: memset ca_priv data to 0 properly").

This patch here fixes the listener-scenario: We make sure that listeners
setting the congestion-control through setsockopt won't initialize it
(thus CDG never allocates on listeners). For those who use AF_UNSPEC to
reuse a socket, tcp_disconnect() is changed to cleanup afterwards.

(The issue can be reproduced at least down to v4.4.x.)

Cc: Wei Wang <[email protected]>
Cc: Eric Dumazet <[email protected]>
Fixes: 2b0a8c9 ("tcp: add CDG congestion control")
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
aiChaoSONG pushed a commit to aiChaoSONG/linux that referenced this pull request May 6, 2021
plbossart pushed a commit that referenced this pull request Jun 8, 2022
This commit adds python script to parse CoreSight tracing event and
print out source line and disassembly, it generates readable program
execution flow for easier humans inspecting.

The script receives CoreSight tracing packet with below format:

                +------------+------------+------------+
  packet(n):    |    addr    |    ip      |    cpu     |
                +------------+------------+------------+
  packet(n+1):  |    addr    |    ip      |    cpu     |
                +------------+------------+------------+

packet::addr presents the start address of the coming branch sample, and
packet::ip is the last address of the branch smple.  Therefore, a code
section between branches starts from packet(n)::addr and it stops at
packet(n+1)::ip.  As results we combines the two continuous packets to
generate the address range for instructions:

  [ sample(n)::addr .. sample(n+1)::ip ]

The script supports both objdump or llvm-objdump for disassembly with
specifying option '-d'.  If doesn't specify option '-d', the script
simply outputs source lines and symbols.

Below shows usages with llvm-objdump or objdump to output disassembly.

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d llvm-objdump-11 -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
  	ffff800008eb3198 <etm4_enable_hw>:
  	ffff800008eb3310: c0 38 00 35  	cbnz	w0, 0xffff800008eb3a28 <etm4_enable_hw+0x890>
  	ffff800008eb3314: 9f 3f 03 d5  	dsb	sy
  	ffff800008eb3318: df 3f 03 d5  	isb
  	ffff800008eb331c: f5 5b 42 a9  	ldp	x21, x22, [sp, #32]
  	ffff800008eb3320: fb 73 45 a9  	ldp	x27, x28, [sp, #80]
  	ffff800008eb3324: e0 82 40 39  	ldrb	w0, [x23, #32]
  	ffff800008eb3328: 60 00 00 34  	cbz	w0, 0xffff800008eb3334 <etm4_enable_hw+0x19c>
  	ffff800008eb332c: e0 03 19 aa  	mov	x0, x25
  	ffff800008eb3330: 8c fe ff 97  	bl	0xffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
  	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
  	ffff800008eb2d60: 1f 20 03 d5  	nop
  	ffff800008eb2d64: 1f 20 03 d5  	nop
  	ffff800008eb2d68: 3f 23 03 d5  	hint	#25
  	ffff800008eb2d6c: 00 00 40 f9  	ldr	x0, [x0]
  	ffff800008eb2d70: 9f 3f 03 d5  	dsb	sy
  	ffff800008eb2d74: 00 c0 3e 91  	add	x0, x0, #4016
  	ffff800008eb2d78: 1f 00 00 b9  	str	wzr, [x0]
  	ffff800008eb2d7c: bf 23 03 d5  	hint	#29
  	ffff800008eb2d80: c0 03 5f d6  	ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
  	ffff800008eb3310 <etm4_enable_hw+0x178>:
  	ffff800008eb3310:	350038c0 	cbnz	w0, ffff800008eb3a28 <etm4_enable_hw+0x890>
  	ffff800008eb3314:	d5033f9f 	dsb	sy
  	ffff800008eb3318:	d5033fdf 	isb
  	ffff800008eb331c:	a9425bf5 	ldp	x21, x22, [sp, #32]
  	ffff800008eb3320:	a94573fb 	ldp	x27, x28, [sp, #80]
  	ffff800008eb3324:	394082e0 	ldrb	w0, [x23, #32]
  	ffff800008eb3328:	34000060 	cbz	w0, ffff800008eb3334 <etm4_enable_hw+0x19c>
  	ffff800008eb332c:	aa1903e0 	mov	x0, x25
  	ffff800008eb3330:	97fffe8c 	bl	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
  	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
  	ffff800008eb2d60:	d503201f 	nop
  	ffff800008eb2d64:	d503201f 	nop
  	ffff800008eb2d68:	d503233f 	paciasp
  	ffff800008eb2d6c:	f9400000 	ldr	x0, [x0]
  	ffff800008eb2d70:	d5033f9f 	dsb	sy
  	ffff800008eb2d74:	913ec000 	add	x0, x0, #0xfb0
  	ffff800008eb2d78:	b900001f 	str	wzr, [x0]
  	ffff800008eb2d7c:	d50323bf 	autiasp
  	ffff800008eb2d80:	d65f03c0 	ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

Signed-off-by: Leo Yan <[email protected]>
Co-authored-by: Al Grant <[email protected]>
Co-authored-by: Mathieu Poirier <[email protected]>
Co-authored-by: Tor Jeremiassen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eelco Chaudron <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephen Brennan <[email protected]>
Cc: Tanmay Jagdale <[email protected]>
Cc: [email protected]
Cc: zengshun . wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
reddysujith added a commit to reddysujith/linux that referenced this pull request Jan 23, 2023
As interrupts are Level-triggered,until unless we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
warning nobody cared.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 thesofproject#1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ thesofproject#80

Signed-off-by: V sujith kumar Reddy <[email protected]>
reddysujith added a commit to reddysujith/linux that referenced this pull request Jan 24, 2023
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 thesofproject#1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ thesofproject#80

Signed-off-by: V sujith kumar Reddy <[email protected]>
reddysujith added a commit to reddysujith/linux that referenced this pull request Jan 25, 2023
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 thesofproject#1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ thesofproject#80

Signed-off-by: V sujith kumar Reddy <[email protected]>
reddysujith added a commit to reddysujith/linux that referenced this pull request Jan 26, 2023
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 thesofproject#1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ thesofproject#80

Signed-off-by: V sujith kumar Reddy <[email protected]>
ranj063 pushed a commit that referenced this pull request Feb 2, 2023
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 #1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ #80

Signed-off-by: V sujith kumar Reddy <[email protected]>
ujfalusi pushed a commit to ujfalusi/sof-linux that referenced this pull request Feb 8, 2023
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 thesofproject#1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ thesofproject#80

Signed-off-by: V sujith kumar Reddy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
ujfalusi pushed a commit that referenced this pull request Feb 21, 2023
As interrupts are Level-triggered,unless and until we deassert the register
the interrupts are generated which causes spurious interrupts unhandled.

Now we deasserted the interrupt at top half which solved the below
"nobody cared" warning.

warning reported in dmesg:
	irq 80: nobody cared (try booting with the "irqpoll" option)
	CPU: 5 PID: 2735 Comm: irq/80-AudioDSP
		Not tainted 5.15.86-15817-g4c19f3e06d49 #1 1bd3fd932cf58caacc95b0504d6ea1e3eab22289
	Hardware name: Google Skyrim/Skyrim, BIOS Google_Skyrim.15303.0.0 01/03/2023
	Call Trace:
	<IRQ>
	dump_stack_lvl+0x69/0x97
	 __report_bad_irq+0x3a/0xae
	note_interrupt+0x1a9/0x1e3
	handle_irq_event_percpu+0x4b/0x6e
	handle_irq_event+0x36/0x5b
	handle_fasteoi_irq+0xae/0x171
	 __common_interrupt+0x48/0xc4
	</IRQ>

	handlers:
	acp_irq_handler [snd_sof_amd_acp] threaded [<000000007e089f34>] acp_irq_thread [snd_sof_amd_acp]
	Disabling IRQ #80

Signed-off-by: V sujith kumar Reddy <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
bardliao pushed a commit to bardliao/linux that referenced this pull request Oct 16, 2024
During the migration of Soundwire runtime stream allocation from
the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845
soundcard was forgotten.

At this point any playback attempt or audio daemon startup, for instance
on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer
NULL dereference:

 Unable to handle kernel NULL pointer dereference at virtual
 address 0000000000000020
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000
 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Modules linked in: ...
 CPU: 5 UID: 0 PID: 1198 Comm: aplay
 Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty thesofproject#18
 Hardware name: Thundercomm Dragonboard 845c (DT)
 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus]
 lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus]
 sp : ffff80008a2035c0
 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000
 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800
 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003
 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000
 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000
 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec
 x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003
 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8
 Call trace:
  sdw_stream_add_slave+0x44/0x380 [soundwire_bus]
  wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x]
  snd_soc_dai_hw_params+0x3c/0xa4
  __soc_pcm_hw_params+0x230/0x660
  dpcm_be_dai_hw_params+0x1d0/0x3f8
  dpcm_fe_dai_hw_params+0x98/0x268
  snd_pcm_hw_params+0x124/0x460
  snd_pcm_common_ioctl+0x998/0x16e8
  snd_pcm_ioctl+0x34/0x58
  __arm64_sys_ioctl+0xac/0xf8
  invoke_syscall+0x48/0x104
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x34/0xe0
  el0t_64_sync_handler+0x120/0x12c
  el0t_64_sync+0x190/0x194
 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22)
 ---[ end trace 0000000000000000 ]---

0000000000006108 <sdw_stream_add_slave>:
    6108:       d503233f        paciasp
    610c:       a9b97bfd        stp     x29, x30, [sp, #-112]!
    6110:       910003fd        mov     x29, sp
    6114:       a90153f3        stp     x19, x20, [sp, thesofproject#16]
    6118:       a9025bf5        stp     x21, x22, [sp, thesofproject#32]
    611c:       aa0103f6        mov     x22, x1
    6120:       2a0303f5        mov     w21, w3
    6124:       a90363f7        stp     x23, x24, [sp, thesofproject#48]
    6128:       aa0003f8        mov     x24, x0
    612c:       aa0203f7        mov     x23, x2
    6130:       a9046bf9        stp     x25, x26, [sp, thesofproject#64]
    6134:       aa0403f9        mov     x25, x4        <-- x4 copied to x25
    6138:       a90573fb        stp     x27, x28, [sp, thesofproject#80]
    613c:       aa0403fb        mov     x27, x4
    6140:       f9418400        ldr     x0, [x0, thesofproject#776]
    6144:       9100e000        add     x0, x0, #0x38
    6148:       94000000        bl      0 <mutex_lock>
    614c:       f8420f22        ldr     x2, [x25, thesofproject#32]!  <-- offset 0x44
    ^^^
This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave()
where data abort happens.
wsa881x_hw_params() is called with stream = NULL and passes it further
in register x4 (5th argument) to sdw_stream_add_slave() without any checks.
Value from x4 is copied to x25 and finally it aborts on trying to load
a value from address in x25 plus offset 32 (in dec) which corresponds
to master_list member in struct sdw_stream_runtime:

struct sdw_stream_runtime {
        const char  *              name;	/*     0     8 */
        struct sdw_stream_params   params;	/*     8    12 */
        enum sdw_stream_state      state;	/*    20     4 */
        enum sdw_stream_type       type;	/*    24     4 */
        /* XXX 4 bytes hole, try to pack */
 here-> struct list_head           master_list;	/*    32    16 */
        int                        m_rt_count;	/*    48     4 */
        /* size: 56, cachelines: 1, members: 6 */
        /* sum members: 48, holes: 1, sum holes: 4 */
        /* padding: 4 */
        /* last cacheline: 56 bytes */

Fix this by adding required calls to qcom_snd_sdw_startup() and
sdw_release_stream() to startup and shutdown routines which restores
the previous correct behaviour when ->set_stream() method is called to
set a valid stream runtime pointer on playback startup.

Reproduced and then fix was tested on db845c RB3 board.

Reported-by: Dmitry Baryshkov <[email protected]>
Cc: [email protected]
Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards")
Cc: Srinivas Kandagatla <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Alexey Klimov <[email protected]>
Tested-by: Steev Klimaszewski <[email protected]> # Lenovo Yoga C630
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Srinivas Kandagatla <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants