Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

this is abouth 1212m emu patch #1

Closed
wants to merge 1 commit into from
Closed

this is abouth 1212m emu patch #1

wants to merge 1 commit into from

Conversation

oeai
Copy link

@oeai oeai commented Jan 5, 2013

one of the biggest problems is that the firmware was not loaded
i've moved emu1010_init out of function - load_patches so it was clear
and now it's 1275 line that was missed before, and some other changes i'll probably will commentout if needed

one of the biggest problems is that the firmware was not loaded
i've moved emu1010_init out of function - load_patches so it was clear
and now it's 1275 line that was missed before, and some other changes i'll probably will commentout if needed
@tiwai
Copy link
Owner

tiwai commented Jan 8, 2013

At Sat, 05 Jan 2013 11:42:29 -0800,
Ra wrote:

one of the biggest problems is that the firmware was not loaded
i've moved emu1010_init out of function - load_patches so it was clear
and now it's 1275 line that was missed before, and some other changes i'll probably will commentout if needed

I don't accept any pull requests from github. It's provided just as a
mirror of kernel.org repo.

Please follow the standard procedure of patch submission described in
Documentation/SubmittingPatches.

BTW, about your changes:
through a quick glance, it seems that it can break other
emu10k1/emu20k1 devices. Please fix the patch so that you are 100%
sure that it doesn't break any other devices.

Also, follow the coding style described in the document above.
scripts/checkpatch.pl is your friend.

For further questions / comments, please post to alsa-devel ML.

thanks,

Takashi

You can merge this Pull Request by running:

git pull https://github.com/oeai/sound patch-1

Or you can view, comment on it, or merge it online at:

#1

-- Commit Summary --

  • this is abouth 1212m emu patch

-- File Changes --

M sound/pci/emu10k1/emu10k1_main.c (322)

-- Patch Links --

https://github.com/tiwai/sound/pull/1.patch
https://github.com/tiwai/sound/pull/1.diff

@oeai
Copy link
Author

oeai commented Jan 8, 2013

Sorry, i don't have much time for programming right now, i did what i
can - i'm not a C++ programmer, for me it was not working with latest
snapshot and i've changed it as i can.
and for the pull request - it's just simplier for you to work with it i
thought
and i just made a fork for emu1212m users and pointed to you that you
got an error for just emu1212 - it is not loading firmware really and i
described where's the catch out.
and this is made for those who works on sources and knows what to do,
maybe i'll get back to it in a later time, but i'm doing other projects
as well finding more and more bugs, but actually i need to work on some
really other projects instead but it is all bugged...
and i was posting in ML really but some messages just didn't appear and
this patch i've posted too.
i've created a separated project for this, so i can comment it and work on it
easier, maybe someone will join me and help to change it more accurately
https://github.com/oeai/emu1212m-alsa-driver
i will comment some changes i've made so you can see the difference
that is not so hard really
gl

btw right now
you are missing this code in emu10k1_main.c torvalds#862 string

err = snd_emu1010_load_firmware(emu);

if (err != 0) {

snd_printk(

 KERN_INFO "emu1010: Loading Firmware file %s failed\n",

 filename);

return err;

}

@oeai oeai closed this Jan 8, 2013
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

 (1) Give struct fscache_operation a enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

 (2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     in the appropriate state (complete or cancelled).

 (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

	(a) The ->n_ops is now the number of extant operations on the object
	    and is now decremented by fscache_put_operation() only.

	(b) The ->n_in_progress is simply the number of objects that have been
	    taken off of the object's pending queue for the purposes of being
	    run.  This is decremented by fscache_op_complete() only.

	(c) The ->n_exclusive is the number of exclusive ops that have been
	    submitted and queued or are in progress.  It is decremented by
	    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() now no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

 (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.


FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ torvalds#1090                  /DG965RY
RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
 ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
 [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
 [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
 [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
 [<ffffffff810d8d47>] evict+0xa1/0x15c
 [<ffffffff810d8e2e>] dispose_list+0x2c/0x38
 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
 [<ffffffff810c56b7>] prune_super+0xd5/0x140
 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab
 [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
 [<ffffffff8103e009>] ? process_timeout+0xb/0xb
 [<ffffffff8109dba3>] kswapd+0x270/0x289
 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
 [<ffffffff8104bf7a>] kthread+0x7f/0x87
 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
 [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.

The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete.  There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).

This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ torvalds#1064                  /DG965RY
RIP: 0010:[<ffffffffa0075118>]  [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS:  00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
 ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
 [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
 [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
 [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
 [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
 [<ffffffff81098d7b>] ra_submit+0x1c/0x20
 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
 [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
 [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
 [<ffffffff810c22c4>] do_sync_read+0xba/0xfa
 [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
 [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a
 [<ffffffff810c2a79>] sys_read+0x45/0x6c
 [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

Reported-by: Mark Moseley <[email protected]>
Signed-off-by: David Howells <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
CacheFiles is missing some calls to fscache_retrieval_complete() in the error
handling/collision paths of its reader functions.

This can be seen by the following assertion tripping in fscache_put_operation()
whereby the operation being destroyed is still in the in-progress state and has
not been cancelled or completed:

FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:408!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: xfs ioatdma dca loop joydev evdev
psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
pci_hotplug sg sr_mod]

Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
RIP: 0010:[<ffffffff81197b24>]  [<ffffffff81197b24>] fscache_put_operation+0x304/0x330
RSP: 0018:ffff880062f739d8  EFLAGS: 00010296
RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
FS:  00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
Stack:
 ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
 ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
 ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
Call Trace:
 [<ffffffff8119aa7e>] __fscache_read_or_alloc_pages+0x1fe/0x530
 [<ffffffff81250780>] __nfs_readpages_from_fscache+0x70/0x1c0
 [<ffffffff8123142a>] nfs_readpages+0xca/0x1e0
 [<ffffffff815f3c06>] ? rpc_do_put_task+0x36/0x50
 [<ffffffff8122755b>] ? alloc_nfs_open_context+0x4b/0x110
 [<ffffffff815ecd1a>] ? rpc_call_sync+0x5a/0x70
 [<ffffffff810e7e9a>] __do_page_cache_readahead+0x1ca/0x270
 [<ffffffff810e7f61>] ra_submit+0x21/0x30
 [<ffffffff810e818d>] ondemand_readahead+0x11d/0x250
 [<ffffffff810e83b6>] page_cache_sync_readahead+0x36/0x60
 [<ffffffff810dffa4>] generic_file_aio_read+0x454/0x770
 [<ffffffff81224ce1>] nfs_file_read+0xe1/0x130
 [<ffffffff81121bd9>] do_sync_read+0xd9/0x120
 [<ffffffff8114088f>] ? mntput+0x1f/0x40
 [<ffffffff811238cb>] ? fput+0x1cb/0x260
 [<ffffffff81122938>] vfs_read+0xc8/0x180
 [<ffffffff81122af5>] sys_read+0x55/0x90

Reported-by: Mark Moseley <[email protected]>
Signed-off-by: David Howells <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
The function to submit an exclusive op (fscache_submit_exclusive_op()) can BUG
if there's been an I/O error because it may see the parent cache object in an
unexpected state.  It should only BUG if there hasn't been an I/O error.

In this case the problem was produced by remounting the cache partition to be
R/O.  The EROFS state was detected and the cache was aborted, but not
everything handled the aborting correctly.

SysRq : Emergency Remount R/O
EXT4-fs (sda6): re-mounted. Opts: (null)
Emergency Remount complete
CacheFiles: I/O Error: Failed to update xattr with error -30
FS-Cache: Cache cachefiles stopped due to I/O error
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:128!
invalid opcode: 0000 [#1] SMP 
CPU 0 
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 6612, comm: kworker/u:2 Not tainted 3.1.0-rc8-fsdevel+ torvalds#1093                  /DG965RY
RIP: 0010:[<ffffffffa00739c0>]  [<ffffffffa00739c0>] fscache_submit_exclusive_op+0x2ad/0x2c2 [fscache]
RSP: 0018:ffff880000853d40  EFLAGS: 00010206
RAX: ffff880038ac72a8 RBX: ffff8800181f2260 RCX: ffffffff81f2b2b0
RDX: 0000000000000001 RSI: ffffffff8179a478 RDI: ffff8800181f2280
RBP: ffff880000853d60 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880038ac7268
R13: ffff8800181f2280 R14: ffff88003a359190 R15: 000000010122b162
FS:  0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000034cc4a77f0 CR3: 0000000010e96000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:2 (pid: 6612, threadinfo ffff880000852000, task ffff880014c3c040)
Stack:
 ffff8800181f2260 ffff8800181f2310 ffff880038ac7268 ffff8800181f2260
 ffff880000853dc0 ffffffffa0072375 ffff880037ecfe00 ffff88003a359198
 ffff880000853dc0 0000000000000246 0000000000000000 ffff88000a91d308
Call Trace:
 [<ffffffffa0072375>] fscache_object_work_func+0x792/0xe65 [fscache]
 [<ffffffff81047e44>] process_one_work+0x1eb/0x37f
 [<ffffffff81047de6>] ? process_one_work+0x18d/0x37f
 [<ffffffffa0071be3>] ? fscache_enqueue_dependents+0xd8/0xd8 [fscache]
 [<ffffffff810482e4>] worker_thread+0x15a/0x21a
 [<ffffffff8104818a>] ? rescuer_thread+0x188/0x188
 [<ffffffff8104bf96>] kthread+0x7f/0x87
 [<ffffffff813ad6f4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
 [<ffffffff813abd1d>] ? retint_restore_args+0xe/0xe
 [<ffffffff8104bf17>] ? __init_kthread_worker+0x53/0x53
 [<ffffffff813ad6f0>] ? gs_change+0xb/0xb


Signed-off-by: David Howells <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

 BUG: Bad page state in process python-bin  pfn:17d39b
 page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
 Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
 Call Trace:
 [<ffffffff8111bfe7>] bad_page+0x107/0x160
 [<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
 [<ffffffff8111ef19>] __pagevec_free+0x59/0xb0
 [<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
 [<ffffffff8112230c>] release_pages+0x21c/0x250
 [<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
 [<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
 [<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
 [<ffffffff811226f8>] __lru_cache_add+0x58/0x70
 [<ffffffff81122731>] lru_cache_add_lru+0x21/0x40
 [<ffffffff81123f49>] putback_lru_page+0x69/0x100
 [<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
 [<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
 [<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
 [<ffffffff8115255c>] compact_zone+0x4cc/0x600
 [<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
 [<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
 [<ffffffff8115290e>] compact_zone_order+0x7e/0xb0
 [<ffffffff81152a49>] try_to_compact_pages+0x109/0x170
 [<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
 [<ffffffff814c9136>] ? thread_return+0x4e/0x778
 [<ffffffff81150d43>] alloc_pages_vma+0x93/0x150
 [<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
 [<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
 [<ffffffff81136755>] handle_mm_fault+0x245/0x2b0
 [<ffffffff814ce383>] do_page_fault+0x123/0x3a0
 [<ffffffff814cbdf5>] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason that doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

 (1) Make nfs_migrate_page() wait.

 (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

 (3) Set a timeout around the wait.

 (4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells <[email protected]>
Acked-by: Jeff Layton <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
wait_on_bit() with TASK_INTERRUPTIBLE returns 1 rather than a negative error
code, so change what we check for.  This means that the signal handling in
fscache_wait_for_retrieval_activation()  should now work properly.

Without this, the following bug can be seen if CTRL-C is pressed during
fscache read operation:

FS-Cache: Assertion failed
2 == 3 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:347!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 1
Pid: 15006, comm: slurp-q Tainted: GF            3.7.0-rc8-fsdevel+ torvalds#411                  /DG965RY
RIP: 0010:[<ffffffffa007fcb4>]  [<ffffffffa007fcb4>] fscache_wait_for_retrieval_activation+0x167/0x177 [fscache]
RSP: 0018:ffff88002a4c39a8  EFLAGS: 00010292
RAX: 000000000000001a RBX: ffff88002d3dc158 RCX: 0000000000008685
RDX: ffffffff8102ccd6 RSI: 0000000000000001 RDI: ffffffff8102d1d6
RBP: ffff88002a4c39c8 R08: 0000000000000002 R09: 0000000000000000
R10: ffffffff8163afa0 R11: ffff88003bd11900 R12: ffffffffa00868c8
R13: ffff880028306458 R14: ffff88002d3dc1b0 R15: ffff88001372e538
FS:  00007f17426a0700(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f1742494a44 CR3: 0000000031bd7000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 15006, threadinfo ffff88002a4c2000, task ffff880023de3040)
Stack:
 ffff88002d3dc158 ffff88001372e538 ffff88002a4c3ab4 ffff8800283064e0
 ffff88002a4c3a38 ffffffffa0080f6d 0000000000000000 ffff880023de3040
 ffff88002a4c3ac8 ffffffff810ac8ae ffff880028306458 ffff88002a4c3bc8
Call Trace:
 [<ffffffffa0080f6d>] __fscache_read_or_alloc_pages+0x24f/0x4bc [fscache]
 [<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
 [<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
 [<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
 [<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
 [<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
 [<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
 [<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
 [<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
 [<ffffffff810afe3e>] ra_submit+0x1c/0x20
 [<ffffffff810b019b>] ondemand_readahead+0x359/0x382
 [<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
 [<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
 [<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
 [<ffffffff810db5b3>] do_sync_read+0x91/0xd1
 [<ffffffff810dbb8b>] vfs_read+0x9b/0x144
 [<ffffffff810dbc78>] sys_read+0x44/0x75
 [<ffffffff81422892>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
In fscache_write_op(), if the object is determined to have become inactive or
to have lost its cookie, we don't move the operation state from in-progress,
and so an assertion in fscache_put_operation() fails with an assertion (see
below).

Instrumenting fscache_op_work_func() indicates that it called
fscache_write_op() before calling fscache_put_operation() - where the assertion
failed.  The assertion at line 433 indicates that the operation state is
IN_PROGRESS rather than being COMPLETE or CANCELLED.

Instrumenting fscache_write_op() showed that it was being called on an object
that had had its cookie removed and that this was due to relinquishment of the
cookie by the netfs.  At this point fscache no longer has access to the pages
of netfs data that were requested to be written, and so simply cancelling the
operation is the thing to do.

FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:433!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 1035, comm: kworker/u:3 Tainted: GF            3.7.0-rc8-fsdevel+ torvalds#411                  /DG965RY
RIP: 0010:[<ffffffffa007db22>]  [<ffffffffa007db22>] fscache_put_operation+0x11a/0x2ed [fscache]
RSP: 0018:ffff88003e32bcf8  EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff88001818eb78 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88003e32bd18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00811da
R13: 0000000000000001 R14: 0000000100625d26 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff7dd31c68 CR3: 000000003d730000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:3 (pid: 1035, threadinfo ffff88003e32a000, task ffff88003bb38080)
Stack:
 ffffffff8102d1ad ffff88001818eb78 ffffffffa00811da 0000000000000001
 ffff88003e32bd48 ffffffffa007f0ad ffff88001818eb78 ffffffff819583c0
 ffff88003df24e00 ffff88003882c3e0 ffff88003e32bde8 ffffffff81042de0
Call Trace:
 [<ffffffff8102d1ad>] ? vprintk_emit+0x3c6/0x41a
 [<ffffffffa00811da>] ? __fscache_read_or_alloc_pages+0x4bc/0x4bc [fscache]
 [<ffffffffa007f0ad>] fscache_op_work_func+0xec/0x123 [fscache]
 [<ffffffff81042de0>] process_one_work+0x21c/0x3b0
 [<ffffffff81042d82>] ? process_one_work+0x1be/0x3b0
 [<ffffffffa007efc1>] ? fscache_operation_gc+0x23e/0x23e [fscache]
 [<ffffffff8104332e>] worker_thread+0x202/0x2df
 [<ffffffff8104312c>] ? rescuer_thread+0x18e/0x18e
 [<ffffffff81047c1c>] kthread+0xd0/0xd8
 [<ffffffff81421bfa>] ? _raw_spin_unlock_irq+0x29/0x3e
 [<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff814227ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55

Signed-off-by: David Howells <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Commit 648bb56 ("cgroup: lock cgroup_mutex in cgroup_init_subsys()")
made cgroup_init_subsys() grab cgroup_mutex before invoking
->css_alloc() for the root css.  Because memcg registers hotcpu notifier
from ->css_alloc() for the root css, this introduced circular locking
dependency between cgroup_mutex and cpu hotplug.

Fix it by moving hotcpu notifier registration to a subsys initcall.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.7.0-rc4-work+ torvalds#42 Not tainted
  -------------------------------------------------------
  bash/645 is trying to acquire lock:
   (cgroup_mutex){+.+.+.}, at: [<ffffffff8110c5b7>] cgroup_lock+0x17/0x20

  but task is already holding lock:
   (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
         lock_acquire+0x97/0x1e0
         mutex_lock_nested+0x61/0x3b0
         get_online_cpus+0x3c/0x60
         rebuild_sched_domains_locked+0x1b/0x70
         cpuset_write_resmask+0x298/0x2c0
         cgroup_file_write+0x1ef/0x300
         vfs_write+0xa8/0x160
         sys_write+0x52/0xa0
         system_call_fastpath+0x16/0x1b

 -> #0 (cgroup_mutex){+.+.+.}:
         __lock_acquire+0x14ce/0x1d20
         lock_acquire+0x97/0x1e0
         mutex_lock_nested+0x61/0x3b0
         cgroup_lock+0x17/0x20
         cpuset_handle_hotplug+0x1b/0x560
         cpuset_update_active_cpus+0xe/0x10
         cpuset_cpu_inactive+0x47/0x50
         notifier_call_chain+0x66/0x150
         __raw_notifier_call_chain+0xe/0x10
         __cpu_notify+0x20/0x40
         _cpu_down+0x7e/0x2f0
         cpu_down+0x36/0x50
         store_online+0x5d/0xe0
         dev_attr_store+0x18/0x30
         sysfs_write_file+0xe0/0x150
         vfs_write+0xa8/0x160
         sys_write+0x52/0xa0
         system_call_fastpath+0x16/0x1b
  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(cpu_hotplug.lock);
                                 lock(cgroup_mutex);
                                 lock(cpu_hotplug.lock);
    lock(cgroup_mutex);

   *** DEADLOCK ***

  5 locks held by bash/645:
   #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8123bab8>] sysfs_write_file+0x48/0x150
   #1:  (s_active#42){.+.+.+}, at: [<ffffffff8123bb38>] sysfs_write_file+0xc8/0x150
   #2:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81079277>] cpu_hotplug_driver_lock+0x1
+7/0x20
   #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81093157>] cpu_maps_update_begin+0x17/0x20
   #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60

  stack backtrace:
  Pid: 645, comm: bash Not tainted 3.7.0-rc4-work+ torvalds#42
  Call Trace:
   print_circular_bug+0x28e/0x29f
   __lock_acquire+0x14ce/0x1d20
   lock_acquire+0x97/0x1e0
   mutex_lock_nested+0x61/0x3b0
   cgroup_lock+0x17/0x20
   cpuset_handle_hotplug+0x1b/0x560
   cpuset_update_active_cpus+0xe/0x10
   cpuset_cpu_inactive+0x47/0x50
   notifier_call_chain+0x66/0x150
   __raw_notifier_call_chain+0xe/0x10
   __cpu_notify+0x20/0x40
   _cpu_down+0x7e/0x2f0
   cpu_down+0x36/0x50
   store_online+0x5d/0xe0
   dev_attr_store+0x18/0x30
   sysfs_write_file+0xe0/0x150
   vfs_write+0xa8/0x160
   sys_write+0x52/0xa0
   system_call_fastpath+0x16/0x1b

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Yan Burman reported following lockdep warning :

=============================================
[ INFO: possible recursive locking detected ]
3.7.0+ torvalds#24 Not tainted
---------------------------------------------
swapper/1/0 is trying to acquire lock:
  (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send
+0x2e/0x2f0

but task is already holding lock:
  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&n->lock);
   lock(&n->lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by swapper/1/0:
  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8104b350>]
call_timer_fn+0x0/0x1c0
  #1:  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit
+0x1d4/0x280
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>]
dev_queue_xmit+0x0/0x5d0
  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>]
ip_finish_output+0x13e/0x640

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0+ torvalds#24
Call Trace:
  <IRQ>  [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0
  [<ffffffff8108d570>] __lock_acquire+0x440/0xc30
  [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108ddf5>] lock_acquire+0x95/0x140
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270
  [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640
  [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff813cb9a0>] ip_output+0x80/0xf0
  [<ffffffff813ca368>] ip_local_out+0x28/0x80
  [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan]
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0
  [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80
  [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270
  [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0
  [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0
  [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0
  [<ffffffff813f5788>] arp_xmit+0x58/0x60
  [<ffffffff813f59db>] arp_send+0x3b/0x40
  [<ffffffff813f6424>] arp_solicit+0x204/0x280
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff8139f515>] neigh_probe+0x45/0x70
  [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0
  [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0
  [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120
  [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff81043e51>] __do_softirq+0x101/0x280
  [<ffffffff814518cc>] call_softirq+0x1c/0x30
  [<ffffffff81003b65>] do_softirq+0x85/0xc0
  [<ffffffff81043a7e>] irq_exit+0x9e/0xc0
  [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0
  [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80
  <EOI>  [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0
  [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0
  [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0
  [<ffffffff81441127>] start_secondary+0x1b2/0x1b6

Bug is from arp_solicit(), releasing the neigh lock after arp_send()
In case of vxlan, we eventually need to write lock a neigh lock later.

Its a false positive, but we can get rid of it without lockdep
annotations.

We can instead use neigh_ha_snapshot() helper.

Reported-by: Yan Burman <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
This patch adds a check for npct->prcm_base to make sure that the
address is not NULL before using it, as the driver was made capable of
loading even without a proper memory resource in:

f1671bf pinctrl/nomadik: make independent of prcmu driver

Also, refuses to probe without prcm_base on anything else than nomadik.

This solves the following crash, introduced during the merge window when
booting on U8500 with device tree:

pinctrl-nomadik pinctrl-db8500: No PRCM base, assume no ALT-Cx control is available
Unable to handle kernel NULL pointer dereference at virtual address 00000138
pgd = c0004000
[00000138] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0    Not tainted  (3.7.0-02892-g1ebaf4f torvalds#631)
PC is at nmk_pmx_enable+0x1bc/0x4d0
LR is at clk_disable+0x40/0x44
[snip]
[<c01d5e50>] (nmk_pmx_enable+0x1bc/0x4d0) from [<c01d3ba8>] (pinmux_enable_setting+0x12c/0x1ec)
[<c01d3ba8>] (pinmux_enable_setting+0x12c/0x1ec) from [<c01d1dc8>] (pinctrl_select_state_locked+0xfc/0x134)
[<c01d1dc8>] (pinctrl_select_state_locked+0xfc/0x134) from [<c01d2814>] (pinctrl_register+0x26c/0x43c)
[<c01d2814>] (pinctrl_register+0x26c/0x43c) from [<c01d668c>] (nmk_pinctrl_probe+0x114/0x238)
[<c01d668c>] (nmk_pinctrl_probe+0x114/0x238) from [<c0211cc4>] (platform_drv_probe+0x28/0x2c)
[<c0211cc4>] (platform_drv_probe+0x28/0x2c) from [<c0210738>] (driver_probe_device+0x84/0x21c)
[<c0210738>] (driver_probe_device+0x84/0x21c) from [<c02109c0>] (__device_attach+0x50/0x54)
[<c02109c0>] (__device_attach+0x50/0x54) from [<c020eb1c>] (bus_for_each_drv+0x54/0x9c)
[<c020eb1c>] (bus_for_each_drv+0x54/0x9c) from [<c0210668>] (device_attach+0x84/0x9c)
[<c0210668>] (device_attach+0x84/0x9c) from [<c020fbac>] (bus_probe_device+0x94/0xb8)
[<c020fbac>] (bus_probe_device+0x94/0xb8) from [<c020e084>] (device_add+0x4f0/0x5bc)
[<c020e084>] (device_add+0x4f0/0x5bc) from [<c0276400>] (of_device_add+0x40/0x48)
[<c0276400>] (of_device_add+0x40/0x48) from [<c0276a98>] (of_platform_device_create_pdata+0x68/0x98)
[<c0276a98>] (of_platform_device_create_pdata+0x68/0x98) from [<c0276bac>] (of_platform_bus_create+0xe4/0x260)
[<c0276bac>] (of_platform_bus_create+0xe4/0x260) from [<c0276bf8>] (of_platform_bus_create+0x130/0x260)
[<c0276bf8>] (of_platform_bus_create+0x130/0x260) from [<c0276d94>] (of_platform_populate+0x6c/0xac)
[<c0276d94>] (of_platform_populate+0x6c/0xac) from [<c04a8224>] (u8500_init_machine+0x78/0x140)
[<c04a8224>] (u8500_init_machine+0x78/0x140) from [<c04a3560>] (customize_machine+0x24/0x30)
[<c04a3560>] (customize_machine+0x24/0x30) from [<c00087b0>] (do_one_initcall+0x130/0x1b0)
[<c00087b0>] (do_one_initcall+0x130/0x1b0) from [<c033ff9c>] (kernel_init+0x138/0x2e8)
[<c033ff9c>] (kernel_init+0x138/0x2e8) from [<c000eb18>] (ret_from_fork+0x14/0x20)
Code: 0a00001b e19400b2 e59a200c e0822000 (e592c000)
---[ end trace 1b75b31a2719ed1c ]---
note: swapper/0[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Signed-off-by: Fabio Baltieri <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Cai reported this oops:

[90701.616664] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[90701.625438] IP: [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60
[90701.632167] PGD fea319067 PUD 103fda4067 PMD 0
[90701.637255] Oops: 0000 [#1] SMP
[90701.640878] Modules linked in: des_generic md4 nls_utf8 cifs dns_resolver binfmt_misc tun sg igb iTCO_wdt iTCO_vendor_support lpc_ich pcspkr i2c_i801 i2c_core i7core_edac edac_core ioatdma dca mfd_core coretemp kvm_intel kvm crc32c_intel microcode sr_mod cdrom ata_generic sd_mod pata_acpi crc_t10dif ata_piix libata megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[90701.677655] CPU 10
[90701.679808] Pid: 9627, comm: ls Tainted: G        W    3.7.1+ torvalds#10 QCI QSSC-S4R/QSSC-S4R
[90701.688950] RIP: 0010:[<ffffffff814a343e>]  [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60
[90701.698383] RSP: 0018:ffff88177b431bb8  EFLAGS: 00010206
[90701.704309] RAX: ffff88177b431fd8 RBX: 00007ffffffff000 RCX: ffff88177b431bec
[90701.712271] RDX: 0000000000000003 RSI: 0000000000000006 RDI: 0000000000000000
[90701.720223] RBP: ffff88177b431bc8 R08: 0000000000000004 R09: 0000000000000000
[90701.728185] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[90701.736147] R13: ffff88184ef92000 R14: 0000000000000023 R15: ffff88177b431c88
[90701.744109] FS:  00007fd56a1a47c0(0000) GS:ffff88105fc40000(0000) knlGS:0000000000000000
[90701.753137] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[90701.759550] CR2: 0000000000000028 CR3: 000000104f15f000 CR4: 00000000000007e0
[90701.767512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[90701.775465] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[90701.783428] Process ls (pid: 9627, threadinfo ffff88177b430000, task ffff88185ca4cb60)
[90701.792261] Stack:
[90701.794505]  0000000000000023 ffff88177b431c50 ffff88177b431c38 ffffffffa014fcb1
[90701.802809]  ffff88184ef921bc 0000000000000000 00000001ffffffff ffff88184ef921c0
[90701.811123]  ffff88177b431c08 ffffffff815ca3d9 ffff88177b431c18 ffff880857758000
[90701.819433] Call Trace:
[90701.822183]  [<ffffffffa014fcb1>] smb_send_rqst+0x71/0x1f0 [cifs]
[90701.828991]  [<ffffffff815ca3d9>] ? schedule+0x29/0x70
[90701.834736]  [<ffffffffa014fe6d>] smb_sendv+0x3d/0x40 [cifs]
[90701.841062]  [<ffffffffa014fe96>] smb_send+0x26/0x30 [cifs]
[90701.847291]  [<ffffffffa015801f>] send_nt_cancel+0x6f/0xd0 [cifs]
[90701.854102]  [<ffffffffa015075e>] SendReceive+0x18e/0x360 [cifs]
[90701.860814]  [<ffffffffa0134a78>] CIFSFindFirst+0x1a8/0x3f0 [cifs]
[90701.867724]  [<ffffffffa013f731>] ? build_path_from_dentry+0xf1/0x260 [cifs]
[90701.875601]  [<ffffffffa013f731>] ? build_path_from_dentry+0xf1/0x260 [cifs]
[90701.883477]  [<ffffffffa01578e6>] cifs_query_dir_first+0x26/0x30 [cifs]
[90701.890869]  [<ffffffffa015480d>] initiate_cifs_search+0xed/0x250 [cifs]
[90701.898354]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.904486]  [<ffffffffa01554cb>] cifs_readdir+0x45b/0x8f0 [cifs]
[90701.911288]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.917410]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.923533]  [<ffffffff81195970>] ? fillonedir+0x100/0x100
[90701.929657]  [<ffffffff81195848>] vfs_readdir+0xb8/0xe0
[90701.935490]  [<ffffffff81195b9f>] sys_getdents+0x8f/0x110
[90701.941521]  [<ffffffff815d3b99>] system_call_fastpath+0x16/0x1b
[90701.948222] Code: 66 90 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 53 48 83 ec 08 83 fe 01 48 8b 98 48 e0 ff ff 48 c7 80 48 e0 ff ff ff ff ff ff 74 22 <48> 8b 47 28 ff 50 68 65 48 8b 14 25 f0 c6 00 00 48 89 9a 48 e0
[90701.970313] RIP  [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60
[90701.977125]  RSP <ffff88177b431bb8>
[90701.981018] CR2: 0000000000000028
[90701.984809] ---[ end trace 24bd602971110a43 ]---

This is likely due to a race vs. a reconnection event.

The current code checks for a NULL socket in smb_send_kvec, but that's
too late. By the time that check is done, the socket will already have
been passed to kernel_setsockopt. Move the check into smb_send_rqst, so
that it's checked earlier.

In truth, this is a bit of a half-assed fix. The -ENOTSOCK error
return here looks like it could bubble back up to userspace. The locking
rules around the ssocket pointer are really unclear as well. There are
cases where the ssocket pointer is changed without holding the srv_mutex,
but I'm not clear whether there's a potential race here yet or not.

This code seems like it could benefit from some fundamental re-think of
how the socket handling should behave. Until then though, this patch
should at least fix the above oops in most cases.

Cc: <[email protected]> # 3.7+
Reported-and-Tested-by: CAI Qian <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Steve French <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Sasha was fuzzing with trinity and reported the following problem:

  BUG: sleeping function called from invalid context at kernel/mutex.c:269
  in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main
  2 locks held by trinity-main/6361:
   #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0
   #1:  (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0
  Pid: 6361, comm: trinity-main Tainted: G        W
  3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty torvalds#74
  Call Trace:
    __might_sleep+0x1c3/0x1e0
    mutex_lock_nested+0x29/0x50
    mpol_shared_policy_lookup+0x2e/0x90
    shmem_get_policy+0x2e/0x30
    get_vma_policy+0x5a/0xa0
    mpol_misplaced+0x41/0x1d0
    handle_pte_fault+0x465/0x6a0

This was triggered by a different version of automatic NUMA balancing
but in theory the current version is vunerable to the same problem.

do_numa_page
  -> numa_migrate_prep
    -> mpol_misplaced
      -> get_vma_policy
        -> shmem_get_policy

It's very unlikely this will happen as shared pages are not marked
pte_numa -- see the page_mapcount() check in change_pte_range() -- but
it is possible.

To address this, this patch restores sp->lock as originally implemented
by Kosaki Motohiro.  In the path where get_vma_policy() is called, it
should not be calling sp_alloc() so it is not necessary to treat the PTL
specially.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Tested-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
The missing NULL terminator can cause a panic on
PPC405 boards during boot:

  Linux/PowerPC load: console=ttyS0,115200 root=/dev/mtdblock1 rootfstype=squashfs,jffs2 noinitrd init=/etc/preinit
  Finalizing device tree... flat tree at 0x6a5160
  bootconsole [udbg0] enabled
  Page fault in user mode with in_atomic() = 1 mm = (null)
  NIP = c0275f50  MSR = fffffffe
  Oops: Weird page fault, sig: 11 [#1]
  PowerPC 40x Platform
  Modules linked in:
  NIP: c0275f50 LR: c0275f60 CTR: c0280000
  REGS: c0275eb0 TRAP: 636f7265   Not tainted  (3.7.1)
  MSR: fffffffe <VEC,VSX,EE,PR,FP,ME,SE,BE,IR,DR,PMM,RI> CR: c06a6190  XER: 00000001
  TASK = c02662a8[0] 'swapper' THREAD: c0274000
  GPR00: c0275ec0 c000c658 c027c4bf 00000000 c0275ee0 c000a0ec c020a1a8 c020a1f0
  GPR08: c020f631 c020f404 c025f078 c025f080 c0275f10
   Call Trace:
   ---[ end trace 31fd0ba7d8756001 ]---

  Kernel panic - not syncing: Attempted to kill the idle task!

The panic happens since commit 9597abe
(sections: fix section conflicts in arch/powerpc), however the root
cause of this is that the NULL terminator were not added in commit
a4f740c (of/flattree: Add of_flat_dt_match()
helper function).

Cc: Grant Likely <[email protected]>
Cc: <[email protected]>
Signed-off-by: Gabor Juhos <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
All boards, except Amstrad E3, mark USB config with __initdata.

As a result, when you compile USB into modules, they will try to refer
already released platform data and the behaviour is undefined. For example
on Nokia 770, I get the following kernel panic when modprobing ohci-hcd:

[    3.462158] Unable to handle kernel paging request at virtual address e7fddef0
[    3.477050] pgd = c3434000
[    3.487365] [e7fddef0] *pgd=00000000
[    3.498535] Internal error: Oops: 80000005 [#1] ARM
[    3.510955] Modules linked in: ohci_hcd(+)
[    3.522705] CPU: 0    Not tainted  (3.7.0-770_tiny+ #5)
[    3.535552] PC is at 0xe7fddef0
[    3.546508] LR is at ohci_omap_init+0x5c/0x144 [ohci_hcd]
[    3.560272] pc : [<e7fddef0>]    lr : [<bf003140>]    psr: a0000013
[    3.560272] sp : c344bdb0  ip : c344bce0  fp : c344bdcc
[    3.589782] r10: 00000001  r9 : 00000000  r8 : 00000000
[    3.604553] r7 : 00000026  r6 : 000000de  r5 : c0227300  r4 : c342d620
[    3.621032] r3 : e7fddef0  r2 : c048b880  r1 : 00000000  r0 : 0000000a
[    3.637786] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[    3.655822] Control: 0005317f  Table: 13434000  DAC: 00000015
[    3.672790] Process modprobe (pid: 425, stack limit = 0xc344a1b8)
[    3.690643] Stack: (0xc344bdb0 to 0xc344c000)
[    3.707031] bda0:                                     bf0030e4 c342d620 00000000 c049e62c
[    3.727905] bdc0: c344be04 c344bdd0 c0150ff0 bf0030f4 bf001b88 00000000 c048a4ac c345b020
[    3.748870] bde0: c342d620 00000000 c048a468 bf003968 00000001 bf006000 c344be34 c344be08
[    3.769836] be00: bf001bf0 c0150e48 00000000 c344be18 c00b9bfc c048a478 c048a4ac bf0037f8
[    3.790985] be20: c012ca04 c000e024 c344be44 c344be38 c012d968 bf001a84 c344be64 c344be48
[    3.812164] be40: c012c8ac c012d95c 00000000 c048a478 c048a4ac bf0037f8 c344be84 c344be68
[    3.833740] be60: c012ca74 c012c80c 20000013 00000000 c344be88 bf0037f8 c344beac c344be88
[    3.855468] be80: c012b038 c012ca14 c38093cc c383ee10 bf0037f8 c35be5a0 c049d5e8 00000000
[    3.877166] bea0: c344bebc c344beb0 c012c40c c012aff4 c344beec c344bec0 c012bfc0 c012c3fc
[    3.898834] bec0: bf00378c 00000000 c344beec bf0037f8 00067f39 00000000 00005c44 c000e024
[    3.920837] bee0: c344bf14 c344bef0 c012cd54 c012befc c04ce080 00067f39 00000000 00005c44
[    3.943023] bf00: c000e024 bf006000 c344bf24 c344bf18 c012db14 c012ccc0 c344bf3c c344bf28
[    3.965423] bf20: bf00604c c012dad8 c344a000 bf003834 c344bf7c c344bf40 c00087ac bf006010
[    3.987976] bf40: 0000000f bf003834 00067f39 00000000 00005c44 bf003834 00067f39 00000000
[    4.010711] bf60: 00005c44 c000e024 c344a000 00000000 c344bfa4 c344bf80 c004c35c c0008720
[    4.033569] bf80: c344bfac c344bf90 01422192 01427ea0 00000000 00000080 00000000 c344bfa8
[    4.056518] bfa0: c000dec0 c004c2f0 01422192 01427ea0 01427ea0 00005c44 00067f39 00000000
[    4.079406] bfc0: 01422192 01427ea0 00000000 00000080 b6e11008 014221aa be941fcc b6e1e008
[    4.102569] bfe0: b6ef6300 be941758 0000e93c b6ef6310 60000010 01427ea0 00000000 00000000
[    4.125946] Backtrace:
[    4.143463] [<bf0030e4>] (ohci_omap_init+0x0/0x144 [ohci_hcd]) from [<c0150ff0>] (usb_add_hcd+0x1b8/0x61c)
[    4.183898]  r6:c049e62c r5:00000000 r4:c342d620 r3:bf0030e4
[    4.205596] [<c0150e38>] (usb_add_hcd+0x0/0x61c) from [<bf001bf0>] (ohci_hcd_omap_drv_probe+0x17c/0x224 [ohci_hcd])
[    4.248138] [<bf001a74>] (ohci_hcd_omap_drv_probe+0x0/0x224 [ohci_hcd]) from [<c012d968>] (platform_drv_probe+0x1c/0x20)
[    4.292144]  r8:c000e024 r7:c012ca04 r6:bf0037f8 r5:c048a4ac r4:c048a478
[    4.316192] [<c012d94c>] (platform_drv_probe+0x0/0x20) from [<c012c8ac>] (driver_probe_device+0xb0/0x208)
[    4.360168] [<c012c7fc>] (driver_probe_device+0x0/0x208) from [<c012ca74>] (__driver_attach+0x70/0x94)
[    4.405548]  r6:bf0037f8 r5:c048a4ac r4:c048a478 r3:00000000
[    4.429809] [<c012ca04>] (__driver_attach+0x0/0x94) from [<c012b038>] (bus_for_each_dev+0x54/0x90)
[    4.475708]  r6:bf0037f8 r5:c344be88 r4:00000000 r3:20000013
[    4.500366] [<c012afe4>] (bus_for_each_dev+0x0/0x90) from [<c012c40c>] (driver_attach+0x20/0x28)
[    4.528442]  r7:00000000 r6:c049d5e8 r5:c35be5a0 r4:bf0037f8
[    4.553466] [<c012c3ec>] (driver_attach+0x0/0x28) from [<c012bfc0>] (bus_add_driver+0xd4/0x228)
[    4.581878] [<c012beec>] (bus_add_driver+0x0/0x228) from [<c012cd54>] (driver_register+0xa4/0x134)
[    4.629730]  r8:c000e024 r7:00005c44 r6:00000000 r5:00067f39 r4:bf0037f8
[    4.656738] [<c012ccb0>] (driver_register+0x0/0x134) from [<c012db14>] (platform_driver_register+0x4c/0x60)
[    4.706542] [<c012dac8>] (platform_driver_register+0x0/0x60) from [<bf00604c>] (ohci_hcd_mod_init+0x4c/0x8c [ohci_hcd])
[    4.757843] [<bf006000>] (ohci_hcd_mod_init+0x0/0x8c [ohci_hcd]) from [<c00087ac>] (do_one_initcall+0x9c/0x174)
[    4.808990]  r4:bf003834 r3:c344a000
[    4.832641] [<c0008710>] (do_one_initcall+0x0/0x174) from [<c004c35c>] (sys_init_module+0x7c/0x194)
[    4.881530] [<c004c2e0>] (sys_init_module+0x0/0x194) from [<c000dec0>] (ret_fast_syscall+0x0/0x2c)
[    4.930664]  r7:00000080 r6:00000000 r5:01427ea0 r4:01422192
[    4.956481] Code: bad PC value
[    4.978729] ---[ end trace 58280240f08342c4 ]---
[    5.002258] Kernel panic - not syncing: Fatal exception

Fix this by taking a copy of the data. Also mark Amstrad E3's data with
__initdata to save some memory with multi-board kernels.

Signed-off-by: Aaro Koskinen <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
Since commit e303297 ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.

This works just fine most of the time but we can get in troubles with
non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines where too aggressive batching might lead to soft
lockups during process exit path (exit_mmap) because there are no
scheduling points down the free_pages_and_swap_cache path and so the
freeing can take long enough to trigger the soft lockup.

The lockup is harmless except when the system is setup to panic on
softlockup which is not that unusual.

The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather.  10k of collected pages should
be safe to prevent from soft lockups (we would have 2ms for one) even if
they are all freed without an explicit scheduling point.

This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page tables zapping which calls
cond_resched per PMD.

The following lockup has been reported for 3.0 kernel with a huge
process (in order of hundreds gigs but I do know any more details).

  BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
  Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
  Supported: Yes
  CPU 56
  Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
  RIP: 0010:  _raw_spin_unlock_irqrestore+0x8/0x10
  RSP: 0018:ffff883ec1037af0  EFLAGS: 00000206
  RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
  RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
  RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
  R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
  R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
  FS:  00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
  Call Trace:
    release_pages+0xc5/0x260
    free_pages_and_swap_cache+0x9d/0xc0
    tlb_flush_mmu+0x5c/0x80
    tlb_finish_mmu+0xe/0x50
    exit_mmap+0xbd/0x120
    mmput+0x49/0x120
    exit_mm+0x122/0x160
    do_exit+0x17a/0x430
    do_group_exit+0x3d/0xb0
    get_signal_to_deliver+0x247/0x480
    do_signal+0x71/0x1b0
    do_notify_resume+0x98/0xb0
    int_signal+0x12/0x17
  DWARF2 unwinder stuck at int_signal+0x12/0x17

Signed-off-by: Michal Hocko <[email protected]>
Cc: <[email protected]>	[3.0+]
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 23, 2013
…eset

The problem occurs when iptables constructs the tcp reset packet.
It doesn't initialize the pointer to the tcp header within the skb.
When the skb is passed to the ixgbe driver for transmit, the ixgbe
driver attempts to access the tcp header and crashes.
Currently, other drivers (such as our 1G e1000e or igb drivers) don't
access the tcp header on transmit unless the TSO option is turned on.

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000d
<1>IP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>*pdpt = 0000000085e5d001 *pde = 0000000000000000
<0>Oops: 0000 [#1] SMP
[...]
<4>Pid: 0, comm: swapper Tainted: P            2.6.35.12 #1 Greencity/Thurley
<4>EIP: 0060:[<d081621c>] EFLAGS: 00010246 CPU: 16
<4>EIP is at ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>EAX: c7628820 EBX: 00000007 ECX: 00000000 EDX: 00000000
<4>ESI: 00000008 EDI: c6882180 EBP: dfc6b000 ESP: ced95c48
<4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
<0>Process swapper (pid: 0, ti=ced94000 task=ced73bd0 task.ti=ced94000)
<0>Stack:
<4> cbec7418 c779e0d8 c77cc888 c77cc8a8 0903010a 00000000 c77c0008 00000002
<4><0> cd4997c0 00000010 dfc6b000 00000000 d0d176c9 c77cc8d8 c6882180 cbec7318
<4><0> 00000004 00000004 cbec7230 cbec7110 00000000 cbec70c0 c779e000 00000002
<0>Call Trace:
<4> [<d0d176c9>] ? 0xd0d176c9
<4> [<d0d18a4d>] ? 0xd0d18a4d
<4> [<411e243e>] ? dev_hard_start_xmit+0x218/0x2d7
<4> [<411f03d7>] ? sch_direct_xmit+0x4b/0x114
<4> [<411f056a>] ? __qdisc_run+0xca/0xe0
<4> [<411e28b0>] ? dev_queue_xmit+0x2d1/0x3d0
<4> [<411e8120>] ? neigh_resolve_output+0x1c5/0x20f
<4> [<411e94a1>] ? neigh_update+0x29c/0x330
<4> [<4121cf29>] ? arp_process+0x49c/0x4cd
<4> [<411f80c9>] ? nf_hook_slow+0x3f/0xac
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121c6d5>] ? T.901+0x38/0x3b
<4> [<4121c918>] ? arp_rcv+0xa3/0xb4
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<411e1173>] ? __netif_receive_skb+0x32b/0x346
<4> [<411e19e1>] ? netif_receive_skb+0x5a/0x5f
<4> [<411e1ea9>] ? napi_skb_finish+0x1b/0x30
<4> [<d0816eb4>] ? ixgbe_xmit_frame_ring+0x1564/0x2260 [ixgbe]
<4> [<41013468>] ? lapic_next_event+0x13/0x16
<4> [<410429b2>] ? clockevents_program_event+0xd2/0xe4
<4> [<411e1b03>] ? net_rx_action+0x55/0x127
<4> [<4102da1a>] ? __do_softirq+0x77/0xeb
<4> [<4102dab1>] ? do_softirq+0x23/0x27
<4> [<41003a67>] ? do_IRQ+0x7d/0x8e
<4> [<41002a69>] ? common_interrupt+0x29/0x30
<4> [<41007bcf>] ? mwait_idle+0x48/0x4d
<4> [<4100193b>] ? cpu_idle+0x37/0x4c
<0>Code: df 09 d7 0f 94 c2 0f b6 d2 e9 e7 fb ff ff 31 db 31 c0 e9 38
ff ff ff 80 78 06 06 0f 85 3e fb ff ff 8b 7c 24 38 8b 8f b8 00 00 00
<0f> b6 51 0d f6 c2 01 0f 85 27 fb ff ff 80 e2 02 75 0d 8b 6c 24
<0>EIP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] SS:ESP

Signed-off-by: Mukund Jampala <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
The function sec_reg_read invokes regmap_read which expects unsigned int *
as the destination address. The existing driver is passing address of local
variable "val" which is u8. This causes the stack corruption and following
dump is observed during probe.

Hence change "val" from u8 to unsigned int.

Unable to handle kernel paging request at virtual address 02410020
pgd = c0004000
[02410020] *pgd=00000000
Internal error: Oops: 80000005 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0    Not tainted  (3.6.0-00696-g98a28b18-dirty torvalds#27)
PC is at 0x2410020
LR is at _regulator_get_voltage+0x3c/0x70
pc : [<02410020>]    lr : [<c02395d4>]    psr: 20000013
sp : cf839b68  ip : 00000000  fp : cf92d410
r10: 0000cfd0  r9 : c06d9878  r8 : 0000f0a0
r7 : cf839b70  r6 : cf92d400  r5 : 00000011  r4 : cf000000
r3 : 02410020  r2 : 00000000  r1 : 00000048  r0 : cf000000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
...........................
.................................

[<c02395d4>] (_regulator_get_voltage+0x3c/0x70) from [<c023ad80>] (print_constraints+0x50/0x36c)
[<c023ad80>] (print_constraints+0x50/0x36c) from [<c023e504>] (set_machine_constraints+0xe8/0x2b0)
[<c023e504>] (set_machine_constraints+0xe8/0x2b0) from [<c023e9c8>] (regulator_register+0x2fc/0x604)
[<c023e9c8>] (regulator_register+0x2fc/0x604) from [<c049d628>] (s5m8767_pmic_probe+0x688/0x718)
[<c049d628>] (s5m8767_pmic_probe+0x688/0x718) from [<c029915c>] (platform_drv_probe+0x18/0x1c)
[<c029915c>] (platform_drv_probe+0x18/0x1c) from [<c0297dd0>] (really_probe+0x68/0x1f4)
[<c0297dd0>] (really_probe+0x68/0x1f4) from [<c0298070>] (driver_probe_device+0x30/0x48)

Signed-off-by: Inderpal Singh <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
Instantiating the driver with no available regulator results in:

[39711.686393] i2c i2c-7: new_device: Instantiated device max1139 at 0x35
[39711.688687] BUG: unable to handle kernel paging request at fffffffffffffe13
[39711.688734] IP: [<ffffffff813e835b>] regulator_disable+0x1b/0x80
[39711.688788] PGD 1c0e067 PUD 1c0f067 PMD 0
[39711.688835] Oops: 0000 [#1] SMP

Caused by bad probe error path. Fix it.

Driver should also not attempt to free the interrupt in its error path if
none was allocated. Fix that problem as well.

Finally, testing if the regulator was allocated is not necessary in the
remove function, since the probe function bails out if this is the case.
Remove that check.

Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
Pursuant to this review https://lkml.org/lkml/2012/11/12/500
by Stefan Richter, update the TODO file.
- Clarify purpose of TODO file
- Remove firewire item #4. As discussed in this conversation
  https://lkml.org/lkml/2012/11/13/564 knowing the AR buffer size
  is not a hard requirement. The required rx buffer size can be
  determined experimentally.
- Remove firewire item #5. This was a private note for further
  experimentation.
- Change firewire item #1. Change suggested header from uapi header
  to kernel-only header.

Signed-off-by: Peter Hurley <[email protected]>
Acked-by: Stefan Richter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
Commit 5a50508 ("mm/rmap: Convert the struct anon_vma::mutex to an
rwsem") turned anon_vma mutex to rwsem.

However, the properly annotated nested locking in mm_take_all_locks()
has been converted from

	mutex_lock_nest_lock(&anon_vma->root->mutex, &mm->mmap_sem);

to

	down_write(&anon_vma->root->rwsem);

which is incomplete, and causes the false positive report from lockdep
below.

Annotate the fact that mmap_sem is used as an outter lock to serialize
taking of all the anon_vma rwsems at once no matter the order, using the
down_write_nest_lock() primitive.

This patch fixes this lockdep report:

 =============================================
 [ INFO: possible recursive locking detected ]
 3.8.0-rc2-00036-g5f73896 torvalds#171 Not tainted
 ---------------------------------------------
 qemu-kvm/2315 is trying to acquire lock:
  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0

 but task is already holding lock:
  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&anon_vma->rwsem);
   lock(&anon_vma->rwsem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by qemu-kvm/2315:
  #0:  (&mm->mmap_sem){++++++}, at: do_mmu_notifier_register+0xfc/0x170
  #1:  (mm_all_locks_mutex){+.+...}, at: mm_take_all_locks+0x36/0x1b0
  #2:  (&mapping->i_mmap_mutex){+.+...}, at: mm_take_all_locks+0xc9/0x1b0
  #3:  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0

 stack backtrace:
 Pid: 2315, comm: qemu-kvm Not tainted 3.8.0-rc2-00036-g5f73896 torvalds#171
 Call Trace:
   print_deadlock_bug+0xf2/0x100
   validate_chain+0x4f6/0x720
   __lock_acquire+0x359/0x580
   lock_acquire+0x121/0x190
   down_write+0x3f/0x70
   mm_take_all_locks+0x149/0x1b0
   do_mmu_notifier_register+0x68/0x170
   mmu_notifier_register+0xe/0x10
   kvm_create_vm+0x22b/0x330 [kvm]
   kvm_dev_ioctl+0xf8/0x1a0 [kvm]
   do_vfs_ioctl+0x9d/0x350
   sys_ioctl+0x91/0xb0
   system_call_fastpath+0x16/0x1b

Signed-off-by: Jiri Kosina <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mel Gorman <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
Zhouping Liu reported the following against 3.8-rc1 when running a mmap
testcase from LTP.

  mapcount 0 page_mapcount 3
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:1798!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth rfkill iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_mod cdc_ether iTCO_wdt i7core_edac coretemp usbnet iTCO_vendor_support mii crc32c_intel edac_core lpc_ich shpchp ioatdma mfd_core i2c_i801 pcspkr serio_raw bnx2 microcode dca vhost_net tun macvtap macvlan kvm_intel kvm uinput mgag200 sr_mod cdrom i2c_algo_bit sd_mod drm_kms_helper crc_t10dif ata_generic pata_acpi ttm ata_piix drm libata i2c_core megaraid_sas
  CPU 1
  Pid: 23217, comm: mmap10 Not tainted 3.8.0-rc1mainline+ torvalds#17 IBM IBM System x3400 M3 Server -[7379I08]-/69Y4356
  RIP: __split_huge_page+0x677/0x6d0
  RSP: 0000:ffff88017a03fc08  EFLAGS: 00010293
  RAX: 0000000000000003 RBX: ffff88027a6c22e0 RCX: 00000000000034d2
  RDX: 000000000000748b RSI: 0000000000000046 RDI: 0000000000000246
  RBP: ffff88017a03fcb8 R08: ffffffff819d2440 R09: 000000000000054a
  R10: 0000000000aaaaaa R11: 00000000ffffffff R12: 0000000000000000
  R13: 00007f4f11a00000 R14: ffff880179e96e00 R15: ffffea0005c08000
  FS:  00007f4f11f4a740(0000) GS:ffff88017bc20000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00000037e9ebb404 CR3: 000000017a436000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mmap10 (pid: 23217, threadinfo ffff88017a03e000, task ffff880172dd32e0)
  Stack:
   ffff88017a540ec8 ffff88017a03fc20 ffffffff816017b5 ffff88017a03fc88
   ffffffff812fa014 0000000000000000 ffff880279ebd5c0 00000000f4f11a4c
   00000007f4f11f49 00000007f4f11a00 ffff88017a540ef0 ffff88017a540ee8
  Call Trace:
    split_huge_page+0x68/0xb0
    __split_huge_page_pmd+0x134/0x330
    split_huge_page_pmd_mm+0x51/0x60
    split_huge_page_address+0x3b/0x50
    __vma_adjust_trans_huge+0x9c/0xf0
    vma_adjust+0x684/0x750
    __split_vma.isra.28+0x1fa/0x220
    do_munmap+0xf9/0x420
    vm_munmap+0x4e/0x70
    sys_munmap+0x2b/0x40
    system_call_fastpath+0x16/0x1b

Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
identified that commit 5a50508 ("mm/rmap: Convert the struct
anon_vma::mutex to an rwsem") and commit 4fc3f1d ("mm/rmap,
migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
were likely the problem.  Reverting these commits was reported to solve
the problem for Alexander.

Despite the reason for these commits, NUMA balancing is not the direct
source of the problem.  split_huge_page() expects the anon_vma lock to
be exclusive to serialise the whole split operation.  Ordinarily it is
expected that the anon_vma lock would only be required when updating the
avcs but THP also uses the anon_vma rwsem for collapse and split
operations where the page lock or compound lock cannot be used (as the
page is changing from base to THP or vice versa) and the page table
locks are insufficient.

This patch takes the anon_vma lock for write to serialise against parallel
split_huge_page as THP expected before the conversion to rwsem.

Reported-and-tested-by: Zhouping Liu <[email protected]>
Reported-by: Alexander Beregalov <[email protected]>
Reported-by: Alex Xu <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
mpc5121e doesn't have system interface registers, accessing this
register address space cause the machine check exception and a
kernel crash:

...
Machine check in kernel mode.
Caused by (from SRR1=49030): Transfer error ack signal
Oops: Machine check, sig: 7 [#1]
MPC5121 ADS
Modules linked in:
NIP: c025fd60 LR: c0265bb4 CTR: 00000000
REGS: df82dac0 TRAP: 0200   Not tainted
(3.7.0-rc7-00641-g81e6c91)
MSR: 00049030 <EE,ME,IR,DR>  CR: 42002024  XER: 20000000
TASK = df824b70[1] 'swapper' THREAD: df82c000
GPR00: 00000000 df82db70 df824b70 df3ed0f0 00000003 00000000 00000000 00000000
GPR08: 00000020 32000000 c03550ec 20000000 22002028 00000000 c0003f5c 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0423898 c0450000
GPR24: 00000077 00000002 e5086180 1c000c00 e5086000 df33ec00 00000003 df34e000
NIP [c025fd60] ehci_fsl_setup_phy+0xd0/0x354
LR [c0265bb4] ehci_fsl_setup+0x220/0x284
...

Fix it by checking 'have_sysif_regs' flag before register access.

Signed-off-by: Anatolij Gustschin <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
ioat does DMA memory sync with DMA_TO_DEVICE direction on a buffer allocated
for DMA_FROM_DEVICE dma, resulting in the following warning from dma debug.
Fixed the dma_sync_single_for_device() call to use the correct direction.

[  226.288947] WARNING: at lib/dma-debug.c:990 check_sync+0x132/0x550()
[  226.288948] Hardware name: ProLiant DL380p Gen8
[  226.288951] ioatdma 0000:00:04.0: DMA-API: device driver syncs DMA memory with different direction [device address=0x00000000ffff7000] [size=4096 bytes] [mapped with DMA_FROM_DEVICE] [synced with DMA_TO_DEVICE]
[  226.288953] Modules linked in: iTCO_wdt(+) sb_edac(+) ioatdma(+) microcode serio_raw pcspkr edac_core hpwdt(+) iTCO_vendor_support hpilo(+) dca acpi_power_meter ata_generic pata_acpi sd_mod crc_t10dif ata_piix libata hpsa tg3 netxen_nic(+) sunrpc dm_mirror dm_region_hash dm_log dm_mod
[  226.288967] Pid: 1055, comm: work_for_cpu Tainted: G        W    3.3.0-0.20.el7.x86_64 #1
[  226.288968] Call Trace:
[  226.288974]  [<ffffffff810644cf>] warn_slowpath_common+0x7f/0xc0
[  226.288977]  [<ffffffff810645c6>] warn_slowpath_fmt+0x46/0x50
[  226.288980]  [<ffffffff81345502>] check_sync+0x132/0x550
[  226.288983]  [<ffffffff81345c9f>] debug_dma_sync_single_for_device+0x3f/0x50
[  226.288988]  [<ffffffff81661002>] ? wait_for_common+0x72/0x180
[  226.288995]  [<ffffffffa019590f>] ioat_xor_val_self_test+0x3e5/0x832 [ioatdma]
[  226.288999]  [<ffffffff811a5739>] ? kfree+0x259/0x270
[  226.289004]  [<ffffffffa0195d77>] ioat3_dma_self_test+0x1b/0x20 [ioatdma]
[  226.289008]  [<ffffffffa01952c3>] ioat_probe+0x2f8/0x348 [ioatdma]
[  226.289011]  [<ffffffffa0195f51>] ioat3_dma_probe+0x1d5/0x2aa [ioatdma]
[  226.289016]  [<ffffffffa0194d12>] ioat_pci_probe+0x139/0x17c [ioatdma]
[  226.289020]  [<ffffffff81354b8c>] local_pci_probe+0x5c/0xd0
[  226.289023]  [<ffffffff81083e50>] ? destroy_work_on_stack+0x20/0x20
[  226.289025]  [<ffffffff81083e68>] do_work_for_cpu+0x18/0x30
[  226.289029]  [<ffffffff8108d997>] kthread+0xb7/0xc0
[  226.289033]  [<ffffffff8166cef4>] kernel_thread_helper+0x4/0x10
[  226.289036]  [<ffffffff81662d20>] ? _raw_spin_unlock_irq+0x30/0x50
[  226.289038]  [<ffffffff81663234>] ? retint_restore_args+0x13/0x13
[  226.289041]  [<ffffffff8108d8e0>] ? kthread_worker_fn+0x1a0/0x1a0
[  226.289044]  [<ffffffff8166cef0>] ? gs_change+0x13/0x13
[  226.289045] ---[ end trace e1618afc7a606089 ]---
[  226.289047] Mapped at:
[  226.289048]  [<ffffffff81345307>] debug_dma_map_page+0x87/0x150
[  226.289050]  [<ffffffffa019653c>] dma_map_page.constprop.18+0x70/0xb34 [ioatdma]
[  226.289054]  [<ffffffffa0195702>] ioat_xor_val_self_test+0x1d8/0x832 [ioatdma]
[  226.289058]  [<ffffffffa0195d77>] ioat3_dma_self_test+0x1b/0x20 [ioatdma]
[  226.289061]  [<ffffffffa01952c3>] ioat_probe+0x2f8/0x348 [ioatdma]

Signed-off-by: Shuah Khan <[email protected]>
CC: <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
Commit 85ff6ac (xen/granttable: Grant
tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from
a constant to a conditional expression. The expression depends on
grant_table_version being appropriately set. Unfortunately, at init
time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME
conditional expression checks for "grant_table_version == 1", and
therefore returns the number of grant references per frame for v2.

This causes gnttab_init() to allocate fewer pages for gnttab_list, as
a frame can old half the number of v2 entries than v1 entries. After
gnttab_resume() is called, grant_table_version is appropriately
set. nr_init_grefs will then be miscalculated and gnttab_free_count
will hold a value larger than the actual number of free gref entries.

If a guest is heavily utilizing improperly initialized v1 grant
tables, memory corruption can occur. One common manifestation is
corruption of the vmalloc list, resulting in a poisoned pointer
derefrence when accessing /proc/meminfo or /proc/vmallocinfo:

[   40.770064] BUG: unable to handle kernel paging request at 0000200200001407
[   40.770083] IP: [<ffffffff811a6fb0>] get_vmalloc_info+0x70/0x110
[   40.770102] PGD 0
[   40.770107] Oops: 0000 [#1] SMP
[   40.770114] CPU 10

This patch introduces a static variable, grefs_per_grant_frame, to
cache the calculated value. gnttab_init() now calls
gnttab_request_version() early so that grant_table_version and
grefs_per_grant_frame can be appropriately set. A few BUG_ON()s have
been added to prevent this type of bug from reoccurring in the future.

Signed-off-by: Matt Wilson <[email protected]>
Reviewed-and-Tested-by: Steven Noonan <[email protected]>
Acked-by: Ian Campbell <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Annie Li <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # v3.3 and newer
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
The tty is NULL when the port is hanging up.
chase_port() needs to check for this.

This patch is intended for stable series.
The behavior was observed and tested in Linux 3.2 and 3.7.1.

Johan Hovold submitted a more elaborate patch for the mainline kernel.

[   56.277883] usb 1-1: edge_bulk_in_callback - nonzero read bulk status received: -84
[   56.278811] usb 1-1: USB disconnect, device number 3
[   56.278856] usb 1-1: edge_bulk_in_callback - stopping read!
[   56.279562] BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
[   56.280536] IP: [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.281212] PGD 1dc1b067 PUD 1e0f7067 PMD 0
[   56.282085] Oops: 0002 [#1] SMP
[   56.282744] Modules linked in:
[   56.283512] CPU 1
[   56.283512] Pid: 25, comm: khubd Not tainted 3.7.1 #1 innotek GmbH VirtualBox/VirtualBox
[   56.283512] RIP: 0010:[<ffffffff8144e62a>]  [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.283512] RSP: 0018:ffff88001fa99ab0  EFLAGS: 00010046
[   56.283512] RAX: 0000000000000046 RBX: 00000000000001c8 RCX: 0000000000640064
[   56.283512] RDX: 0000000000010000 RSI: ffff88001fa99b20 RDI: 00000000000001c8
[   56.283512] RBP: ffff88001fa99b20 R08: 0000000000000000 R09: 0000000000000000
[   56.283512] R10: 0000000000000000 R11: ffffffff812fcb4c R12: ffff88001ddf53c0
[   56.283512] R13: 0000000000000000 R14: 00000000000001c8 R15: ffff88001e19b9f4
[   56.283512] FS:  0000000000000000(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
[   56.283512] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   56.283512] CR2: 00000000000001c8 CR3: 000000001dc51000 CR4: 00000000000006e0
[   56.283512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   56.283512] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   56.283512] Process khubd (pid: 25, threadinfo ffff88001fa98000, task ffff88001fa94f80)
[   56.283512] Stack:
[   56.283512]  0000000000000046 00000000000001c8 ffffffff810578ec ffffffff812fcb4c
[   56.283512]  ffff88001e19b980 0000000000002710 ffffffff812ffe81 0000000000000001
[   56.283512]  ffff88001fa94f80 0000000000000202 ffffffff00000001 0000000000000296
[   56.283512] Call Trace:
[   56.283512]  [<ffffffff810578ec>] ? add_wait_queue+0x12/0x3c
[   56.283512]  [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[   56.283512]  [<ffffffff812ffe81>] ? chase_port+0x84/0x2d6
[   56.283512]  [<ffffffff81063f27>] ? try_to_wake_up+0x199/0x199
[   56.283512]  [<ffffffff81263a5c>] ? tty_ldisc_hangup+0x222/0x298
[   56.283512]  [<ffffffff81300171>] ? edge_close+0x64/0x129
[   56.283512]  [<ffffffff810612f7>] ? __wake_up+0x35/0x46
[   56.283512]  [<ffffffff8106135b>] ? should_resched+0x5/0x23
[   56.283512]  [<ffffffff81264916>] ? tty_port_shutdown+0x39/0x44
[   56.283512]  [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[   56.283512]  [<ffffffff8125d38c>] ? __tty_hangup+0x307/0x351
[   56.283512]  [<ffffffff812e6ddc>] ? usb_hcd_flush_endpoint+0xde/0xed
[   56.283512]  [<ffffffff8144e625>] ? _raw_spin_lock_irqsave+0x14/0x35
[   56.283512]  [<ffffffff812fd361>] ? usb_serial_disconnect+0x57/0xc2
[   56.283512]  [<ffffffff812ea99b>] ? usb_unbind_interface+0x5c/0x131
[   56.283512]  [<ffffffff8128d738>] ? __device_release_driver+0x7f/0xd5
[   56.283512]  [<ffffffff8128d9cd>] ? device_release_driver+0x1a/0x25
[   56.283512]  [<ffffffff8128d393>] ? bus_remove_device+0xd2/0xe7
[   56.283512]  [<ffffffff8128b7a3>] ? device_del+0x119/0x167
[   56.283512]  [<ffffffff812e8d9d>] ? usb_disable_device+0x6a/0x180
[   56.283512]  [<ffffffff812e2ae0>] ? usb_disconnect+0x81/0xe6
[   56.283512]  [<ffffffff812e4435>] ? hub_thread+0x577/0xe82
[   56.283512]  [<ffffffff8144daa7>] ? __schedule+0x490/0x4be
[   56.283512]  [<ffffffff8105798f>] ? abort_exclusive_wait+0x79/0x79
[   56.283512]  [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[   56.283512]  [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[   56.283512]  [<ffffffff810570b4>] ? kthread+0x81/0x89
[   56.283512]  [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[   56.283512]  [<ffffffff8145387c>] ? ret_from_fork+0x7c/0xb0
[   56.283512]  [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[   56.283512] Code: 8b 7c 24 08 e8 17 0b c3 ff 48 8b 04 24 48 83 c4 10 c3 53 48 89 fb 41 50 e8 e0 0a c3 ff 48 89 04 24 e8 e7 0a c3 ff ba 00 00 01 00
<f0> 0f c1 13 48 8b 04 24 89 d1 c1 ea 10 66 39 d1 74 07 f3 90 66
[   56.283512] RIP  [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.283512]  RSP <ffff88001fa99ab0>
[   56.283512] CR2: 00000000000001c8
[   56.283512] ---[ end trace 49714df27e1679ce ]---

Signed-off-by: Wolfgang Frisch <[email protected]>
Cc: Johan Hovold <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
tiwai pushed a commit that referenced this pull request Jan 27, 2013
The 'intel_idle_probe' probes the CPU and sets the CPU notifier.
But if later on during the module initialization we fail (say
in cpuidle_register_driver), we stop loading, but we neglect
to unregister the CPU notifier.  This means that during CPU
hotplug events the system will fail:

calling  intel_idle_init+0x0/0x326 @ 1
intel_idle: MWAIT substates: 0x1120
intel_idle: v0.4 model 0x2A
intel_idle: lapic_timer_reliable_states 0xffffffff
intel_idle: intel_idle yielding to none
initcall intel_idle_init+0x0/0x326 returned -19 after 14 usecs

... some time later, offlining and onlining a CPU:

cpu 3 spinlock event irq 62
BUG: unable to ] __cpuidle_register_device+0x1c/0x120
PGD 99b8b067 PUD 99b95067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xen_evtchn nouveau mxm_wmi wmi radeon ttm i915 fbcon tileblit font atl1c bitblit softcursor drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd mperf
CPU 0
Pid: 2302, comm: udevd Not tainted 3.8.0-rc3upstream-00249-g09ad159 #1 MSI MS-7680/H61M-P23 (MS-7680)
RIP: e030:[<ffffffff814d956c>]  [<ffffffff814d956c>] __cpuidle_register_device+0x1c/0x120
RSP: e02b:ffff88009dacfcb8  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff880105380000 RCX: 000000000000001c
RDX: 0000000000000000 RSI: 0000000000000055 RDI: ffff880105380000
RBP: ffff88009dacfce8 R08: ffffffff81a4f048 R09: 0000000000000008
R10: 0000000000000008 R11: 0000000000000000 R12: ffff880105380000
R13: 00000000ffffffdd R14: 0000000000000000 R15: ffffffff81a523d0
FS:  00007f37bd83b7a0(0000) GS:ffff880105200000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000000a09ea000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process udevd (pid: 2302, threadinfo ffff88009dace000, task ffff88009afb47f0)
Stack:
 ffffffff8107f2d0 ffffffff810c2fb7 ffff88009dacfce8 00000000ffffffea
 ffff880105380000 00000000ffffffdd ffff88009dacfd08 ffffffff814d9882
 0000000000000003 ffff880105380000 ffff88009dacfd28 ffffffff81340afd
Call Trace:
 [<ffffffff8107f2d0>] ? collect_cpu_info_local+0x30/0x30
 [<ffffffff810c2fb7>] ? __might_sleep+0xe7/0x100
 [<ffffffff814d9882>] cpuidle_register_device+0x32/0x70
 [<ffffffff81340afd>] intel_idle_cpu_init+0xad/0x110
 [<ffffffff81340bc8>] cpu_hotplug_notify+0x68/0x80
 [<ffffffff8166023d>] notifier_call_chain+0x4d/0x70
 [<ffffffff810bc369>] __raw_notifier_call_chain+0x9/0x10
 [<ffffffff81094a4b>] __cpu_notify+0x1b/0x30
 [<ffffffff81652cf7>] _cpu_up+0x103/0x14b
 [<ffffffff81652e18>] cpu_up+0xd9/0xec
 [<ffffffff8164a254>] store_online+0x94/0xd0
 [<ffffffff814122fb>] dev_attr_store+0x1b/0x20
 [<ffffffff81216404>] sysfs_write_file+0xf4/0x170
 [<ffffffff811a1024>] vfs_write+0xb4/0x130
 [<ffffffff811a17ea>] sys_write+0x5a/0xa0
 [<ffffffff816643a9>] system_call_fastpath+0x16/0x1b
Code: 03 18 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 e8 84 08 00 00 <48> 8b 78 08 49 89 c4 e8 f8 7f c1 ff 89 c2 b8 ea ff ff ff 84 d2
RIP  [<ffffffff814d956c>] __cpuidle_register_device+0x1c/0x120
 RSP <ffff88009dacfcb8>

This patch fixes that by moving the CPU notifier registration
as the last item to be done by the module.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: Srivatsa S. Bhat <[email protected]>
Cc: 3.6+ <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
In commit e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL
after job completion"), we introduced a change to assign the job pointer
to NULL after completing a job, indicating job completion.

However, this approach created a race condition between the DRM
scheduler workqueue and the IRQ execution thread. As soon as the fence is
signaled in the IRQ execution thread, a new job starts to be executed.
This results in a race condition where the IRQ execution thread sets the
job pointer to NULL simultaneously as the `run_job()` function assigns
a new job to the pointer.

This race condition can lead to a NULL pointer dereference if the IRQ
execution thread sets the job pointer to NULL after `run_job()` assigns
it to the new job. When the new job completes and the GPU emits an
interrupt, `v3d_irq()` is triggered, potentially causing a crash.

[  466.310099] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0
[  466.318928] Mem abort info:
[  466.321723]   ESR = 0x0000000096000005
[  466.325479]   EC = 0x25: DABT (current EL), IL = 32 bits
[  466.330807]   SET = 0, FnV = 0
[  466.333864]   EA = 0, S1PTW = 0
[  466.337010]   FSC = 0x05: level 1 translation fault
[  466.341900] Data abort info:
[  466.344783]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  466.350285]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  466.355350]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  466.360677] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000089772000
[  466.367140] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  466.375875] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  466.382163] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep binfmt_misc vc4 snd_soc_hdmi_codec drm_display_helper cec brcmfmac_wcc spidev rpivid_hevc(C) drm_client_lib brcmfmac hci_uart drm_dma_helper pisp_be btbcm brcmutil snd_soc_core aes_ce_blk v4l2_mem2mem bluetooth aes_ce_cipher snd_compress videobuf2_dma_contig ghash_ce cfg80211 gf128mul snd_pcm_dmaengine videobuf2_memops ecdh_generic sha2_ce ecc videobuf2_v4l2 snd_pcm v3d sha256_arm64 rfkill videodev snd_timer sha1_ce libaes gpu_sched snd videobuf2_common sha1_generic drm_shmem_helper mc rp1_pio drm_kms_helper raspberrypi_hwmon spi_bcm2835 gpio_keys i2c_brcmstb rp1 raspberrypi_gpiomem rp1_mailbox rp1_adc nvmem_rmem uio_pdrv_genirq uio i2c_dev drm ledtrig_pattern drm_panel_orientation_quirks backlight fuse dm_mod ip_tables x_tables ipv6
[  466.458429] CPU: 0 UID: 1000 PID: 2008 Comm: chromium Tainted: G         C         6.13.0-v8+ torvalds#18
[  466.467336] Tainted: [C]=CRAP
[  466.470306] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  466.476157] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  466.483143] pc : v3d_irq+0x118/0x2e0 [v3d]
[  466.487258] lr : __handle_irq_event_percpu+0x60/0x228
[  466.492327] sp : ffffffc080003ea0
[  466.495646] x29: ffffffc080003ea0 x28: ffffff80c0c94200 x27: 0000000000000000
[  466.502807] x26: ffffffd08dd81d7b x25: ffffff80c0c94200 x24: ffffff8003bdc200
[  466.509969] x23: 0000000000000001 x22: 00000000000000a7 x21: 0000000000000000
[  466.517130] x20: ffffff8041bb0000 x19: 0000000000000001 x18: 0000000000000000
[  466.524291] x17: ffffffafadfb0000 x16: ffffffc080000000 x15: 0000000000000000
[  466.531452] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  466.538613] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffffd08c527eb0
[  466.545777] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[  466.552941] x5 : ffffffd08c4100d0 x4 : ffffffafadfb0000 x3 : ffffffc080003f70
[  466.560102] x2 : ffffffc0829e8058 x1 : 0000000000000001 x0 : 0000000000000000
[  466.567263] Call trace:
[  466.569711]  v3d_irq+0x118/0x2e0 [v3d] (P)
[  466.573826]  __handle_irq_event_percpu+0x60/0x228
[  466.578546]  handle_irq_event+0x54/0xb8
[  466.582391]  handle_fasteoi_irq+0xac/0x240
[  466.586498]  generic_handle_domain_irq+0x34/0x58
[  466.591128]  gic_handle_irq+0x48/0xd8
[  466.594798]  call_on_irq_stack+0x24/0x58
[  466.598730]  do_interrupt_handler+0x88/0x98
[  466.602923]  el0_interrupt+0x44/0xc0
[  466.606508]  __el0_irq_handler_common+0x18/0x28
[  466.611050]  el0t_64_irq_handler+0x10/0x20
[  466.615156]  el0t_64_irq+0x198/0x1a0
[  466.618740] Code: 52800035 3607faf3 f9442e80 52800021 (f9406018)
[  466.624853] ---[ end trace 0000000000000000 ]---
[  466.629483] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  466.636384] SMP: stopping secondary CPUs
[  466.640320] Kernel Offset: 0x100c400000 from 0xffffffc080000000
[  466.646259] PHYS_OFFSET: 0x0
[  466.649141] CPU features: 0x100,00000170,00901250,0200720b
[  466.654644] Memory Limit: none
[  466.657706] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Fix the crash by assigning the job pointer to NULL before signaling the
fence. This ensures that the job pointer is cleared before any new job
starts execution, preventing the race condition and the NULL pointer
dereference crash.

Cc: [email protected]
Fixes: e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL after job completion")
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Tested-by: Phil Elwell <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
tiwai pushed a commit that referenced this pull request Feb 5, 2025
On some systems, the same CPU (with the same APIC ID) is assigned a
different logical CPU id after commit ec9aedb ("x86/acpi: Ignore
invalid x2APIC entries").

This means that Linux enumerates the CPUs in a different order, which
violates ACPI specification[1] that states:

  "OSPM should initialize processors in the order that they appear in
   the MADT"

The problematic commit parses all LAPIC entries before any x2APIC
entries, aiming to ignore x2APIC entries with APIC ID < 255 when valid
LAPIC entries exist. However, it disrupts the CPU enumeration order on
systems where x2APIC entries precede LAPIC entries in the MADT.

Fix this problem by:

 1) Parsing LAPIC entries first without registering them in the
    topology to evaluate whether valid LAPIC entries exist.

 2) Restoring the MADT in order parser which invokes either the LAPIC
    or the X2APIC parser function depending on the entry type.

The X2APIC parser still ignores entries < 0xff in case that #1 found
valid LAPIC entries independent of their position in the MADT table.

Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order
Cc: All applicable <[email protected]>
Reported-by: Jim Mattson <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: ec9aedb ("x86/acpi: Ignore invalid x2APIC entries")
Signed-off-by: Zhang Rui <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Tested-by: Jim Mattson <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
Add read memory barrier to ensure the order of operations when accessing
control queue descriptors. Specifically, we want to avoid cases where loads
can be reordered:

1. Load #1 is dispatched to read descriptor flags.
2. Load #2 is dispatched to read some other field from the descriptor.
3. Load #2 completes, accessing memory/cache at a point in time when the DD
   flag is zero.
4. NIC DMA overwrites the descriptor, now the DD flag is one.
5. Any fields loaded before step 4 are now inconsistent with the actual
   descriptor state.

Add read memory barrier between steps 1 and 2, so that load #2 is not
executed until load #1 has completed.

Fixes: 8077c72 ("idpf: add controlq init and reset checks")
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Sridhar Samudrala <[email protected]>
Suggested-by: Lance Richardson <[email protected]>
Signed-off-by: Emil Tantilov <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
When testing the atomic write fix patches, the f2fs_bug_on was
triggered as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:935!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 257 Comm: bash Not tainted 6.13.0-rc1-00033-gc283a70d3497 #5
RIP: 0010:f2fs_evict_inode+0x50f/0x520
Call Trace:
 <TASK>
 ? __die_body+0x65/0xb0
 ? die+0x9f/0xc0
 ? do_trap+0xa1/0x170
 ? f2fs_evict_inode+0x50f/0x520
 ? f2fs_evict_inode+0x50f/0x520
 ? handle_invalid_op+0x65/0x80
 ? f2fs_evict_inode+0x50f/0x520
 ? exc_invalid_op+0x39/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? __pfx_f2fs_get_dquots+0x10/0x10
 ? f2fs_evict_inode+0x50f/0x520
 ? f2fs_evict_inode+0x2e5/0x520
 evict+0x186/0x2f0
 prune_icache_sb+0x75/0xb0
 super_cache_scan+0x1a8/0x200
 do_shrink_slab+0x163/0x320
 shrink_slab+0x2fc/0x470
 drop_slab+0x82/0xf0
 drop_caches_sysctl_handler+0x4e/0xb0
 proc_sys_call_handler+0x183/0x280
 vfs_write+0x36d/0x450
 ksys_write+0x68/0xd0
 do_syscall_64+0xc8/0x1a0
 ? arch_exit_to_user_mode_prepare+0x11/0x60
 ? irqentry_exit_to_user_mode+0x7e/0xa0

The root cause is: f2fs uses FI_ATOMIC_DIRTIED to indicate dirty
atomic files during commit. If the inode is dirtied during commit,
such as by f2fs_i_pino_write, the vfs inode keeps clean and the
f2fs inode is set to FI_DIRTY_INODE. The FI_DIRTY_INODE flag cann't
be cleared by write_inode later due to the clean vfs inode. Finally,
f2fs_bug_on is triggered due to this inconsistent state when evict.

To reproduce this situation:
- fd = open("/mnt/test.db", O_WRONLY)
- ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE)
- mv /mnt/test.db /mnt/test1.db
- ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE)
- echo 3 > /proc/sys/vm/drop_caches

To fix this problem, clear FI_DIRTY_INODE after commit, then
f2fs_mark_inode_dirty_sync will ensure a consistent dirty state.

Fixes: fccaa81 ("f2fs: prevent atomic file from being dirtied before commit")
Signed-off-by: Yunlei He <[email protected]>
Signed-off-by: Jianan Huang <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
…g_info

syz reported:

(syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: bug
expression: tl_count > ocfs2_truncate_recs_per_inode(osb->sb) ||
tl_count == 0
(syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: Truncate
record count on torvalds#77 invalid wanted 39, actual 2087
------------[ cut here  ]------------
kernel BUG at fs/ocfs2/alloc.c:5874!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5313 Comm: syz-executor404 Not tainted
6.12.0-rc5-syzkaller-00299-g11066801dd4b #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:ocfs2_truncate_log_append+0x9a8/0x9c0 fs/ocfs2/alloc.c:5868
RSP: 0018:ffffc9000cf16f40 EFLAGS: 00010292
RAX: b4b54f1d10640800 RBX: 0000000000000027 RCX: b4b54f1d10640800
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc9000cf17070 R08: ffffffff8174a14c R09: 1ffff11003f8519a
R10: dffffc0000000000 R11: ffffed1003f8519b R12: 1ffff110085f5f58
R13: ffffff3800000000 R14: 000000000000004d R15: ffff8880438f0008
FS:  00005555722df380(0000) GS:ffff88801fc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000f000 CR3: 000000004010e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ocfs2_remove_btree_range+0x1303/0x1860 fs/ocfs2/alloc.c:5789
ocfs2_remove_inode_range+0xff3/0x29f0 fs/ocfs2/file.c:1907
ocfs2_reflink_remap_extent fs/ocfs2/refcounttree.c:4537 [inline]
ocfs2_reflink_remap_blocks+0xcd4/0x1f30 fs/ocfs2/refcounttree.c:4684
ocfs2_remap_file_range+0x5fa/0x8d0 fs/ocfs2/file.c:2736
vfs_copy_file_range+0xc07/0x1510 fs/read_write.c:1615
__do_sys_copy_file_range fs/read_write.c:1705 [inline]
__se_sys_copy_file_range+0x3f2/0x5d0 fs/read_write.c:1668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd327167af9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 61 17 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe6b8e22e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000146
RAX: ffffffffffffffda RBX: 00007fd3271b005e RCX: 00007fd327167af9
RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007fd3271de610 R08: 000000000000d8c2 R09: 0000000000000000
R10: 0000000020000640 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe6b8e24b8 R14: 0000000000000001 R15: 0000000000000001
</TASK>

The fuzz image has a truncate log inode whose tl_count is bigger than
ocfs2_truncate_recs_per_inode() so it triggers the BUG in
ocfs2_truncate_log_append().

As what the check in ocfs2_truncate_log_append() does, just do same check
into ocfs2_get_truncate_log_info when truncate log inode is reading in so
we can bail out earlier.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Su Yue <[email protected]>
Reported-by: Liebes Wang <[email protected]>
Link: https://lore.kernel.org/ocfs2-devel/CADCV8souQhdP0RdQF1U7KTWtuHDfpn+3LnTt-EEuMmB-pMRrgQ@mail.gmail.com/T/#u
Reported-by: [email protected]
Tested-by: [email protected]
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
In "one-shot" mode, turbostat
1. takes a counter snapshot
2. forks and waits for a child
3. takes the end counter snapshot and prints the result.

But turbostat counter snapshots currently use affinity to travel
around the system so that counter reads are "local", and this
affinity must be cleared between #1 and #2 above.

The offending commit removed that reset that allowed the child
to run on cpu_present_set.

Fix that issue, and improve upon the original by using
cpu_possible_set for the child.  This allows the child
to also run on CPUs that hotplug online during its runtime.

Reported-by: Zhang Rui <[email protected]>
Fixes: 7bb3fe2 ("tools/power/turbostat: Obey allowed CPUs during startup")
Signed-off-by: Len Brown <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    torvalds#8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    torvalds#9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    torvalds#10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    torvalds#11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    torvalds#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
tiwai pushed a commit that referenced this pull request Feb 5, 2025
This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim.  If the LRU mostly contains ineligible folios this may trigger
watchdog.

watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
	_raw_spin_lock_irqsave+0x31/0x40
	folio_lruvec_lock_irqsave+0x5f/0x90
	folio_batch_move_lru+0x91/0x150
	lru_add_drain_per_cpu+0x1c/0x40
	process_one_work+0x17d/0x350
	worker_thread+0x27b/0x3a0
	kthread+0xe8/0x120
	ret_from_fork+0x34/0x50
	ret_from_fork_asm+0x1b/0x30

lruvec->lru_lock owner:

PID: 2865     TASK: ffff888139214d40  CPU: 40   COMMAND: "kswapd0"
 #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
    [exception RIP: isolate_lru_folios+403]
    RIP: ffffffffa597df53  RSP: ffffc90006fb7c28  RFLAGS: 00000002
    RAX: 0000000000000001  RBX: ffffc90006fb7c60  RCX: ffffea04a2196f88
    RDX: ffffc90006fb7c60  RSI: ffffc90006fb7c60  RDI: ffffea04a2197048
    RBP: ffff88812cbd3010   R8: ffffea04a2197008   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000001  R12: ffffea04a2197008
    R13: ffffea04a2197048  R14: ffffc90006fb7de8  R15: 0000000003e3e937
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <NMI exception stack>
 #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
 torvalds#8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
 torvalds#9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
crash>

Scenario:
User processe are requesting a large amount of memory and keep page active.
Then a module continuously requests memory from ZONE_DMA32 area.
Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached.
However pages in the LRU(active_anon) list are mostly from
the ZONE_NORMAL area.

Reproduce:
Terminal 1: Construct to continuously increase pages active(anon).
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block

Terminal 2:
vmstat -a 1
active will increase.
procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ...
 r  b   swpd   free  inact active   si   so    bi    bo
 1  0   0 1445623076 45898836 83646008    0    0     0
 1  0   0 1445623076 43450228 86094616    0    0     0
 1  0   0 1445623076 41003480 88541364    0    0     0
 1  0   0 1445623076 38557088 90987756    0    0     0
 1  0   0 1445623076 36109688 93435156    0    0     0
 1  0   0 1445619552 33663256 95881632    0    0     0
 1  0   0 1445619804 31217140 98327792    0    0     0
 1  0   0 1445619804 28769988 100774944    0    0     0
 1  0   0 1445619804 26322348 103222584    0    0     0
 1  0   0 1445619804 23875592 105669340    0    0     0

cat /proc/meminfo | head
Active(anon) increase.
MemTotal:       1579941036 kB
MemFree:        1445618500 kB
MemAvailable:   1453013224 kB
Buffers:            6516 kB
Cached:         128653956 kB
SwapCached:            0 kB
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB
Inactive(anon):   945292 kB

When the Active(anon) is 115345744 kB, insmod module triggers
the ZONE_DMA32 watermark.

perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2
nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29
nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon

See nr_scanned=28835844.
28835844 * 4k = 115343376KB approximately equal to 115345744 kB.

If increase Active(anon) to 1000G then insmod module triggers
the ZONE_DMA32 watermark. hard lockup will occur.

In my device nr_scanned = 0000000003e3e937 when hard lockup.
Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB.

   [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
    ffffc90006fb7c30: 0000000000000020 0000000000000000
    ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000
    ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8
    ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48
    ffffc90006fb7c70: 0000000000000000 0000000000000000
    ffffc90006fb7c80: 0000000000000000 0000000000000000
    ffffc90006fb7c90: 0000000000000000 0000000000000000
    ffffc90006fb7ca0: 0000000000000000 0000000003e3e937
    ffffc90006fb7cb0: 0000000000000000 0000000000000000
    ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000

About the Fixes:
Why did it take eight years to be discovered?

The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.

If the memory is not large enough, or if the usage design of ZONE_DMA32
area memory is reasonable, this problem is difficult to detect.

notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b2e1875 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: liuye <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
Protected mode assumes that at minimum vgic-v3 is present, however KVM
fails to actually enforce this at the time of initialization. As such,
when running protected mode in a half-baked state on GICv2 hardware we
see the hyp go belly up at vcpu_load() when it tries to restore the
vgic-v3 cpuif:

  $ ./arch_timer_edge_cases
  [  130.599140] kvm [4518]: nVHE hyp panic at: [<ffff800081102b58>] __kvm_nvhe___vgic_v3_restore_vmcr_aprs+0x8/0x84!
  [  130.603685] kvm [4518]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
  [  130.611962] kvm [4518]: Hyp Offset: 0xfffeca95ed000000
  [  130.617053] Kernel panic - not syncing: HYP panic:
  [  130.617053] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000
  [  130.617053] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000
  [  130.617053] VCPU:0000000000000000
  [  130.638013] CPU: 0 UID: 0 PID: 4518 Comm: arch_timer_edge Tainted: G         C         6.13.0-rc3-00009-gf7d03fcbf1f4 #1
  [  130.648790] Tainted: [C]=CRAP
  [  130.651721] Hardware name: Libre Computer AML-S905X-CC (DT)
  [  130.657242] Call trace:
  [  130.659656]  show_stack+0x18/0x24 (C)
  [  130.663279]  dump_stack_lvl+0x38/0x90
  [  130.666900]  dump_stack+0x18/0x24
  [  130.670178]  panic+0x388/0x3e8
  [  130.673196]  nvhe_hyp_panic_handler+0x104/0x208
  [  130.677681]  kvm_arch_vcpu_load+0x290/0x548
  [  130.681821]  vcpu_load+0x50/0x80
  [  130.685013]  kvm_arch_vcpu_ioctl_run+0x30/0x868
  [  130.689498]  kvm_vcpu_ioctl+0x2e0/0x974
  [  130.693293]  __arm64_sys_ioctl+0xb4/0xec
  [  130.697174]  invoke_syscall+0x48/0x110
  [  130.700883]  el0_svc_common.constprop.0+0x40/0xe0
  [  130.705540]  do_el0_svc+0x1c/0x28
  [  130.708818]  el0_svc+0x30/0xd0
  [  130.711837]  el0t_64_sync_handler+0x10c/0x138
  [  130.716149]  el0t_64_sync+0x198/0x19c
  [  130.719774] SMP: stopping secondary CPUs
  [  130.723660] Kernel Offset: disabled
  [  130.727103] CPU features: 0x000,00000800,02800000,0200421b
  [  130.732537] Memory Limit: none
  [  130.735561] ---[ end Kernel panic - not syncing: HYP panic:
  [  130.735561] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000
  [  130.735561] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000
  [  130.735561] VCPU:0000000000000000 ]---

Fix it by failing KVM initialization if the system doesn't implement
vgic-v3, as protected mode will never do anything useful on such
hardware.

Reported-by: Mark Brown <[email protected]>
Closes: https://lore.kernel.org/kvmarm/[email protected]/
Signed-off-by: Oliver Upton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Zyngier <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
…/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.14, take #1

- Correctly clean the BSS to the PoC before allowing EL2 to access it
  on nVHE/hVHE/protected configurations

- Propagate ownership of debug registers in protected mode after
  the rework that landed in 6.14-rc1

- Stop pretending that we can run the protected mode without a GICv3
  being present on the host

- Fix a use-after-free situation that can occur if a vcpu fails to
  initialise the NV shadow S2 MMU contexts

- Always evaluate the need to arm a background timer for fully emulated
  guest timers

- Fix the emulation of EL1 timers in the absence of FEAT_ECV

- Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0
tiwai pushed a commit that referenced this pull request Mar 12, 2025
commit c8347f9 ("gpu: host1x: Fix boot regression for Tegra")
caused a use of uninitialized mutex leading to below warning when
CONFIG_DEBUG_MUTEXES and CONFIG_DEBUG_LOCK_ALLOC are enabled.

[   41.662843] ------------[ cut here ]------------
[   41.663012] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[   41.663035] WARNING: CPU: 4 PID: 794 at kernel/locking/mutex.c:587 __mutex_lock+0x670/0x878
[   41.663458] Modules linked in: rtw88_8822c(+) bluetooth(+) rtw88_pci rtw88_core mac80211 aquantia libarc4 crc_itu_t cfg80211 tegra194_cpufreq dwmac_tegra(+) arm_dsu_pmu stmmac_platform stmmac pcs_xpcs rfkill at24 host1x(+) tegra_bpmp_thermal ramoops reed_solomon fuse loop nfnetlink xfs mmc_block rpmb_core ucsi_ccg ina3221 crct10dif_ce xhci_tegra ghash_ce lm90 sha2_ce sha256_arm64 sha1_ce sdhci_tegra pwm_fan sdhci_pltfm sdhci gpio_keys rtc_tegra cqhci mmc_core phy_tegra_xusb i2c_tegra tegra186_gpc_dma i2c_tegra_bpmp spi_tegra114 dm_mirror dm_region_hash dm_log dm_mod
[   41.665078] CPU: 4 UID: 0 PID: 794 Comm: (udev-worker) Not tainted 6.11.0-29.31_1538613708.el10.aarch64+debug #1
[   41.665838] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 36.3.0-gcid-35594366 02/26/2024
[   41.672555] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   41.679636] pc : __mutex_lock+0x670/0x878
[   41.683834] lr : __mutex_lock+0x670/0x878
[   41.688035] sp : ffff800084b77090
[   41.691446] x29: ffff800084b77160 x28: ffffdd4bebf7b000 x27: ffffdd4be96b1000
[   41.698799] x26: 1fffe0002308361c x25: 1ffff0001096ee18 x24: 0000000000000000
[   41.706149] x23: 0000000000000000 x22: 0000000000000002 x21: ffffdd4be6e3c7a0
[   41.713500] x20: ffff800084b770f0 x19: ffff00011841b1e8 x18: 0000000000000000
[   41.720675] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720
[   41.728023] x14: 0000000000000000 x13: 0000000000000001 x12: ffff6001a96eaab3
[   41.735375] x11: 1fffe001a96eaab2 x10: ffff6001a96eaab2 x9 : ffffdd4be4838bbc
[   41.742723] x8 : 00009ffe5691554e x7 : ffff000d4b755593 x6 : 0000000000000001
[   41.749985] x5 : ffff000d4b755590 x4 : 1fffe0001d88f001 x3 : dfff800000000000
[   41.756988] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000ec478000
[   41.764251] Call trace:
[   41.766695]  __mutex_lock+0x670/0x878
[   41.770373]  mutex_lock_nested+0x2c/0x40
[   41.774134]  host1x_intr_start+0x54/0xf8 [host1x]
[   41.778863]  host1x_runtime_resume+0x150/0x228 [host1x]
[   41.783935]  pm_generic_runtime_resume+0x84/0xc8
[   41.788485]  __rpm_callback+0xa0/0x478
[   41.792422]  rpm_callback+0x15c/0x1a8
[   41.795922]  rpm_resume+0x698/0xc08
[   41.799597]  __pm_runtime_resume+0xa8/0x140
[   41.803621]  host1x_probe+0x810/0xbc0 [host1x]
[   41.807909]  platform_probe+0xcc/0x1a8
[   41.811845]  really_probe+0x188/0x800
[   41.815347]  __driver_probe_device+0x164/0x360
[   41.819810]  driver_probe_device+0x64/0x1a8
[   41.823834]  __driver_attach+0x180/0x490
[   41.827773]  bus_for_each_dev+0x104/0x1a0
[   41.831797]  driver_attach+0x44/0x68
[   41.835296]  bus_add_driver+0x23c/0x4e8
[   41.839235]  driver_register+0x15c/0x3a8
[   41.843170]  __platform_register_drivers+0xa4/0x208
[   41.848159]  tegra_host1x_init+0x4c/0xff8 [host1x]
[   41.853147]  do_one_initcall+0xd4/0x380
[   41.856997]  do_init_module+0x1dc/0x698
[   41.860758]  load_module+0xc70/0x1300
[   41.864435]  __do_sys_init_module+0x1a8/0x1d0
[   41.868721]  __arm64_sys_init_module+0x74/0xb0
[   41.873183]  invoke_syscall.constprop.0+0xdc/0x1e8
[   41.877997]  do_el0_svc+0x154/0x1d0
[   41.881671]  el0_svc+0x54/0x140
[   41.884820]  el0t_64_sync_handler+0x120/0x130
[   41.889285]  el0t_64_sync+0x1a4/0x1a8
[   41.892960] irq event stamp: 69737
[   41.896370] hardirqs last  enabled at (69737): [<ffffdd4be6d7768c>] _raw_spin_unlock_irqrestore+0x44/0xe8
[   41.905739] hardirqs last disabled at (69736): [<ffffdd4be59dcd40>] clk_enable_lock+0x98/0x198
[   41.914314] softirqs last  enabled at (68082): [<ffffdd4be466b1d0>] handle_softirqs+0x4c8/0x890
[   41.922977] softirqs last disabled at (67945): [<ffffdd4be44f02a4>] __do_softirq+0x1c/0x28
[   41.931289] ---[ end trace 0000000000000000 ]---

Inside the probe function when pm_runtime_enable() is called,
the PM core invokes a resume callback if the device Host1x is
in a suspended state. As it can be seen in the logs above,
this leads to host1x_intr_start() function call which is
trying to acquire a mutex lock. But, the function
host_intr_init() only gets called after the pm_runtime_enable()
where mutex is initialised leading to the use of mutex
prior to its initialisation.

Fix this by moving the mutex initialisation prior to the runtime
PM enablement function pm_runtime_enable() in probe.

Fixes: c8347f9 ("gpu: host1x: Fix boot regression for Tegra")
Signed-off-by: Rupinderjit Singh <[email protected]>
Reviewed-by: Jon Hunter <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Link: https://patchwork.ozlabs.org/project/linux-tegra/patch/[email protected]/
tiwai pushed a commit that referenced this pull request Mar 12, 2025
vxlan_init() must check vxlan_vnigroup_init() success
otherwise a crash happens later, spotted by syzbot.

Oops: general protection fault, probably for non-canonical address 0xdffffc000000002c: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000160-0x0000000000000167]
CPU: 0 UID: 0 PID: 7313 Comm: syz-executor147 Not tainted 6.14.0-rc1-syzkaller-00276-g69b54314c975 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
 RIP: 0010:vxlan_vnigroup_uninit+0x89/0x500 drivers/net/vxlan/vxlan_vnifilter.c:912
Code: 00 48 8b 44 24 08 4c 8b b0 98 41 00 00 49 8d 86 60 01 00 00 48 89 c2 48 89 44 24 10 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 04 00 00 49 8b 86 60 01 00 00 48 ba 00 00 00
RSP: 0018:ffffc9000cc1eea8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffffff8672effb
RDX: 000000000000002c RSI: ffffffff8672ecb9 RDI: ffff8880461b4f18
RBP: ffff8880461b4ef4 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000020000
R13: ffff8880461b0d80 R14: 0000000000000000 R15: dffffc0000000000
FS:  00007fecfa95d6c0(0000) GS:ffff88806a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fecfa95cfb8 CR3: 000000004472c000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  vxlan_uninit+0x1ab/0x200 drivers/net/vxlan/vxlan_core.c:2942
  unregister_netdevice_many_notify+0x12d6/0x1f30 net/core/dev.c:11824
  unregister_netdevice_many net/core/dev.c:11866 [inline]
  unregister_netdevice_queue+0x307/0x3f0 net/core/dev.c:11736
  register_netdevice+0x1829/0x1eb0 net/core/dev.c:10901
  __vxlan_dev_create+0x7c6/0xa30 drivers/net/vxlan/vxlan_core.c:3981
  vxlan_newlink+0xd1/0x130 drivers/net/vxlan/vxlan_core.c:4407
  rtnl_newlink_create net/core/rtnetlink.c:3795 [inline]
  __rtnl_newlink net/core/rtnetlink.c:3906 [inline]

Fixes: f9c4bb0 ("vxlan: vni filtering support on collect metadata device")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
bch2_nocow_write_convert_unwritten is already in transaction context:

00191 ========= TEST   generic/648
00242 kernel BUG at fs/bcachefs/btree_iter.c:3332!
00242 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
00242 Modules linked in:
00242 CPU: 4 UID: 0 PID: 2593 Comm: fsstress Not tainted 6.13.0-rc3-ktest-g345af8f855b7 #14403
00242 Hardware name: linux,dummy-virt (DT)
00242 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00242 pc : __bch2_trans_get+0x120/0x410
00242 lr : __bch2_trans_get+0xcc/0x410
00242 sp : ffffff80d89af600
00242 x29: ffffff80d89af600 x28: ffffff80ddb23000 x27: 00000000fffff705
00242 x26: ffffff80ddb23028 x25: ffffff80d8903fe0 x24: ffffff80ebb30168
00242 x23: ffffff80c8aeb500 x22: 000000000000005d x21: ffffff80d8904078
00242 x20: ffffff80d8900000 x19: ffffff80da9e8000 x18: 0000000000000000
00242 x17: 64747568735f6c61 x16: 6e72756f6a20726f x15: 0000000000000028
00242 x14: 0000000000000004 x13: 000000000000f787 x12: ffffffc081bbcdc8
00242 x11: 0000000000000000 x10: 0000000000000003 x9 : ffffffc08094efbc
00242 x8 : 000000001092c111 x7 : 000000000000000c x6 : ffffffc083c31fc4
00242 x5 : ffffffc083c31f28 x4 : ffffff80c8aeb500 x3 : ffffff80ebb30000
00242 x2 : 0000000000000001 x1 : 0000000000000a21 x0 : 000000000000028e
00242 Call trace:
00242  __bch2_trans_get+0x120/0x410 (P)
00242  bch2_inum_offset_err_msg+0x48/0xb0
00242  bch2_nocow_write_convert_unwritten+0x3d0/0x530
00242  bch2_nocow_write+0xeb0/0x1000
00242  __bch2_write+0x330/0x4e8
00242  bch2_write+0x1f0/0x530
00242  bch2_direct_write+0x530/0xc00
00242  bch2_write_iter+0x160/0xbe0
00242  vfs_write+0x1cc/0x360
00242  ksys_write+0x5c/0xf0
00242  __arm64_sys_write+0x20/0x30
00242  invoke_syscall.constprop.0+0x54/0xe8
00242  do_el0_svc+0x44/0xc0
00242  el0_svc+0x34/0xa0
00242  el0t_64_sync_handler+0x104/0x130
00242  el0t_64_sync+0x154/0x158
00242 Code: 6b01001f 54ffff01 79408460 3617fec0 (d4210000)
00242 ---[ end trace 0000000000000000 ]---
00242 Kernel panic - not syncing: Oops - BUG: Fatal exception

Signed-off-by: Alan Huang <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
We have several places across the kernel where we want to access another
task's syscall arguments, such as ptrace(2), seccomp(2), etc., by making
a call to syscall_get_arguments().

This works for register arguments right away by accessing the task's
`regs' member of `struct pt_regs', however for stack arguments seen with
32-bit/o32 kernels things are more complicated.  Technically they ought
to be obtained from the user stack with calls to an access_remote_vm(),
but we have an easier way available already.

So as to be able to access syscall stack arguments as regular function
arguments following the MIPS calling convention we copy them over from
the user stack to the kernel stack in arch/mips/kernel/scall32-o32.S, in
handle_sys(), to the current stack frame's outgoing argument space at
the top of the stack, which is where the handler called expects to see
its incoming arguments.  This area is also pointed at by the `pt_regs'
pointer obtained by task_pt_regs().

Make the o32 stack argument space a proper member of `struct pt_regs'
then, by renaming the existing member from `pad0' to `args' and using
generated offsets to access the space.  No functional change though.

With the change in place the o32 kernel stack frame layout at the entry
to a syscall handler invoked by handle_sys() is therefore as follows:

$sp + 68 -> |         ...         | <- pt_regs.regs[9]
            +---------------------+
$sp + 64 -> |         $t0         | <- pt_regs.regs[8]
            +---------------------+
$sp + 60 -> |   $a3/argument #4   | <- pt_regs.regs[7]
            +---------------------+
$sp + 56 -> |   $a2/argument #3   | <- pt_regs.regs[6]
            +---------------------+
$sp + 52 -> |   $a1/argument #2   | <- pt_regs.regs[5]
            +---------------------+
$sp + 48 -> |   $a0/argument #1   | <- pt_regs.regs[4]
            +---------------------+
$sp + 44 -> |         $v1         | <- pt_regs.regs[3]
            +---------------------+
$sp + 40 -> |         $v0         | <- pt_regs.regs[2]
            +---------------------+
$sp + 36 -> |         $at         | <- pt_regs.regs[1]
            +---------------------+
$sp + 32 -> |        $zero        | <- pt_regs.regs[0]
            +---------------------+
$sp + 28 -> |  stack argument torvalds#8  | <- pt_regs.args[7]
            +---------------------+
$sp + 24 -> |  stack argument #7  | <- pt_regs.args[6]
            +---------------------+
$sp + 20 -> |  stack argument #6  | <- pt_regs.args[5]
            +---------------------+
$sp + 16 -> |  stack argument #5  | <- pt_regs.args[4]
            +---------------------+
$sp + 12 -> | psABI space for $a3 | <- pt_regs.args[3]
            +---------------------+
$sp +  8 -> | psABI space for $a2 | <- pt_regs.args[2]
            +---------------------+
$sp +  4 -> | psABI space for $a1 | <- pt_regs.args[1]
            +---------------------+
$sp +  0 -> | psABI space for $a0 | <- pt_regs.args[0]
            +---------------------+

holding user data received and with the first 4 frame slots reserved by
the psABI for the compiler to spill the incoming arguments from $a0-$a3
registers (which it sometimes does according to its needs) and the next
4 frame slots designated by the psABI for any stack function arguments
that follow.  This data is also available for other tasks to peek/poke
at as reqired and where permitted.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
This makes ptrace/get_syscall_info selftest pass on mips o32 and
mips64 o32 by fixing the following two test assertions:

1. get_syscall_info test assertion on mips o32:
  # get_syscall_info.c:218:get_syscall_info:Expected exp_args[5] (3134521044) == info.entry.args[4] (4911432)
  # get_syscall_info.c:219:get_syscall_info:wait #1: entry stop mismatch

2. get_syscall_info test assertion on mips64 o32:
  # get_syscall_info.c:209:get_syscall_info:Expected exp_args[2] (3134324433) == info.entry.args[1] (18446744072548908753)
  # get_syscall_info.c:210:get_syscall_info:wait #1: entry stop mismatch

The first assertion happens due to mips_get_syscall_arg() trying to access
another task's context but failing to do it properly because get_user() it
calls just peeks at the current task's context.  It usually does not crash
because the default user stack always gets assigned the same VMA, but it
is pure luck which mips_get_syscall_arg() wouldn't have if e.g. the stack
was switched (via setcontext(3) or however) or a non-default process's
thread peeked at, and in any case irrelevant data is obtained just as
observed with the test case.

mips_get_syscall_arg() ought to be using access_remote_vm() instead to
retrieve the other task's stack contents, but given that the data has been
already obtained and saved in `struct pt_regs' it would be an overkill.

The first assertion is fixed for mips o32 by using struct pt_regs.args
instead of get_user() to obtain syscall arguments.  This approach works
due to this piece in arch/mips/kernel/scall32-o32.S:

        /*
         * Ok, copy the args from the luser stack to the kernel stack.
         */

        .set    push
        .set    noreorder
        .set    nomacro

    load_a4: user_lw(t5, 16(t0))		# argument #5 from usp
    load_a5: user_lw(t6, 20(t0))		# argument #6 from usp
    load_a6: user_lw(t7, 24(t0))		# argument #7 from usp
    load_a7: user_lw(t8, 28(t0))		# argument torvalds#8 from usp
    loads_done:

        sw	t5, PT_ARG4(sp)		# argument #5 to ksp
        sw	t6, PT_ARG5(sp)		# argument #6 to ksp
        sw	t7, PT_ARG6(sp)		# argument #7 to ksp
        sw	t8, PT_ARG7(sp)		# argument torvalds#8 to ksp
        .set	pop

        .section __ex_table,"a"
        PTR_WD	load_a4, bad_stack_a4
        PTR_WD	load_a5, bad_stack_a5
        PTR_WD	load_a6, bad_stack_a6
        PTR_WD	load_a7, bad_stack_a7
        .previous

arch/mips/kernel/scall64-o32.S has analogous code for mips64 o32 that
allows fixing the issue by obtaining syscall arguments from struct
pt_regs.regs[4..11] instead of the erroneous use of get_user().

The second assertion is fixed by truncating 64-bit values to 32-bit
syscall arguments.

Fixes: c0ff3c5 ("MIPS: Enable HAVE_ARCH_TRACEHOOK.")
Signed-off-by: Dmitry V. Levin <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
This fixes the following trace by reworking the locking of l2cap_conn
so instead of only locking when changing the chan_l list this promotes
chan_lock to a general lock of l2cap_conn so whenever it is being held
it would prevents the likes of l2cap_conn_del to run:

list_del corruption, ffff888021297e00->prev is LIST_POISON2 (dead000000000122)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:61!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5896 Comm: syz-executor213 Not tainted 6.14.0-rc1-next-20250204-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59
Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb
RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0
R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122
R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00
FS:  00007f7ace6686c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7aceeeb1d0 CR3: 000000003527c000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __list_del_entry_valid include/linux/list.h:124 [inline]
 __list_del_entry include/linux/list.h:215 [inline]
 list_del_rcu include/linux/rculist.h:168 [inline]
 hci_chan_del+0x70/0x1b0 net/bluetooth/hci_conn.c:2858
 l2cap_conn_free net/bluetooth/l2cap_core.c:1816 [inline]
 kref_put include/linux/kref.h:65 [inline]
 l2cap_conn_put+0x70/0xe0 net/bluetooth/l2cap_core.c:1830
 l2cap_sock_shutdown+0xa8a/0x1020 net/bluetooth/l2cap_sock.c:1377
 l2cap_sock_release+0x79/0x1d0 net/bluetooth/l2cap_sock.c:1416
 __sock_release net/socket.c:642 [inline]
 sock_close+0xbc/0x240 net/socket.c:1393
 __fput+0x3e9/0x9f0 fs/file_table.c:448
 task_work_run+0x24f/0x310 kernel/task_work.c:227
 ptrace_notify+0x2d2/0x380 kernel/signal.c:2522
 ptrace_report_syscall include/linux/ptrace.h:415 [inline]
 ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
 syscall_exit_work+0xc7/0x1d0 kernel/entry/common.c:173
 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
 syscall_exit_to_user_mode+0x24a/0x340 kernel/entry/common.c:218
 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7aceeaf449
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7ace668218 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: fffffffffffffffc RBX: 00007f7acef39328 RCX: 00007f7aceeaf449
RDX: 000000000000000e RSI: 0000000020000100 RDI: 0000000000000004
RBP: 00007f7acef39320 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000004 R14: 00007f7ace668670 R15: 000000000000000b
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_del_entry_valid_or_report+0x12c/0x190 lib/list_debug.c:59
Code: 8c 4c 89 fe 48 89 da e8 32 8c 37 fc 90 0f 0b 48 89 df e8 27 9f 14 fd 48 c7 c7 a0 c0 60 8c 4c 89 fe 48 89 da e8 15 8c 37 fc 90 <0f> 0b 4c 89 e7 e8 0a 9f 14 fd 42 80 3c 2b 00 74 08 4c 89 e7 e8 cb
RSP: 0018:ffffc90003f6f998 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: 01454d423f7fbf00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff819f077c R09: 1ffff920007eded0
R10: dffffc0000000000 R11: fffff520007eded1 R12: dead000000000122
R13: dffffc0000000000 R14: ffff8880352248d8 R15: ffff888021297e00
FS:  00007f7ace6686c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7acef05b08 CR3: 000000003527c000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Reported-by: [email protected]
Tested-by: [email protected]
Fixes: b4f82f9 ("Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
The behaviour of kthread_create_worker() was recently changed to align
with the one of kthread_create(). The kthread worker is created but not
awaken by default. This is to allow the use of kthread_affine_preferred()
and kthread_bind[_mask]() with kthread workers. In order to keep the
old behaviour and wake the kthread up, kthread_run_worker() must be
used. All the pre-existing users have been converted, except for UVC
that was introduced in the same merge window as the API change.

This results in hangs:

	INFO: task UVCG:82 blocked for more than 491 seconds.
	Tainted: G                T  6.13.0-rc2-00014-gb04e317b5226 #1
	task:UVCG            state:D stack:0     pid:82
	 Call Trace:
	 __schedule
	 schedule
	 schedule_preempt_disabled
	 kthread
	 ? kthread_flush_work
	 ret_from_fork
	 ret_from_fork_asm
	 entry_INT80_32

Fix this with converting UVCG kworker to the new API.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Fixes: f0bbfbd ("usb: gadget: uvc: rework to enqueue in pump worker from encoded queue")
Cc: stable <[email protected]>
Cc: Michael Grzeschik <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
tcf_exts_miss_cookie_base_alloc() calls xa_alloc_cyclic() which can
return 1 if the allocation succeeded after wrapping. This was treated as
an error, with value 1 returned to caller tcf_exts_init_ex() which sets
exts->actions to NULL and returns 1 to caller fl_change().

fl_change() treats err == 1 as success, calling tcf_exts_validate_ex()
which calls tcf_action_init() with exts->actions as argument, where it
is dereferenced.

Example trace:

BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 114 PID: 16151 Comm: handler114 Kdump: loaded Not tainted 5.14.0-503.16.1.el9_5.x86_64 #1
RIP: 0010:tcf_action_init+0x1f8/0x2c0
Call Trace:
 tcf_action_init+0x1f8/0x2c0
 tcf_exts_validate_ex+0x175/0x190
 fl_change+0x537/0x1120 [cls_flower]

Fixes: 80cd22c ("net/sched: cls_api: Support hardware miss to tc action")
Signed-off-by: Pierre Riteau <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
When trying to mmap a trace instance buffer that is attached to
reserve_mem, it would crash:

 BUG: unable to handle page fault for address: ffffe97bd00025c8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 2862f3067 P4D 2862f3067 PUD 0
 Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI
 CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty torvalds#233
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 RIP: 0010:validate_page_before_insert+0x5/0xb0
 Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89
 RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246
 RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29
 RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08
 RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004
 R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
 R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000
 FS:  00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ? __die_body.cold+0x19/0x1f
  ? __die+0x2e/0x40
  ? page_fault_oops+0x157/0x2b0
  ? search_module_extables+0x53/0x80
  ? validate_page_before_insert+0x5/0xb0
  ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70
  ? __bad_area_nosemaphore+0x16e/0x1b0
  ? bad_area_nosemaphore+0x16/0x20
  ? do_kern_addr_fault+0x77/0x90
  ? exc_page_fault+0x22b/0x230
  ? asm_exc_page_fault+0x2b/0x30
  ? validate_page_before_insert+0x5/0xb0
  ? vm_insert_pages+0x151/0x400
  __rb_map_vma+0x21f/0x3f0
  ring_buffer_map+0x21b/0x2f0
  tracing_buffers_mmap+0x70/0xd0
  __mmap_region+0x6f0/0xbd0
  mmap_region+0x7f/0x130
  do_mmap+0x475/0x610
  vm_mmap_pgoff+0xf2/0x1d0
  ksys_mmap_pgoff+0x166/0x200
  __x64_sys_mmap+0x37/0x50
  x64_sys_call+0x1670/0x1d70
  do_syscall_64+0xbb/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

The reason was that the code that maps the ring buffer pages to user space
has:

	page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);

And uses that in:

	vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);

But virt_to_page() does not work with vmap()'d memory which is what the
persistent ring buffer has. It is rather trivial to allow this, but for
now just disable mmap() of instances that have their ring buffer from the
reserve_mem option.

If an mmap() is performed on a persistent buffer it will return -ENODEV
just like it would if the .mmap field wasn't defined in the
file_operations structure.

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: 9b7bdf6 ("tracing: Have trace_printk not use binary prints if boot buffer")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
In Inline mode, the journal is unused, and journal_sectors is zero.

Calculating the journal watermark requires dividing by journal_sectors,
which should be done only if the journal is configured.

Otherwise, a simple table query (dmsetup table) can cause OOPS.

This bug did not show on some systems, perhaps only due to
compiler optimization.

On my 32-bit testing machine, this reliably crashes with the following:

 : Oops: divide error: 0000 [#1] PREEMPT SMP
 : CPU: 0 UID: 0 PID: 2450 Comm: dmsetup Not tainted 6.14.0-rc2+ torvalds#959
 : EIP: dm_integrity_status+0x2f8/0xab0 [dm_integrity]
 ...

Signed-off-by: Milan Broz <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Fixes: fb09876 ("dm-integrity: introduce the Inline mode")
Cc: [email protected] # 6.11+
tiwai pushed a commit that referenced this pull request Mar 12, 2025
Syzkaller reports the following bug:

BUG: spinlock bad magic on CPU#1, syz-executor.0/7995
 lock: 0xffff88805303f3e0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 1 PID: 7995 Comm: syz-executor.0 Tainted: G            E     5.10.209+ #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x119/0x179 lib/dump_stack.c:118
 debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
 do_raw_spin_lock+0x1f6/0x270 kernel/locking/spinlock_debug.c:112
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline]
 _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
 reset_per_cpu_data+0xe6/0x240 [drop_monitor]
 net_dm_cmd_trace+0x43d/0x17a0 [drop_monitor]
 genl_family_rcv_msg_doit+0x22f/0x330 net/netlink/genetlink.c:739
 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
 genl_rcv_msg+0x341/0x5a0 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x14d/0x440 net/netlink/af_netlink.c:2497
 genl_rcv+0x29/0x40 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
 netlink_unicast+0x54b/0x800 net/netlink/af_netlink.c:1348
 netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:651 [inline]
 __sock_sendmsg+0x157/0x190 net/socket.c:663
 ____sys_sendmsg+0x712/0x870 net/socket.c:2378
 ___sys_sendmsg+0xf8/0x170 net/socket.c:2432
 __sys_sendmsg+0xea/0x1b0 net/socket.c:2461
 do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x62/0xc7
RIP: 0033:0x7f3f9815aee9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f972bf0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f3f9826d050 RCX: 00007f3f9815aee9
RDX: 0000000020000000 RSI: 0000000020001300 RDI: 0000000000000007
RBP: 00007f3f981b63bd R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f3f9826d050 R15: 00007ffe01ee6768

If drop_monitor is built as a kernel module, syzkaller may have time
to send a netlink NET_DM_CMD_START message during the module loading.
This will call the net_dm_monitor_start() function that uses
a spinlock that has not yet been initialized.

To fix this, let's place resource initialization above the registration
of a generic netlink family.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.

Fixes: 9a8afc8 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Cc: [email protected]
Signed-off-by: Ilia Gavrilov <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
A softlockup issue was found with stress test:
 watchdog: BUG: soft lockup - CPU#27 stuck for 26s! [migration/27:181]
 CPU: 27 UID: 0 PID: 181 Comm: migration/27 6.14.0-rc2-next-20250210 #1
 Stopper: multi_cpu_stop <- stop_machine_from_inactive_cpu
 RIP: 0010:stop_machine_yield+0x2/0x10
 RSP: 0000:ff4a0dcecd19be48 EFLAGS: 00000246
 RAX: ffffffff89c0108f RBX: ff4a0dcec03afe44 RCX: 0000000000000000
 RDX: ff1cdaaf6eba5808 RSI: 0000000000000282 RDI: ff1cda80c1775a40
 RBP: 0000000000000001 R08: 00000011620096c6 R09: 7fffffffffffffff
 R10: 0000000000000001 R11: 0000000000000100 R12: ff1cda80c1775a40
 R13: 0000000000000000 R14: 0000000000000001 R15: ff4a0dcec03afe20
 FS:  0000000000000000(0000) GS:ff1cdaaf6eb80000(0000)
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000025e2c2a001 CR4: 0000000000773ef0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  multi_cpu_stop+0x8f/0x100
  cpu_stopper_thread+0x90/0x140
  smpboot_thread_fn+0xad/0x150
  kthread+0xc2/0x100
  ret_from_fork+0x2d/0x50

The stress test involves CPU hotplug operations and memory control group
(memcg) operations. The scenario can be described as follows:

 echo xx > memory.max 	cache_ap_online			oom_reaper
 (CPU23)						(CPU50)
 xx < usage		stop_machine_from_inactive_cpu
 for(;;)			// all active cpus
 trigger OOM		queue_stop_cpus_work
 // waiting oom_reaper
 			multi_cpu_stop(migration/xx)
 			// sync all active cpus ack
 			// waiting cpu23 ack
 			// CPU50 loops in multi_cpu_stop
 							waiting cpu50

Detailed explanation:
1. When the usage is larger than xx, an OOM may be triggered. If the
   process does not handle with ths kill signal immediately, it will loop
   in the memory_max_write.
2. When cache_ap_online is triggered, the multi_cpu_stop is queued to the
   active cpus. Within the multi_cpu_stop function,  it attempts to
   synchronize the CPU states. However, the CPU23 didn't acknowledge
   because it is stuck in a loop within the for(;;).
3. The oom_reaper process is blocked because CPU50 is in a loop, waiting
   for CPU23 to acknowledge the synchronization request.
4. Finally, it formed cyclic dependency and lead to softlockup and dead
   loop.

To fix this issue, add cond_resched() in the memory_max_write, so that it
will not block migration task.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b6e6edc ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage")
Signed-off-by: Chen Ridong <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Wang Weiyang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
The namespace percpu counter protects pending I/O, and we can
only safely diable the namespace once the counter drop to zero.
Otherwise we end up with a crash when running blktests/nvme/058
(eg for loop transport):

[ 2352.930426] [  T53909] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
[ 2352.930431] [  T53909] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[ 2352.930434] [  T53909] CPU: 3 UID: 0 PID: 53909 Comm: kworker/u16:5 Tainted: G        W          6.13.0-rc6 torvalds#232
[ 2352.930438] [  T53909] Tainted: [W]=WARN
[ 2352.930440] [  T53909] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[ 2352.930443] [  T53909] Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop]
[ 2352.930449] [  T53909] RIP: 0010:blkcg_set_ioprio+0x44/0x180

as the queue is already torn down when calling submit_bio();

So we need to init the percpu counter in nvmet_ns_enable(), and
wait for it to drop to zero in nvmet_ns_disable() to avoid having
I/O pending after the namespace has been disabled.

Fixes: 74d1696 ("nvmet-loop: avoid using mutex in IO hotpath")

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Tested-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
The delayed work item function nvmet_pci_epf_poll_sqs_work() polls all
submission queues and keeps running in a loop as long as commands are
being submitted by the host. Depending on the preemption configuration
of the kernel, under heavy command workload, this function can thus run
for more than RCU_CPU_STALL_TIMEOUT seconds, leading to a RCU stall:

 rcu: INFO: rcu_sched self-detected stall on CPU
 rcu:   5-....: (20998 ticks this GP) idle=4244/1/0x4000000000000000 softirq=301/301 fqs=5132
 rcu:   (t=21000 jiffies g=-443 q=12 ncpus=8)
 CPU: 5 UID: 0 PID: 82 Comm: kworker/5:1 Not tainted 6.14.0-rc2 #1
 Hardware name: Radxa ROCK 5B (DT)
 Workqueue: events nvmet_pci_epf_poll_sqs_work [nvmet_pci_epf]
 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : dw_edma_device_tx_status+0xb8/0x130
 lr : dw_edma_device_tx_status+0x9c/0x130
 sp : ffff800080b5bbb0
 x29: ffff800080b5bbb0 x28: ffff0331c5c78400 x27: ffff0331c1cd1960
 x26: ffff0331c0e39010 x25: ffff0331c20e4000 x24: ffff0331c20e4a90
 x23: 0000000000000000 x22: 0000000000000001 x21: 00000000005aca33
 x20: ffff800080b5bc30 x19: ffff0331c123e370 x18: 000000000ab29e62
 x17: ffffb2a878c9c118 x16: ffff0335bde82040 x15: 0000000000000000
 x14: 000000000000017b x13: 00000000ee601780 x12: 0000000000000018
 x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000040
 x8 : 00000000ee601780 x7 : 0000000105c785c0 x6 : ffff0331c1027d80
 x5 : 0000000001ee7ad6 x4 : ffff0335bdea16c0 x3 : ffff0331c123e438
 x2 : 00000000005aca33 x1 : 0000000000000000 x0 : ffff0331c123e410
 Call trace:
  dw_edma_device_tx_status+0xb8/0x130 (P)
  dma_sync_wait+0x60/0xbc
  nvmet_pci_epf_dma_transfer+0x128/0x264 [nvmet_pci_epf]
  nvmet_pci_epf_poll_sqs_work+0x2a0/0x2e0 [nvmet_pci_epf]
  process_one_work+0x144/0x390
  worker_thread+0x27c/0x458
  kthread+0xe8/0x19c
  ret_from_fork+0x10/0x20

The solution for this is simply to explicitly allow rescheduling using
cond_resched(). However, since doing so for every loop of
nvmet_pci_epf_poll_sqs_work() significantly degrades performance
(for 4K random reads using 4 I/O queues, the maximum IOPS goes down from
137 KIOPS to 110 KIOPS), call cond_resched() every second to avoid the
RCU stalls.

Fixes: 0faa0fe ("nvmet: New NVMe PCI endpoint function target driver")
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
Brad Spengler reported the list_del() corruption splat in
gtp_net_exit_batch_rtnl(). [0]

Commit eb28fd7 ("gtp: Destroy device along with udp socket's netns
dismantle.") added the for_each_netdev() loop in gtp_net_exit_batch_rtnl()
to destroy devices in each netns as done in geneve and ip tunnels.

However, this could trigger ->dellink() twice for the same device during
->exit_batch_rtnl().

Say we have two netns A & B and gtp device B that resides in netns B but
whose UDP socket is in netns A.

  1. cleanup_net() processes netns A and then B.

  2. gtp_net_exit_batch_rtnl() finds the device B while iterating
     netns A's gn->gtp_dev_list and calls ->dellink().

  [ device B is not yet unlinked from netns B
    as unregister_netdevice_many() has not been called. ]

  3. gtp_net_exit_batch_rtnl() finds the device B while iterating
     netns B's for_each_netdev() and calls ->dellink().

gtp_dellink() cleans up the device's hash table, unlinks the dev from
gn->gtp_dev_list, and calls unregister_netdevice_queue().

Basically, calling gtp_dellink() multiple times is fine unless
CONFIG_DEBUG_LIST is enabled.

Let's remove for_each_netdev() in gtp_net_exit_batch_rtnl() and
delegate the destruction to default_device_exit_batch() as done
in bareudp.

[0]:
list_del corruption, ffff8880aaa62c00->next (autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]) is LIST_POISON1 (ffffffffffffff02) (prev is 0xffffffffffffff04)
kernel BUG at lib/list_debug.c:58!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 UID: 0 PID: 1804 Comm: kworker/u8:7 Tainted: G                T   6.12.13-grsec-full-20250211091339 #1
Tainted: [T]=RANDSTRUCT
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:[<ffffffff84947381>] __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58
Code: c2 76 91 31 c0 e8 9f b1 f7 fc 0f 0b 4d 89 f0 48 c7 c1 02 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 e0 c2 76 91 31 c0 e8 7f b1 f7 fc <0f> 0b 4d 89 e8 48 c7 c1 04 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 60
RSP: 0018:fffffe8040b4fbd0 EFLAGS: 00010283
RAX: 00000000000000cc RBX: dffffc0000000000 RCX: ffffffff818c4054
RDX: ffffffff84947381 RSI: ffffffff818d1512 RDI: 0000000000000000
RBP: ffff8880aaa62c00 R08: 0000000000000001 R09: fffffbd008169f32
R10: fffffe8040b4f997 R11: 0000000000000001 R12: a1988d84f24943e4
R13: ffffffffffffff02 R14: ffffffffffffff04 R15: ffff8880aaa62c08
RBX: kasan shadow of 0x0
RCX: __wake_up_klogd.part.0+0x74/0xe0 kernel/printk/printk.c:4554
RDX: __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58
RSI: vprintk+0x72/0x100 kernel/printk/printk_safe.c:71
RBP: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]
RSP: process kstack fffffe8040b4fbd0+0x7bd0/0x8000 [kworker/u8:7+netns 1804 ]
R09: kasan shadow of process kstack fffffe8040b4f990+0x7990/0x8000 [kworker/u8:7+netns 1804 ]
R10: process kstack fffffe8040b4f997+0x7997/0x8000 [kworker/u8:7+netns 1804 ]
R15: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc08/0x1000 [slab object]
FS:  0000000000000000(0000) GS:ffff888116000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000748f5372c000 CR3: 0000000015408000 CR4: 00000000003406f0 shadow CR4: 00000000003406f0
Stack:
 0000000000000000 ffffffff8a0c35e7 ffffffff8a0c3603 ffff8880aaa62c00
 ffff8880aaa62c00 0000000000000004 ffff88811145311c 0000000000000005
 0000000000000001 ffff8880aaa62000 fffffe8040b4fd40 ffffffff8a0c360d
Call Trace:
 <TASK>
 [<ffffffff8a0c360d>] __list_del_entry_valid include/linux/list.h:131 [inline] fffffe8040b4fc28
 [<ffffffff8a0c360d>] __list_del_entry include/linux/list.h:248 [inline] fffffe8040b4fc28
 [<ffffffff8a0c360d>] list_del include/linux/list.h:262 [inline] fffffe8040b4fc28
 [<ffffffff8a0c360d>] gtp_dellink+0x16d/0x360 drivers/net/gtp.c:1557 fffffe8040b4fc28
 [<ffffffff8a0d0404>] gtp_net_exit_batch_rtnl+0x124/0x2c0 drivers/net/gtp.c:2495 fffffe8040b4fc88
 [<ffffffff8e705b24>] cleanup_net+0x5a4/0xbe0 net/core/net_namespace.c:635 fffffe8040b4fcd0
 [<ffffffff81754c97>] process_one_work+0xbd7/0x2160 kernel/workqueue.c:3326 fffffe8040b4fd88
 [<ffffffff81757195>] process_scheduled_works kernel/workqueue.c:3407 [inline] fffffe8040b4fec0
 [<ffffffff81757195>] worker_thread+0x6b5/0xfa0 kernel/workqueue.c:3488 fffffe8040b4fec0
 [<ffffffff817782a0>] kthread+0x360/0x4c0 kernel/kthread.c:397 fffffe8040b4ff78
 [<ffffffff814d8594>] ret_from_fork+0x74/0xe0 arch/x86/kernel/process.c:172 fffffe8040b4ffb8
 [<ffffffff8110f509>] ret_from_fork_asm+0x29/0xc0 arch/x86/entry/entry_64.S:399 fffffe8040b4ffe8
 </TASK>
Modules linked in:

Fixes: eb28fd7 ("gtp: Destroy device along with udp socket's netns dismantle.")
Reported-by: Brad Spengler <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
…umers

While using nvme target with use_srq on, below kernel panic is noticed.

[  549.698111] bnxt_en 0000:41:00.0 enp65s0np0: FEC autoneg off encoding: Clause 91 RS(544,514)
[  566.393619] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
..
[  566.393799]  <TASK>
[  566.393807]  ? __die_body+0x1a/0x60
[  566.393823]  ? die+0x38/0x60
[  566.393835]  ? do_trap+0xe4/0x110
[  566.393847]  ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[  566.393867]  ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[  566.393881]  ? do_error_trap+0x7c/0x120
[  566.393890]  ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[  566.393911]  ? exc_divide_error+0x34/0x50
[  566.393923]  ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[  566.393939]  ? asm_exc_divide_error+0x16/0x20
[  566.393966]  ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re]
[  566.393997]  bnxt_qplib_create_srq+0xc9/0x340 [bnxt_re]
[  566.394040]  bnxt_re_create_srq+0x335/0x3b0 [bnxt_re]
[  566.394057]  ? srso_return_thunk+0x5/0x5f
[  566.394068]  ? __init_swait_queue_head+0x4a/0x60
[  566.394090]  ib_create_srq_user+0xa7/0x150 [ib_core]
[  566.394147]  nvmet_rdma_queue_connect+0x7d0/0xbe0 [nvmet_rdma]
[  566.394174]  ? lock_release+0x22c/0x3f0
[  566.394187]  ? srso_return_thunk+0x5/0x5f

Page size and shift info is set only for the user space SRQs.
Set page size and page shift for kernel space SRQs also.

Fixes: 0c4dcd6 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
 into HEAD

KVM/riscv fixes for 6.14, take #1

- Fix hart status check in SBI HSM extension
- Fix hart suspend_type usage in SBI HSM extension
- Fix error returned by SBI IPI and TIME extensions for
  unsupported function IDs
- Fix suspend_type usage in SBI SUSP extension
- Remove unnecessary vcpu kick after injecting interrupt
  via IMSIC guest file
tiwai pushed a commit that referenced this pull request Mar 12, 2025
We have recently seen reports of lockdep circular lock dependency warnings
when loading the iAVF driver:

[ 1504.790308] ======================================================
[ 1504.790309] WARNING: possible circular locking dependency detected
[ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted
[ 1504.790311] ------------------------------------------------------
[ 1504.790312] kworker/u128:0/13566 is trying to acquire lock:
[ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710
[ 1504.790320]
[ 1504.790320] but task is already holding lock:
[ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf]
[ 1504.790330]
[ 1504.790330] which lock already depends on the new lock.
[ 1504.790330]
[ 1504.790330]
[ 1504.790330] the existing dependency chain (in reverse order) is:
[ 1504.790331]
[ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}:
[ 1504.790333]        __lock_acquire+0x52d/0xbb0
[ 1504.790337]        lock_acquire+0xd9/0x330
[ 1504.790338]        mutex_lock_nested+0x4b/0xb0
[ 1504.790341]        iavf_finish_config+0x37/0x240 [iavf]
[ 1504.790347]        process_one_work+0x248/0x6d0
[ 1504.790350]        worker_thread+0x18d/0x330
[ 1504.790352]        kthread+0x10e/0x250
[ 1504.790354]        ret_from_fork+0x30/0x50
[ 1504.790357]        ret_from_fork_asm+0x1a/0x30
[ 1504.790361]
[ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}:
[ 1504.790364]        check_prev_add+0xf1/0xce0
[ 1504.790366]        validate_chain+0x46a/0x570
[ 1504.790368]        __lock_acquire+0x52d/0xbb0
[ 1504.790370]        lock_acquire+0xd9/0x330
[ 1504.790371]        mutex_lock_nested+0x4b/0xb0
[ 1504.790372]        register_netdevice+0x52c/0x710
[ 1504.790374]        iavf_finish_config+0xfa/0x240 [iavf]
[ 1504.790379]        process_one_work+0x248/0x6d0
[ 1504.790381]        worker_thread+0x18d/0x330
[ 1504.790383]        kthread+0x10e/0x250
[ 1504.790385]        ret_from_fork+0x30/0x50
[ 1504.790387]        ret_from_fork_asm+0x1a/0x30
[ 1504.790389]
[ 1504.790389] other info that might help us debug this:
[ 1504.790389]
[ 1504.790389]  Possible unsafe locking scenario:
[ 1504.790389]
[ 1504.790390]        CPU0                    CPU1
[ 1504.790391]        ----                    ----
[ 1504.790391]   lock(&adapter->crit_lock);
[ 1504.790393]                                lock(&dev->lock);
[ 1504.790394]                                lock(&adapter->crit_lock);
[ 1504.790395]   lock(&dev->lock);
[ 1504.790397]
[ 1504.790397]  *** DEADLOCK ***

This appears to be caused by the change in commit 5fda3f3 ("net: make
netdev_lock() protect netdev->reg_state"), which added a netdev_lock() in
register_netdevice.

The iAVF driver calls register_netdevice() from iavf_finish_config(), as a
final stage of its state machine post-probe. It currently takes the RTNL
lock, then the netdev lock, and then the device critical lock. This pattern
is used throughout the driver. Thus there is a strong dependency that the
crit_lock should not be acquired before the net device lock. The change to
register_netdevice creates an ABBA lock order violation because the iAVF
driver is holding the crit_lock while calling register_netdevice, which
then takes the netdev_lock.

It seems likely that future refactors could result in netdev APIs which
hold the netdev_lock while calling into the driver. This means that we
should not re-order the locks so that netdev_lock is acquired after the
device private crit_lock.

Instead, notice that we already release the netdev_lock prior to calling
the register_netdevice. This flow only happens during the early driver
initialization as we transition through the __IAVF_STARTUP,
__IAVF_INIT_VERSION_CHECK, __IAVF_INIT_GET_RESOURCES, etc.

Analyzing the places where we take crit_lock in the driver there are two
sources:

a) several of the work queue tasks including adminq_task, watchdog_task,
reset_task, and the finish_config task.

b) various callbacks which ultimately stem back to .ndo operations or
ethtool operations.

The latter cannot be triggered until after the netdevice registration is
completed successfully.

The iAVF driver uses alloc_ordered_workqueue, which is an unbound workqueue
that has a max limit of 1, and thus guarantees that only a single work item
on the queue is executing at any given time, so none of the other work
threads could be executing due to the ordered workqueue guarantees.

The iavf_finish_config() function also does not do anything else after
register_netdevice, unless it fails. It seems unlikely that the driver
private crit_lock is protecting anything that register_netdevice() itself
touches.

Thus, to fix this ABBA lock violation, lets simply release the
adapter->crit_lock as well as netdev_lock prior to calling
register_netdevice(). We do still keep holding the RTNL lock as required by
the function. If we do fail to register the netdevice, then we re-acquire
the adapter critical lock to finish the transition back to
__IAVF_INIT_CONFIG_ADAPTER.

This ensures every call where both netdev_lock and the adapter->crit_lock
are acquired under the same ordering.

Fixes: afc6649 ("eth: iavf: extend the netdev_lock usage")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Przemek Kitszel <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
tiwai pushed a commit that referenced this pull request Mar 12, 2025
The customer reports that there is a soft lockup issue related to
the i2c driver. After checking, the i2c module was doing a tx transfer
and the bmc machine reboots in the middle of the i2c transaction, the i2c
module keeps the status without being reset.

Due to such an i2c module status, the i2c irq handler keeps getting
triggered since the i2c irq handler is registered in the kernel booting
process after the bmc machine is doing a warm rebooting.
The continuous triggering is stopped by the soft lockup watchdog timer.

Disable the interrupt enable bit in the i2c module before calling
devm_request_irq to fix this issue since the i2c relative status bit
is read-only.

Here is the soft lockup log.
[   28.176395] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:1]
[   28.183351] Modules linked in:
[   28.186407] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.120-yocto-s-dirty-bbebc78 #1
[   28.201174] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   28.208128] pc : __do_softirq+0xb0/0x368
[   28.212055] lr : __do_softirq+0x70/0x368
[   28.215972] sp : ffffff8035ebca00
[   28.219278] x29: ffffff8035ebca00 x28: 0000000000000002 x27: ffffff80071a3780
[   28.226412] x26: ffffffc008bdc000 x25: ffffffc008bcc640 x24: ffffffc008be50c0
[   28.233546] x23: ffffffc00800200c x22: 0000000000000000 x21: 000000000000001b
[   28.240679] x20: 0000000000000000 x19: ffffff80001c3200 x18: ffffffffffffffff
[   28.247812] x17: ffffffc02d2e0000 x16: ffffff8035eb8b40 x15: 00001e8480000000
[   28.254945] x14: 02c3647e37dbfcb6 x13: 02c364f2ab14200c x12: 0000000002c364f2
[   28.262078] x11: 00000000fa83b2da x10: 000000000000b67e x9 : ffffffc008010250
[   28.269211] x8 : 000000009d983d00 x7 : 7fffffffffffffff x6 : 0000036d74732434
[   28.276344] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : 0000000000000198
[   28.283476] x2 : ffffffc02d2e0000 x1 : 00000000000000e0 x0 : ffffffc008bdcb40
[   28.290611] Call trace:
[   28.293052]  __do_softirq+0xb0/0x368
[   28.296625]  __irq_exit_rcu+0xe0/0x100
[   28.300374]  irq_exit+0x14/0x20
[   28.303513]  handle_domain_irq+0x68/0x90
[   28.307440]  gic_handle_irq+0x78/0xb0
[   28.311098]  call_on_irq_stack+0x20/0x38
[   28.315019]  do_interrupt_handler+0x54/0x5c
[   28.319199]  el1_interrupt+0x2c/0x4c
[   28.322777]  el1h_64_irq_handler+0x14/0x20
[   28.326872]  el1h_64_irq+0x74/0x78
[   28.330269]  __setup_irq+0x454/0x780
[   28.333841]  request_threaded_irq+0xd0/0x1b4
[   28.338107]  devm_request_threaded_irq+0x84/0x100
[   28.342809]  npcm_i2c_probe_bus+0x188/0x3d0
[   28.346990]  platform_probe+0x6c/0xc4
[   28.350653]  really_probe+0xcc/0x45c
[   28.354227]  __driver_probe_device+0x8c/0x160
[   28.358578]  driver_probe_device+0x44/0xe0
[   28.362670]  __driver_attach+0x124/0x1d0
[   28.366589]  bus_for_each_dev+0x7c/0xe0
[   28.370426]  driver_attach+0x28/0x30
[   28.373997]  bus_add_driver+0x124/0x240
[   28.377830]  driver_register+0x7c/0x124
[   28.381662]  __platform_driver_register+0x2c/0x34
[   28.386362]  npcm_i2c_init+0x3c/0x5c
[   28.389937]  do_one_initcall+0x74/0x230
[   28.393768]  kernel_init_freeable+0x24c/0x2b4
[   28.398126]  kernel_init+0x28/0x130
[   28.401614]  ret_from_fork+0x10/0x20
[   28.405189] Kernel panic - not syncing: softlockup: hung tasks
[   28.411011] SMP: stopping secondary CPUs
[   28.414933] Kernel Offset: disabled
[   28.418412] CPU features: 0x00000000,00000802
[   28.427644] Rebooting in 20 seconds..

Fixes: 56a1485 ("i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver")
Signed-off-by: Tyrone Ting <[email protected]>
Cc: <[email protected]> # v5.8+
Reviewed-by: Tali Perry <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
tiwai pushed a commit that referenced this pull request Mar 12, 2025
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts
locally") moved the call to enable_drhd_fault_handling() to a code
path that does not hold any lock while traversing the drhd list. Fix
it by ensuring the dmar_global_lock lock is held when traversing the
drhd list.

Without this fix, the following warning is triggered:
 =============================
 WARNING: suspicious RCU usage
 6.14.0-rc3 torvalds#55 Not tainted
 -----------------------------
 drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!!
               other info that might help us debug this:
               rcu_scheduler_active = 1, debug_locks = 1
 2 locks held by cpuhp/1/23:
 #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
 stack backtrace:
 CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 torvalds#55
 Call Trace:
  <TASK>
  dump_stack_lvl+0xb7/0xd0
  lockdep_rcu_suspicious+0x159/0x1f0
  ? __pfx_enable_drhd_fault_handling+0x10/0x10
  enable_drhd_fault_handling+0x151/0x180
  cpuhp_invoke_callback+0x1df/0x990
  cpuhp_thread_fun+0x1ea/0x2c0
  smpboot_thread_fn+0x1f5/0x2e0
  ? __pfx_smpboot_thread_fn+0x10/0x10
  kthread+0x12a/0x2d0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x4a/0x60
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat
about a possible deadlock between dmar_global_lock and cpu_hotplug_lock.
This is avoided by not holding dmar_global_lock when calling
iommu_device_register(), which initiates the device probe process.

Fixes: d74169c ("iommu/vt-d: Allocate DMAR fault interrupts locally")
Reported-and-tested-by: Ido Schimmel <[email protected]>
Closes: https://lore.kernel.org/linux-iommu/[email protected]/
Tested-by: Breno Leitao <[email protected]>
Cc: [email protected]
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Ido Schimmel <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants