Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GIT PULL] v9fs fixes for 3.1-rc6 #10

Merged
merged 6 commits into from
Sep 7, 2011
Merged

[GIT PULL] v9fs fixes for 3.1-rc6 #10

merged 6 commits into from
Sep 7, 2011

Conversation

ericvh
Copy link
Contributor

@ericvh ericvh commented Sep 6, 2011

(will resend on lkml as well, but figured I try the github way for fun)

First off, let me apologize. Vacations and kernel.org disruptions have delayed me from getting you these bug fixes sooner in the cycle. There are a couple of protocol "bugs" fixed here dealing with lack of foresight in developing some of the new protocol extensions.

Thanks.

The following changes since commit ddf2835:

Linux 3.1-rc5 (2011-09-04 15:45:10 -0700)

are available in the git repository at:
git://github.com/ericvh/linux.git for-linus

Aneesh Kumar K.V (5):
fs/9p: Add fid before dentry instantiation
fs/9p: Don't update file type when updating file attributes
net/9p: Fix kernel crash with msize 512K
fs/9p: Add OS dependent open flags in 9p protocol
fs/9p: Always ask new inode in lookup for cache mode disabled

Jim Garlick (1):
fs/9p: Use protocol-defined value for lock/getlock 'type' field.

fs/9p/v9fs_vfs.h | 6 ++-
fs/9p/vfs_file.c | 36 ++++++++++---
fs/9p/vfs_inode.c | 139 ++++++++++++++++++++++++++++++------------------
fs/9p/vfs_inode_dotl.c | 86 +++++++++++++++++++++++++-----
fs/9p/vfs_super.c | 2 +-
include/net/9p/9p.h | 29 ++++++++++
net/9p/trans_virtio.c | 17 ++++--
7 files changed, 234 insertions(+), 81 deletions(-)

kvaneesh and others added 6 commits September 6, 2011 08:17
d_instantiate marks the dentry positive. So a parallel lookup and mkdir of
the directory can find dentry that doesn't have fid attached. This can result
in both the code path doing v9fs_fid_add which results in v9fs_dentry leak.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
We should only update attributes that we can change on stat2inode.
Also do file type initialization in v9fs_init_inode.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
crashes. This patch fix those.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <[email protected]>

Signed-off-by: Aneesh Kumar K.V <[email protected]>
This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
@torvalds torvalds merged commit 51b8b4f into torvalds:master Sep 7, 2011
damentz referenced this pull request in zen-kernel/zen-kernel Sep 27, 2011
commit 130c5ce upstream.

This fixes the A->B/B->A locking dependency, see the warning below.

The function task_exit_notify() is called with (task_exit_notifier)
.rwsem set and then calls sync_buffer() which locks buffer_mutex. In
sync_start() the buffer_mutex was set to prevent notifier functions to
be started before sync_start() is finished. But when registering the
notifier, (task_exit_notifier).rwsem is locked too, but now in
different order than in sync_buffer(). In theory this causes a locking
dependency, what does not occur in practice since task_exit_notify()
is always called after the notifier is registered which means the lock
is already released.

However, after checking the notifier functions it turned out the
buffer_mutex in sync_start() is unnecessary. This is because
sync_buffer() may be called from the notifiers even if sync_start()
did not finish yet, the buffers are already allocated but empty. No
need to protect this with the mutex.

So we fix this theoretical locking dependency by removing buffer_mutex
in sync_start(). This is similar to the implementation before commit:

 750d857 oprofile: fix crash when accessing freed task structs

which introduced the locking dependency.

Lockdep warning:

oprofiled/4447 is trying to acquire lock:
 (buffer_mutex){+.+...}, at: [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]

but task is already holding lock:
 ((task_exit_notifier).rwsem){++++..}, at: [<ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 ((task_exit_notifier).rwsem){++++..}:
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff81463a2b>] down_write+0x44/0x67
       [<ffffffff810581c0>] blocking_notifier_chain_register+0x52/0x8b
       [<ffffffff8105a6ac>] profile_event_register+0x2d/0x2f
       [<ffffffffa00013c1>] sync_start+0x47/0xc6 [oprofile]
       [<ffffffffa00001bb>] oprofile_setup+0x60/0xa5 [oprofile]
       [<ffffffffa00014e3>] event_buffer_open+0x59/0x8c [oprofile]
       [<ffffffff810cd3b9>] __dentry_open+0x1eb/0x308
       [<ffffffff810cd59d>] nameidata_to_filp+0x60/0x67
       [<ffffffff810daad6>] do_last+0x5be/0x6b2
       [<ffffffff810dbc33>] path_openat+0xc7/0x360
       [<ffffffff810dbfc5>] do_filp_open+0x3d/0x8c
       [<ffffffff810ccfd2>] do_sys_open+0x110/0x1a9
       [<ffffffff810cd09e>] sys_open+0x20/0x22
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

-> #0 (buffer_mutex){+.+...}:
       [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
       [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
       [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
       [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
       [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
       [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
       [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
       [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
       [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
       [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by oprofiled/4447:
 #0:  ((task_exit_notifier).rwsem){++++..}, at: [<ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67

stack backtrace:
Pid: 4447, comm: oprofiled Not tainted 2.6.39-00007-gcf4d8d4 #10
Call Trace:
 [<ffffffff81063193>] print_circular_bug+0xae/0xbc
 [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
 [<ffffffff81062627>] ? mark_lock+0x42f/0x552
 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
 [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
 [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
 [<ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67
 [<ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67
 [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
 [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
 [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
 [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
 [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
 [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
 [<ffffffff81465031>] ? retint_swapgs+0xe/0x13
 [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
 [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b

Reported-by: Marcin Slusarz <[email protected]>
Cc: Carl Love <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
cuviper pushed a commit to cuviper/linux-uprobes that referenced this pull request Nov 3, 2011
* Ingo Molnar <[email protected]> wrote:

> The patch below addresses these concerns, serializes the output, tidies up the
> printout, resulting in this new output:

There's one bug remaining that my patch does not address: the vCPUs are not
printed in order:

# vCPU #0's dump:
# vCPU #2's dump:
# vCPU torvalds#24's dump:
# vCPU #5's dump:
# vCPU torvalds#39's dump:
# vCPU torvalds#38's dump:
# vCPU torvalds#51's dump:
# vCPU torvalds#11's dump:
# vCPU torvalds#10's dump:
# vCPU torvalds#12's dump:

This is undesirable as the order of printout is highly random, so successive
dumps are difficult to compare.

The patch below serializes the signalling itself. (this is on top of the
previous patch)

The patch also tweaks the vCPU printout line a bit so that it does not start
with '#', which is discarded if such messages are pasted into Git commit
messages.

Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>
torvalds pushed a commit that referenced this pull request Dec 15, 2011
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
tworaz pushed a commit to tworaz/linux that referenced this pull request Jan 9, 2012
commit f7ab9b4 upstream.

Without tmpfs, shmem_readpage() is not compiled in causing an OOPS as
soon as we try to allocate some swappable pages for GEM.

Jan 19 22:52:26 harlie kernel: Modules linked in: i915(+) drm_kms_helper cfbcopyarea video backlight cfbimgblt cfbfillrect
Jan 19 22:52:26 harlie kernel:
Jan 19 22:52:26 harlie kernel: Pid: 1125, comm: modprobe Not tainted 2.6.37Harlie torvalds#10 To be filled by O.E.M./To be filled by O.E.M.
Jan 19 22:52:26 harlie kernel: EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 3
Jan 19 22:52:26 harlie kernel: EIP is at 0x0
Jan 19 22:52:26 harlie kernel: EAX: 00000000 EBX: f7b7d000 ECX: f3383100 EDX: f7b7d000
Jan 19 22:52:26 harlie kernel: ESI: f1456118 EDI: 00000000 EBP: f2303c98 ESP: f2303c7c
Jan 19 22:52:26 harlie kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jan 19 22:52:26 harlie kernel: Process modprobe (pid: 1125, ti=f2302000 task=f259cd80 task.ti=f2302000)
Jan 19 22:52:26 harlie kernel: Stack:
Jan 19 22:52:26 harlie udevd-work[1072]: '/sbin/modprobe -b pci:v00008086d00000046sv00000000sd00000000bc03sc00i00' unexpected exit with status 0x0009
Jan 19 22:52:26 harlie kernel:  c1074061 000000d0 f2f42b80 00000000 000a13d2 f2d5dcc0 00000001 f2303cac
Jan 19 22:52:26 harlie kernel:  c107416f 00000000 000a13d2 00000000 f2303cd4 f8d620ed f2cee620 00001000
Jan 19 22:52:26 harlie kernel:  00000000 000a13d2 f1456118 f2d5dcc0 f1a40000 00001000 f2303d04 f8d637ab
Jan 19 22:52:26 harlie kernel: Call Trace:
Jan 19 22:52:26 harlie kernel:  [<c1074061>] ? do_read_cache_page+0x71/0x160
Jan 19 22:52:26 harlie kernel:  [<c107416f>] ? read_cache_page_gfp+0x1f/0x30
Jan 19 22:52:26 harlie kernel:  [<f8d620ed>] ? i915_gem_object_get_pages+0xad/0x1d0 [i915]
Jan 19 22:52:26 harlie kernel:  [<f8d637ab>] ? i915_gem_object_bind_to_gtt+0xeb/0x2d0 [i915]
Jan 19 22:52:26 harlie kernel:  [<f8d65961>] ? i915_gem_object_pin+0x151/0x190 [i915]
Jan 19 22:52:26 harlie kernel:  [<c11e16ed>] ? drm_gem_object_init+0x3d/0x60
Jan 19 22:52:26 harlie kernel:  [<f8d65aa5>] ? i915_gem_init_ringbuffer+0x105/0x1e0 [i915]
Jan 19 22:52:26 harlie kernel:  [<f8d571b7>] ? i915_driver_load+0x667/0x1160 [i915]

Reported-by: John J. Stimson-III <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
jkstrick pushed a commit to jkstrick/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
zachariasmaladroit pushed a commit to galaxys-cm7miui-kernel/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
tworaz pushed a commit to tworaz/linux that referenced this pull request Feb 13, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8de
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
xXorAa pushed a commit to xXorAa/linux that referenced this pull request Feb 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8de
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Feb 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 1, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Mar 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 2, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 11, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 12, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/907778

commit ea51d13 upstream.

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 torvalds#6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 torvalds#7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 torvalds#8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 torvalds#9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 torvalds#10 <signal handler called>
 torvalds#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 torvalds#12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8de
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request Apr 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 4, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 4, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 5, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi pushed a commit to koenkooi/linux that referenced this pull request May 7, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 12, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Feb 12, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  torvalds#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  torvalds#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 torvalds#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 torvalds#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 torvalds#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 torvalds#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 torvalds#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 torvalds#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 torvalds#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Feb 12, 2025
[ Upstream commit c7b87ce ]

libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    torvalds#6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    torvalds#7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    torvalds#8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    torvalds#9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    torvalds#10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    torvalds#11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    torvalds#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Feb 13, 2025
[ Upstream commit c7b87ce ]

libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    torvalds#6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    torvalds#7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    torvalds#8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    torvalds#9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    torvalds#10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    torvalds#11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    torvalds#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Feb 13, 2025
[ Upstream commit c7b87ce ]

libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    torvalds#6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    torvalds#7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    torvalds#8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    torvalds#9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    torvalds#10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    torvalds#11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    torvalds#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RevySR added a commit to rockos-riscv/rockos-kernel that referenced this pull request Feb 14, 2025
riscv_noncoherent_supported should not be called multiple times
in the sifive_errata_probe function

[   72.659708] Unable to handle kernel paging request at virtual address ffffffff81389380
[   72.667661] Oops [#1]
[   72.669934] Modules linked in: es_iommu_rsv es_dev_dma_buf es_vdec es_rsvmem_heap es_buddy_driver nfnetlink ip_tables x_tables sha1_generic hmac ipv6 pvrsrvkm
[   72.684133] CPU: 0 PID: 696 Comm: modprobe Not tainted 6.6.36-win2030 torvalds#10
[   72.690922] Hardware name: ESWIN EIC7700 (DT)
[   72.695277] epc : riscv_noncoherent_supported+0x10/0x3e
[   72.700507]  ra : sifive_errata_patch_func+0x44/0x1e2
[   72.705560] epc : ffffffff8000bff6 ra : ffffffff8000c4da sp : ffff8f8008273ac0
[   72.712781]  gp : ffffffff81948e40 tp : ffffaf80b1115400 t0 : 0000000000000000
[   72.720001]  t1 : 0000000000000000 t2 : 0000000000000002 s0 : ffff8f8008273b30
[   72.727221]  s1 : ffffffff01d5e388 a0 : ffffffff01d5e388 a1 : ffffffff01d5e3c8
[   72.734441]  a2 : 8000000000000008 a3 : 0000000006220425 a4 : ffffffff81388ff2
[   72.741661]  a5 : 0000000000000001 a6 : 0000000000000006 a7 : 0000000000000010
[   72.748881]  s2 : ffffffff01d5e3c8 s3 : 8000000000000008 s4 : 0000000000000001
[   72.756101]  s5 : 0000000006220425 s6 : ffffffff8180f660 s7 : 0000000000000001
[   72.763321]  s8 : ffff8f8008273d58 s9 : ffffffff01dcb8c0 s10: ffffffff01dcb758
[   72.770542]  s11: 0000000000000000 t3 : ffffffffffffffff t4 : ffffffffffffffff
[   72.777761]  t5 : ffffffffffffffff t6 : ffffaf82a5ecc1c8
[   72.783071] status: 0000000200000120 badaddr: ffffffff81389380 cause: 000000000000000f
[   72.790986] [<ffffffff8000bff6>] riscv_noncoherent_supported+0x10/0x3e
[   72.797514] [<ffffffff800027f6>] _apply_alternatives+0x84/0x86
[   72.803348] [<ffffffff800029cc>] apply_module_alternatives+0x10/0x18
[   72.809701] [<ffffffff80007c8c>] module_finalize+0x5e/0x74
[   72.815193] [<ffffffff80083b54>] load_module+0x1110/0x18b4
[   72.820682] [<ffffffff80084496>] init_module_from_file+0x76/0xaa
[   72.826688] [<ffffffff800846e4>] __riscv_sys_finit_module+0x1e2/0x2a8
[   72.833129] [<ffffffff80a9ccbc>] do_trap_ecall_u+0xbe/0x130
[   72.838703] [<ffffffff80aa51f8>] ret_from_exception+0x0/0x64
[   72.844376] Code: 0009 b7ed e797 0193 a783 12e7 c799 4785 d717 0137 (0723) 38f7
[   72.851970] ---[ end trace 0000000000000000 ]---

Signed-off-by: Han Gao <[email protected]>
Signed-off-by: Han Gao <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Feb 19, 2025
[ Upstream commit c7b87ce ]

libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    torvalds#6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    torvalds#7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    torvalds#8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    torvalds#9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    torvalds#10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    torvalds#11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    torvalds#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Feb 20, 2025
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     torvalds#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     torvalds#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     torvalds#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     torvalds#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    torvalds#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    torvalds#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    torvalds#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    torvalds#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    torvalds#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    torvalds#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    torvalds#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    torvalds#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    torvalds#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
github-actions bot pushed a commit to anon503/linux that referenced this pull request Feb 25, 2025
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     torvalds#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     torvalds#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     torvalds#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     torvalds#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    torvalds#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    torvalds#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    torvalds#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    torvalds#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    torvalds#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    torvalds#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    torvalds#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    torvalds#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    torvalds#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
1Naim pushed a commit to CachyOS/linux that referenced this pull request Mar 1, 2025
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 torvalds#10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events use trace_intel_pipe_update_start() among other events
use functions acquire spinlock_t locks which are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
might acquire a sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.

Based on this I don't see any other way than disable trace points on
PREMPT_RT.

Acked-by: Tvrtko Ursulin <[email protected]>
Reported-by: Luca Abeni <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Mar 6, 2025
Ian told me that there are many memory leaks in the hierarchy mode.  I
can easily reproduce it with the follwing command.

  $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak

  $ perf record --latency -g -- ./perf test -w thloop

  $ perf report -H --stdio
  ...
  Indirect leak of 168 byte(s) in 21 object(s) allocated from:
      #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
      #1 0x55ed3602346e in map__get util/map.h:189
      #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476
      #3 0x55ed36025208 in hist_entry__new util/hist.c:588
      #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587
      #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638
      torvalds#6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685
      torvalds#7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776
      torvalds#8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735
      torvalds#9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119
      torvalds#10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867
      torvalds#11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351
      torvalds#12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404
      torvalds#13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448
      torvalds#14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556
      torvalds#15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
  ...

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  93

I found that hist_entry__delete() missed to release child entries in the
hierarchy tree (hroot_{in,out}).  It needs to iterate the child entries
and call hist_entry__delete() recursively.

After this change:

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  0

Reported-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
RevySR added a commit to rockos-riscv/rockos-kernel that referenced this pull request Mar 6, 2025
riscv_noncoherent_supported should not be called multiple times
in the sifive_errata_probe function

[   72.659708] Unable to handle kernel paging request at virtual address ffffffff81389380
[   72.667661] Oops [#1]
[   72.669934] Modules linked in: es_iommu_rsv es_dev_dma_buf es_vdec es_rsvmem_heap es_buddy_driver nfnetlink ip_tables x_tables sha1_generic hmac ipv6 pvrsrvkm
[   72.684133] CPU: 0 PID: 696 Comm: modprobe Not tainted 6.6.36-win2030 torvalds#10
[   72.690922] Hardware name: ESWIN EIC7700 (DT)
[   72.695277] epc : riscv_noncoherent_supported+0x10/0x3e
[   72.700507]  ra : sifive_errata_patch_func+0x44/0x1e2
[   72.705560] epc : ffffffff8000bff6 ra : ffffffff8000c4da sp : ffff8f8008273ac0
[   72.712781]  gp : ffffffff81948e40 tp : ffffaf80b1115400 t0 : 0000000000000000
[   72.720001]  t1 : 0000000000000000 t2 : 0000000000000002 s0 : ffff8f8008273b30
[   72.727221]  s1 : ffffffff01d5e388 a0 : ffffffff01d5e388 a1 : ffffffff01d5e3c8
[   72.734441]  a2 : 8000000000000008 a3 : 0000000006220425 a4 : ffffffff81388ff2
[   72.741661]  a5 : 0000000000000001 a6 : 0000000000000006 a7 : 0000000000000010
[   72.748881]  s2 : ffffffff01d5e3c8 s3 : 8000000000000008 s4 : 0000000000000001
[   72.756101]  s5 : 0000000006220425 s6 : ffffffff8180f660 s7 : 0000000000000001
[   72.763321]  s8 : ffff8f8008273d58 s9 : ffffffff01dcb8c0 s10: ffffffff01dcb758
[   72.770542]  s11: 0000000000000000 t3 : ffffffffffffffff t4 : ffffffffffffffff
[   72.777761]  t5 : ffffffffffffffff t6 : ffffaf82a5ecc1c8
[   72.783071] status: 0000000200000120 badaddr: ffffffff81389380 cause: 000000000000000f
[   72.790986] [<ffffffff8000bff6>] riscv_noncoherent_supported+0x10/0x3e
[   72.797514] [<ffffffff800027f6>] _apply_alternatives+0x84/0x86
[   72.803348] [<ffffffff800029cc>] apply_module_alternatives+0x10/0x18
[   72.809701] [<ffffffff80007c8c>] module_finalize+0x5e/0x74
[   72.815193] [<ffffffff80083b54>] load_module+0x1110/0x18b4
[   72.820682] [<ffffffff80084496>] init_module_from_file+0x76/0xaa
[   72.826688] [<ffffffff800846e4>] __riscv_sys_finit_module+0x1e2/0x2a8
[   72.833129] [<ffffffff80a9ccbc>] do_trap_ecall_u+0xbe/0x130
[   72.838703] [<ffffffff80aa51f8>] ret_from_exception+0x0/0x64
[   72.844376] Code: 0009 b7ed e797 0193 a783 12e7 c799 4785 d717 0137 (0723) 38f7
[   72.851970] ---[ end trace 0000000000000000 ]---

Signed-off-by: Han Gao <[email protected]>
Signed-off-by: Han Gao <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Mar 7, 2025
Ian told me that there are many memory leaks in the hierarchy mode.  I
can easily reproduce it with the follwing command.

  $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak

  $ perf record --latency -g -- ./perf test -w thloop

  $ perf report -H --stdio
  ...
  Indirect leak of 168 byte(s) in 21 object(s) allocated from:
      #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
      #1 0x55ed3602346e in map__get util/map.h:189
      #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476
      #3 0x55ed36025208 in hist_entry__new util/hist.c:588
      #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587
      #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638
      torvalds#6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685
      torvalds#7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776
      torvalds#8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735
      torvalds#9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119
      torvalds#10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867
      torvalds#11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351
      torvalds#12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404
      torvalds#13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448
      torvalds#14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556
      torvalds#15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
  ...

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  93

I found that hist_entry__delete() missed to release child entries in the
hierarchy tree (hroot_{in,out}).  It needs to iterate the child entries
and call hist_entry__delete() recursively.

After this change:

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  0

Reported-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
RevySR added a commit to rockos-riscv/rockos-kernel that referenced this pull request Mar 7, 2025
riscv_noncoherent_supported should not be called multiple times
in the sifive_errata_probe function

[   72.659708] Unable to handle kernel paging request at virtual address ffffffff81389380
[   72.667661] Oops [#1]
[   72.669934] Modules linked in: es_iommu_rsv es_dev_dma_buf es_vdec es_rsvmem_heap es_buddy_driver nfnetlink ip_tables x_tables sha1_generic hmac ipv6 pvrsrvkm
[   72.684133] CPU: 0 PID: 696 Comm: modprobe Not tainted 6.6.36-win2030 torvalds#10
[   72.690922] Hardware name: ESWIN EIC7700 (DT)
[   72.695277] epc : riscv_noncoherent_supported+0x10/0x3e
[   72.700507]  ra : sifive_errata_patch_func+0x44/0x1e2
[   72.705560] epc : ffffffff8000bff6 ra : ffffffff8000c4da sp : ffff8f8008273ac0
[   72.712781]  gp : ffffffff81948e40 tp : ffffaf80b1115400 t0 : 0000000000000000
[   72.720001]  t1 : 0000000000000000 t2 : 0000000000000002 s0 : ffff8f8008273b30
[   72.727221]  s1 : ffffffff01d5e388 a0 : ffffffff01d5e388 a1 : ffffffff01d5e3c8
[   72.734441]  a2 : 8000000000000008 a3 : 0000000006220425 a4 : ffffffff81388ff2
[   72.741661]  a5 : 0000000000000001 a6 : 0000000000000006 a7 : 0000000000000010
[   72.748881]  s2 : ffffffff01d5e3c8 s3 : 8000000000000008 s4 : 0000000000000001
[   72.756101]  s5 : 0000000006220425 s6 : ffffffff8180f660 s7 : 0000000000000001
[   72.763321]  s8 : ffff8f8008273d58 s9 : ffffffff01dcb8c0 s10: ffffffff01dcb758
[   72.770542]  s11: 0000000000000000 t3 : ffffffffffffffff t4 : ffffffffffffffff
[   72.777761]  t5 : ffffffffffffffff t6 : ffffaf82a5ecc1c8
[   72.783071] status: 0000000200000120 badaddr: ffffffff81389380 cause: 000000000000000f
[   72.790986] [<ffffffff8000bff6>] riscv_noncoherent_supported+0x10/0x3e
[   72.797514] [<ffffffff800027f6>] _apply_alternatives+0x84/0x86
[   72.803348] [<ffffffff800029cc>] apply_module_alternatives+0x10/0x18
[   72.809701] [<ffffffff80007c8c>] module_finalize+0x5e/0x74
[   72.815193] [<ffffffff80083b54>] load_module+0x1110/0x18b4
[   72.820682] [<ffffffff80084496>] init_module_from_file+0x76/0xaa
[   72.826688] [<ffffffff800846e4>] __riscv_sys_finit_module+0x1e2/0x2a8
[   72.833129] [<ffffffff80a9ccbc>] do_trap_ecall_u+0xbe/0x130
[   72.838703] [<ffffffff80aa51f8>] ret_from_exception+0x0/0x64
[   72.844376] Code: 0009 b7ed e797 0193 a783 12e7 c799 4785 d717 0137 (0723) 38f7
[   72.851970] ---[ end trace 0000000000000000 ]---

Signed-off-by: Han Gao <[email protected]>
Signed-off-by: Han Gao <[email protected]>
NeroReflex pushed a commit to proxmox-workstation/linux that referenced this pull request Mar 8, 2025
BugLink: https://bugs.launchpad.net/bugs/1942215

When the Timer operation is called, there are no arguments, and
acpi_ex_resolve_operands will be called with an out-of-bounds stack pointer
as num_operands is 0.

This does not usually cause any problems, as acpi_ex_resolve_operands will
ignore the parameter when the operation requires no arguments, as is the
case.

However, when the code is compiled with UBSAN, it will trigger, leading to
an oops with invalid opcode on Linux.

Fix it by using a NULL parameter when num_operands is 0.

[    8.285428] invalid opcode: 0000 [#1] SMP NOPTI
[    8.286436] CPU: 18 PID: 1522 Comm: systemd-udevd Not tainted 5.15.0-10-generic torvalds#10
[    8.287505] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[    8.288495] RIP: 0010:acpi_ds_exec_end_op+0x1be/0x7a6
[    8.289658] Code: 7b 0a 48 89 da 44 89 45 d4 48 98 48 8d 34 c3 e8 f8 3c 01 00 44 8b 45 d4 85 c0 41 89 c6 75 22 eb 9e 44 89 c0 41 80 f8 0b 76 02 <0f> 0b 48 8b 04 c5 c0 c0 ca aa 48 89 df ff d0 0f 1f 00 41 89 c4 eb
[    8.291858] RSP: 0018:ffffc38561a3f6d0 EFLAGS: 00010286
[    8.292888] RAX: 0000000000000000 RBX: ffffa0aa87c91800 RCX: 0000000000000040
[    8.294056] RDX: ffffffffffffffff RSI: ffffffffaacabf40 RDI: 00000000000002cb
[    8.295839] RBP: ffffc38561a3f700 R08: 0000000000000000 R09: ffffa0aa9f5a1000
[    8.296030] IPMI message handler: version 39.2
[    8.297554] R10: ffffa0aa89cdec00 R11: 0000000000000003 R12: 0000000000000000
[    8.297556] R13: ffffa0aa9f5a10a0 R14: 0000000000000000 R15: 0000000000000000
[    8.297558] FS:  00007f68ba26b8c0(0000) GS:ffffa0d60ca80000(0000) knlGS:0000000000000000
[    8.297560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.297561] CR2: 00007fdbb3b9eec8 CR3: 00000001176ba001 CR4: 00000000007706e0
[    8.297563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.297564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.297565] PKRU: 55555554
[    8.297566] Call Trace:
[    8.297569]  acpi_ps_parse_loop+0x587/0x660
[    8.297574]  acpi_ps_parse_aml+0x1af/0x552
[    8.297578]  acpi_ps_execute_method+0x208/0x2ca
[    8.297580]  acpi_ns_evaluate+0x34e/0x4f0
[    8.297583]  acpi_evaluate_object+0x18e/0x3b4
[    8.297587]  acpi_evaluate_dsm+0xb3/0x120
[    8.297593]  ? acpi_evaluate_dsm+0xb3/0x120
[    8.297596]  nfit_intel_shutdown_status+0xed/0x1a0 [nfit]
[    8.297606]  acpi_nfit_add_dimm+0x3cb/0x660 [nfit]
[    8.297614]  acpi_nfit_register_dimms+0x141/0x460 [nfit]
[    8.297620]  acpi_nfit_init+0x54f/0x620 [nfit]
[    8.327895]  acpi_nfit_add+0x18c/0x1f0 [nfit]
[    8.329341]  acpi_device_probe+0x49/0x170
[    8.330815]  really_probe+0x209/0x410
[    8.330820]  __driver_probe_device+0x109/0x180
[    8.330823]  driver_probe_device+0x23/0x90
[    8.330825]  __driver_attach+0xac/0x1b0
[    8.330828]  ? __device_attach_driver+0xe0/0xe0
[    8.330831]  bus_for_each_dev+0x7c/0xc0
[    8.330834]  driver_attach+0x1e/0x20
[    8.330835]  bus_add_driver+0x135/0x1f0
[    8.330837]  driver_register+0x95/0xf0
[    8.330840]  acpi_bus_register_driver+0x39/0x50
[    8.344874]  nfit_init+0x168/0x1000 [nfit]
[    8.344882]  ? 0xffffffffc0735000
[    8.344885]  do_one_initcall+0x46/0x1d0
[    8.350927]  ? kmem_cache_alloc_trace+0x18c/0x2c0
[    8.350933]  do_init_module+0x62/0x290
[    8.350940]  load_module+0xaa3/0xb30
[    8.350944]  __do_sys_finit_module+0xbf/0x120
[    8.350948]  __x64_sys_finit_module+0x18/0x20
[    8.350951]  do_syscall_64+0x59/0xc0
[    8.350955]  ? exit_to_user_mode_prepare+0x37/0xb0
[    8.350959]  ? irqentry_exit_to_user_mode+0x9/0x20
[    8.350963]  ? irqentry_exit+0x19/0x30
[    8.350965]  ? exc_page_fault+0x89/0x160
[    8.350968]  ? asm_exc_page_fault+0x8/0x30
[    8.350971]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    8.350975] RIP: 0033:0x7f68ba7fc94d
[    8.350978] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
[    8.350980] RSP: 002b:00007ffc7e0b93c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.350984] RAX: ffffffffffffffda RBX: 000055bbb29a4a00 RCX: 00007f68ba7fc94d
[    8.350985] RDX: 0000000000000000 RSI: 00007f68ba9923fe RDI: 0000000000000006
[    8.350987] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[    8.350988] R10: 0000000000000006 R11: 0000000000000246 R12: 00007f68ba9923fe
[    8.350989] R13: 000055bbb28e3a20 R14: 000055bbb297d940 R15: 000055bbb297ea60

Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Andrea Righi <[email protected]>
Signed-off-by: Paolo Pisati <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Mar 10, 2025
Ian told me that there are many memory leaks in the hierarchy mode.  I
can easily reproduce it with the follwing command.

  $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak

  $ perf record --latency -g -- ./perf test -w thloop

  $ perf report -H --stdio
  ...
  Indirect leak of 168 byte(s) in 21 object(s) allocated from:
      #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
      #1 0x55ed3602346e in map__get util/map.h:189
      #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476
      #3 0x55ed36025208 in hist_entry__new util/hist.c:588
      #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587
      #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638
      torvalds#6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685
      torvalds#7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776
      torvalds#8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735
      torvalds#9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119
      torvalds#10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867
      torvalds#11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351
      torvalds#12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404
      torvalds#13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448
      torvalds#14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556
      torvalds#15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
  ...

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  93

I found that hist_entry__delete() missed to release child entries in the
hierarchy tree (hroot_{in,out}).  It needs to iterate the child entries
and call hist_entry__delete() recursively.

After this change:

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  0

Reported-by: Ian Rogers <[email protected]>
Tested-by Thomas Falcon <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Mar 11, 2025
The env.pmu_mapping can be leaked when it reads data from a pipe on AMD.
For a pipe data, it reads the header data including pmu_mapping from
PERF_RECORD_HEADER_FEATURE runtime.  But it's already set in:

  perf_session__new()
    __perf_session__new()
      evlist__init_trace_event_sample_raw()
        evlist__has_amd_ibs()
          perf_env__nr_pmu_mappings()

Then it'll overwrite that when it processes the HEADER_FEATURE record.
Here's a report from address sanitizer.

  Direct leak of 2689 byte(s) in 1 object(s) allocated from:
    #0 0x7fed8f814596 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
    #1 0x5595a7d416b1 in strbuf_grow util/strbuf.c:64
    #2 0x5595a7d414ef in strbuf_init util/strbuf.c:25
    #3 0x5595a7d0f4b7 in perf_env__read_pmu_mappings util/env.c:362
    #4 0x5595a7d12ab7 in perf_env__nr_pmu_mappings util/env.c:517
    #5 0x5595a7d89d2f in evlist__has_amd_ibs util/amd-sample-raw.c:315
    torvalds#6 0x5595a7d87fb2 in evlist__init_trace_event_sample_raw util/sample-raw.c:23
    torvalds#7 0x5595a7d7f893 in __perf_session__new util/session.c:179
    torvalds#8 0x5595a7b79572 in perf_session__new util/session.h:115
    torvalds#9 0x5595a7b7e9dc in cmd_report builtin-report.c:1603
    torvalds#10 0x5595a7c019eb in run_builtin perf.c:351
    torvalds#11 0x5595a7c01c92 in handle_internal_command perf.c:404
    torvalds#12 0x5595a7c01deb in run_argv perf.c:448
    torvalds#13 0x5595a7c02134 in main perf.c:556
    torvalds#14 0x7fed85833d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Let's free the existing pmu_mapping data if any.

Cc: Ravi Bangoria <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants