Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

up #5

Merged
merged 1,117 commits into from
Apr 26, 2018
Merged

up #5

merged 1,117 commits into from
Apr 26, 2018

Conversation

rahrawat
Copy link
Owner

No description provided.

Finn Thain and others added 30 commits April 16, 2018 21:49
The 'eject' shell command may send various different ioctl commands.
This leads to error messages on the console even though the FDEJECT
ioctl succeeds.

~# eject floppy
SWIM floppy_ioctl: unknown cmd 21257
SWIM floppy_ioctl: unknown cmd 1

Don't log an error message for an invalid ioctl, just do as the
swim3 driver does and return -ENOTTY.

Cc: Laurent Vivier <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected] # v4.14+
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Acked-by: Laurent Vivier <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
The Sony drive status bits use active-low logic. The swim_readbit()
function converts that to 'C' logic for readability. Hence, the
sense of the names of the status bit macros should not be inverted.

Mostly they are correct. However, the TWOMEG_DRIVE, MFM_MODE and
TWOMEG_MEDIA macros have inverted sense (like MkLinux). Fix this
inconsistency and make the following patches less confusing.

The same problem affects swim3.c so fix that too.

No functional change.

The FDHD drive status bits are documented in sonydriv.cpp from MAME
and in swimiii.h from MkLinux.

Cc: Laurent Vivier <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: [email protected]
Cc: Jens Axboe <[email protected]>
Cc: [email protected] # v4.14+
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Acked-by: Laurent Vivier <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
The SWIM chip is compatible with GCR-mode Sony 400K/800K drives but
this driver only supports MFM mode. Therefore only Sony FDHD drives
are supported. Skip incompatible drives.

Cc: Laurent Vivier <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected] # v4.14+
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Acked-by: Laurent Vivier <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reading to the end of a 720K disk results in an IO error instead of EOF
because the block layer thinks the disk has 2880 sectors. (Partly this
is a result of inverted logic of the ONEMEG_MEDIA bit that's now fixed.)

Initialize the density and head count in swim_add_floppy() to agree
with the device size passed to set_capacity() during drive probe.

Call set_capacity() again upon device open, after refreshing the density
and head count values.

Cc: Laurent Vivier <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected] # v4.14+
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Acked-by: Laurent Vivier <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
The driver supports internal and external FDD units so the floppy_open
function must not hard-code the drive location.

Cc: Laurent Vivier <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected] # v4.14+
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Acked-by: Laurent Vivier <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
rq->gstate and rq->aborted_gstate both are zero before rqs are
allocated. If we have a small timeout, when the timer fires,
there could be rqs that are never allocated, and also there could
be rq that has been allocated but not initialized and started. At
the moment, the rq->gstate and rq->aborted_gstate both are 0, thus
the blk_mq_terminate_expired will identify the rq is timed out and
invoke .timeout early.

For scsi, this will cause scsi_times_out to be invoked before the
scsi_cmnd is not initialized, scsi_cmnd->device is still NULL at
the moment, then we will get crash.

Cc: Bart Van Assche <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Martin Steigerwald <[email protected]>
Cc: [email protected]
Signed-off-by: Jianchao Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
This warning message is not very helpful, as the return value should
already show information about the error. Also, this message will
spam dmesg if the user space does testing in a loop, like:

    for x in {0..5}
    do
        echo p:xx xx+$x >> /sys/kernel/debug/tracing/kprobe_events
    done

Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Since drm_framebuffer can now store GEM objects directly, place them
there rather than in our own subclass. As this makes the framebuffer
create_handle and destroy functions the same as the GEM framebuffer
helper, we can reuse those.

Signed-off-by: Daniel Stone <[email protected]>
Signed-off-by: Inki Dae <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
This can be calculated from the GEM BO DMA address as well as the offset
stored in the base framebuffer.

Signed-off-by: Daniel Stone <[email protected]>
Signed-off-by: Inki Dae <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Now exynos_drm_fb is just an empty wrapper around drm_framebuffer, we
can drop it.

Signed-off-by: Daniel Stone <[email protected]>
Signed-off-by: Inki Dae <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
It may be useful to compile host programs with different flags (e.g.
hardening). Ensure that objtool picks up the appropriate flags.

Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/05a360681176f1423cb2fde8faae3a0a0261afc5.1523560825.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
The struct sigaction for user space in arch/s390/include/uapi/asm/signal.h
is ill defined. The kernel uses two structures 'struct sigaction' and
'struct old_sigaction', the correlation in the kernel for both 31 and
64 bit is as follows

    sys_sigaction -> struct old_sigaction
    sys_rt_sigaction -> struct sigaction

The correlation of the (single) uapi definition for 'struct sigaction'
under '#ifndef __KERNEL__':

    31-bit: sys_sigaction -> uapi struct sigaction
    31-bit: sys_rt_sigaction -> no structure available

    64-bit: sys_sigaction -> no structure available
    64-bit: sys_rt_sigaction -> uapi struct sigaction

This is quite confusing. To make it a bit less confusing make the
uapi definition of 'struct sigaction' usable for sys_rt_sigaction for
both 31-bit and 64-bit.

Signed-off-by: Martin Schwidefsky <[email protected]>
After merging the netfilter tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

net/netfilter/nf_conntrack_extend.c: In function 'nf_ct_ext_add':
net/netfilter/nf_conntrack_extend.c:74:2: error: implicit declaration of function 'kmemleak_not_leak' [-Werror=implicit-function-declaration]
  kmemleak_not_leak(old);
  ^~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors

Fixes: 114aa35 ("netfilter: conntrack: silent a memory leak warning")
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
If there is no d-cache-size property in the device tree, l1d_size could
be zero. We don't actually expect that to happen, it's only been seen
on mambo (simulator) in some configurations.

A zero-size l1d_size leads to the loop in the asm wrapping around to
2^64-1, and then walking off the end of the fallback area and
eventually causing a page fault which is fatal.

Just default to 64K which is correct on some CPUs, and sane enough to
not cause a crash on others.

Fixes: aa8a5e0 ('powerpc/64s: Add support for RFI flush of L1-D cache')
Signed-off-by: Madhavan Srinivasan <[email protected]>
[mpe: Rewrite comment and change log]
Signed-off-by: Michael Ellerman <[email protected]>
The commit that switched x86 to dma_direct_ops stopped using and building
this file, but accidentally left it in the tree.  Remove it.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
The TSC calibration code uses HPET as reference. The conversion normalizes
the delta of two HPET timestamps:

    hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6

and then divides the normalized delta of the corresponding TSC timestamps
by the result to calulate the TSC frequency.

    tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref

This uses do_div() which takes an u32 as the divisor, which worked so far
because the HPET frequency was low enough that 'hpetref' never exceeded
32bit.

On Skylake machines the HPET frequency increased so 'hpetref' can exceed
32bit. do_div() truncates the divisor, which causes the calibration to
fail.

Use div64_u64() to avoid the problem.

[ tglx: Fixes whitespace mangled patch and rewrote changelog ]

Signed-off-by: Xiaoming Gao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
RongQing reported that there are some X2APIC id 0xffffffff in his machine's
ACPI MADT table, which makes the number of possible CPU inaccurate.

The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
Architecture x2APIC Specification", Chapter 2.4.1.

Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.

Reported-by: Li RongQing <[email protected]>
Signed-off-by: Dou Liyang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
early_trap_init() and cpu_set_gdt() have been removed, so remove the stale
declarations as well.

Signed-off-by: Dou Liyang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
The existing API allows to pass a sample data to initialize the shadow
data. It works well when the data are position independent. But it fails
miserably when we need to set a pointer to the shadow structure itself.

Unfortunately, we might need to initialize the pointer surprisingly
often because of struct list_head. It is even worse because the list
might be hidden in other common structures, for example, struct mutex,
struct wait_queue_head.

For example, this was needed to fix races in ALSA sequencer. It required
to add mutex into struct snd_seq_client. See commit b3defb7
("ALSA: seq: Make ioctls race-free") and commit d15d662
("ALSA: seq: Fix racy pool initializations")

This patch makes the API more safe. A custom constructor function and data
are passed to klp_shadow_*alloc() functions instead of the sample data.

Note that ctor_data are no longer a template for shadow->data. It might
point to any data that might be necessary when the constructor is called.

Also note that the constructor is called under klp_shadow_lock. It is
an internal spin_lock that synchronizes alloc() vs. get() operations,
see klp_shadow_get_or_alloc(). On one hand, this adds a risk of ABBA
deadlocks. On the other hand, it allows to do some operations safely.
For example, we could add the new structure into an existing list.
This must be done only once when the structure is allocated.

Reported-by: Nicolai Stange <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Miroslav Benes <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
We might need to do some actions before the shadow variable is freed.
For example, we might need to remove it from a list or free some data
that it points to.

This is already possible now. The user can get the shadow variable
by klp_shadow_get(), do the necessary actions, and then call
klp_shadow_free().

This patch allows to do it a more elegant way. The user could implement
the needed actions in a callback that is passed to klp_shadow_free()
as a parameter. The callback usually does reverse operations to
the constructor callback that can be called by klp_shadow_*alloc().

It is especially useful for klp_shadow_free_all(). There we need to do
these extra actions for each found shadow variable with the given ID.

Note that the memory used by the shadow variable itself is still released
later by rcu callback. It is needed to protect internal structures that
keep all shadow variables. But the destructor is called immediately.
The shadow variable must not be access anyway after klp_shadow_free()
is called. The user is responsible to protect this any suitable way.

Be aware that the destructor is called under klp_shadow_lock. It is
the same as for the contructor in klp_shadow_alloc().

Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Miroslav Benes <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
This is the sync up with the canonical definition of the sound
protocol in Xen:

1. Protocol version was referenced in the protocol description,
   but missed its definition. Fixed by adding a constant
   for current protocol version.

2. Some of the request descriptions have "reserved" fields
   missed: fixed by adding corresponding entries.

3. Extend the size of the requests and responses to 64 octets.
   Bump protocol version to 2.

4. Add explicit back and front synchronization
   In order to provide explicit synchronization between backend and
   frontend the following changes are introduced in the protocol:
    - add new ring buffer for sending asynchronous events from
      backend to frontend to report number of bytes played by the
      frontend (XENSND_EVT_CUR_POS)
    - introduce trigger events for playback control: start/stop/pause/resume
    - add "req-" prefix to event-channel and ring-ref to unify naming
      of the Xen event channels for requests and events

5. Add explicit back and front parameter negotiation
   In order to provide explicit stream parameter negotiation between
   backend and frontend the following changes are introduced in the protocol:
   add XENSND_OP_HW_PARAM_QUERY request to read/update
   configuration space for the parameters given: request passes
   desired parameter's intervals/masks and the response to this request
   returns allowed min/max intervals/masks to be used.

Signed-off-by: Oleksandr Andrushchenko <[email protected]>
Signed-off-by: Oleksandr Grytsov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Takashi Iwai <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
xenbus_command_reply() did not actually copy the response string and
leaked stack content instead.

Fixes: 9a6161f ("xen: return xenstore command failures via response instead of rc")
Signed-off-by: Simon Gaiser <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
Sync the following tooling headers with the latest kernel version:

  tools/arch/arm/include/uapi/asm/kvm.h
    - New ABI: KVM_REG_ARM_*

  tools/arch/x86/include/asm/required-features.h
    - Removal of NEED_LA57 dependency

  tools/arch/x86/include/uapi/asm/kvm.h
    - New KVM ABI: KVM_SYNC_X86_*

  tools/include/uapi/asm-generic/mman-common.h
    - New ABI: MAP_FIXED_NOREPLACE flag

  tools/include/uapi/linux/bpf.h
    - New ABI: BPF_F_SEQ_NUMBER functions

  tools/include/uapi/linux/if_link.h
    - New ABI: IFLA tun and rmnet support

  tools/include/uapi/linux/kvm.h
    - New ABI: hyperv eventfd and CONN_ID_MASK support plus header cleanups

  tools/include/uapi/sound/asound.h
    - New ABI: SNDRV_PCM_FORMAT_FIRST PCM format specifier

  tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
    - The x86 system call table description changed due to the ptregs changes and the renames, in:

	d5a0052: syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
	5ac9efa: syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
	ebeb8c8: syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32

Also fix the x86 syscall table warning:

  -Warning: Kernel ABI header at 'tools/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  +Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'

None of these changes impact existing tooling code, so we only have to copy the kernel version.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Robbins <[email protected]>
Cc: Clark Williams <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Dmitriy Vyukov <[email protected]> <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hendrik Brueckner <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Li Zhijian <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin Liška <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Miguel Bernal Marin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Takuya Yamamoto <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: William Cohen <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
…IDE]

Store preempting context switch out event into Perf trace as a part of
PERF_RECORD_SWITCH[_CPU_WIDE] record.

Percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are running
on a machine;

The event is treated as preemption one when task->state value of the
thread being switched out is TASK_RUNNING. Event type encoding is
implemented using PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit;

Signed-off-by: Alexey Budankov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Print additional 'preempt' tag for PERF_RECORD_SWITCH[_CPU_WIDE] OUT records when
event header misc field contains PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit set
designating preemption context switch out event:

tools/perf/perf report -D -i perf.data | grep _SWITCH

0 768361415226 0x27f076 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     8/8
4 768362216813 0x28f45e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:     0/0
4 768362217824 0x28f486 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:  4073/4073
0 768362414027 0x27f0ce [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:     8/8
0 768362414367 0x27f0f6 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0

Signed-off-by: Alexey Budankov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Append 'p' sign to 'S' tag designating the type of context switch out event so
'Sp' means preemption context switch. Documentation is extended to cover
new presentation changes.

  $ perf script --show-switch-events -F +misc -I -i perf.data:

          hdparm 4073 [004] U  762.198265:     380194 cycles:ppp:      7faf727f5a23 strchr (/usr/lib64/ld-2.26.so)
          hdparm 4073 [004] K  762.198366:     441572 cycles:ppp:  ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux)
          hdparm 4073 [004] S  762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0
         swapper    0 [004]    762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 4073/4073
         swapper    0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 4073/4073
          hdparm 4073 [004]    762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
         swapper    0 [007] K  762.198514:    2303073 cycles:ppp:  ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux)
         swapper    0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 1134/1134
  kworker/u16:18 1134 [007]    762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
  kworker/u16:18 1134 [007] S  762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0

Signed-off-by: Alexey Budankov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
'perf list' with flags -d and -v print a description (-d) or a very
verbose explanation (-v) of CPU specific counter events.  These
descriptions are provided with the json files in directory
pmu-events/arch/s390/*.json.

Display of these descriptions on s390 requires the corresponding json
files.

On s390 this does not work because function is_pmu_core() does not
detect the s390 directory name where the CPU specific events are listed.
On x86 it is:

  /sys/bus/event_source/devices/cpu

whereas on s390 it is:

  /sys/bus/event_source/devices/cpum_cf
  /sys/bus/event_source/devices/cpum_sf

Fix this by adding s390 directory name testing to function
is_pmu_core(). This is the same approach as taken for the ARM platform.

Output before:

[root@s35lp76 perf]# ./perf list -d pmu
List of pre-defined events (to be used in -e):

  cpum_cf/AES_BLOCKED_CYCLES/      [Kernel PMU event]
  cpum_cf/AES_BLOCKED_FUNCTIONS/   [Kernel PMU event]
  cpum_cf/AES_CYCLES/              [Kernel PMU event]
  cpum_cf/AES_FUNCTIONS/           [Kernel PMU event]
  ....
  cpum_cf/TX_NC_TEND/              [Kernel PMU event]
  cpum_cf/VX_BCD_EXECUTION_SLOTS/  [Kernel PMU event]
  cpum_sf/SF_CYCLES_BASIC/         [Kernel PMU event]

Output after:

[root@s35lp76 perf]# ./perf list -d pmu
List of pre-defined events (to be used in -e):

  cpum_cf/AES_BLOCKED_CYCLES/      [Kernel PMU event]
  cpum_cf/AES_BLOCKED_FUNCTIONS/   [Kernel PMU event]
  cpum_cf/AES_CYCLES/              [Kernel PMU event]
  cpum_cf/AES_FUNCTIONS/           [Kernel PMU event]
  ....
  cpum_cf/TX_NC_TEND/              [Kernel PMU event]
  cpum_cf/VX_BCD_EXECUTION_SLOTS/  [Kernel PMU event]
  cpum_sf/SF_CYCLES_BASIC/         [Kernel PMU event]

3906:
  bcd_dfp_execution_slots
       [BCD DFP Execution Slots]
  decimal_instructions
       [Decimal Instructions]
  dtlb2_gpage_writes
       [DTLB2 GPAGE Writes]
  dtlb2_hpage_writes
       [DTLB2 HPAGE Writes]
  dtlb2_misses
       [DTLB2 Misses]
  dtlb2_writes
       [DTLB2 Writes]
  itlb2_misses
       [ITLB2 Misses]
  itlb2_writes
       [ITLB2 Writes]
  l1c_tlb2_misses
       [L1C TLB2 Misses]
  .....

cfvn 3:
  cpu_cycles
       [CPU Cycles]
  instructions
       [Instructions]
  l1d_dir_writes
       [L1D Directory Writes]
  l1d_penalty_cycles
       [L1D Penalty Cycles]
  l1i_dir_writes
       [L1I Directory Writes]
  l1i_penalty_cycles
       [L1I Penalty Cycles]
  problem_state_cpu_cycles
       [Problem State CPU Cycles]
  problem_state_instructions
       [Problem State Instructions]
  ....

csvn generic:
  aes_blocked_cycles
       [AES Blocked Cycles]
  aes_blocked_functions
       [AES Blocked Functions]
  aes_cycles
       [AES Cycles]
  aes_functions
       [AES Functions]
  dea_blocked_cycles
       [DEA Blocked Cycles]
  dea_blocked_functions
       [DEA Blocked Functions]
  ....

Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Hendrik Brueckner <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Return immediately when we find issue in the user stack checks. The
error value could get overwritten by following check for
PERF_SAMPLE_REGS_INTR.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 60e2364 ("perf: Add ability to sample machine state on interrupt")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
The syzbot hit KASAN bug in perf_callchain_store having the entry stored
behind the allocated bounds [1].

We miss the sample_max_stack check for the initial event that allocates
callchain buffers. This missing check allows to create an event with
sample_max_stack value bigger than the global sysctl maximum:

  # sysctl -a | grep perf_event_max_stack
  kernel.perf_event_max_stack = 127

  # perf record -vv -C 1 -e cycles/max-stack=256/ kill
  ...
  perf_event_attr:
    size                             112
    ...
    sample_max_stack                 256
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4

Note the '-C 1', which forces perf record to create just single event.
Otherwise it opens event for every cpu, then the sample_max_stack check
fails on the second event and all's fine.

The fix is to run the sample_max_stack check also for the first event
with callchains.

[1] https://marc.info/?l=linux-kernel&m=152352732920874&w=2

Reported-by: [email protected]
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 97c79a3 ("perf core: Per event callchain limit")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
If the get_callchain_buffers fails to allocate the buffer it will
decrease the nr_callchain_events right away.

There's no point of checking the allocation error for
nr_callchain_events > 1. Removing that check.

Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
davem330 and others added 26 commits April 23, 2018 21:24
aTom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2018-04-23

This patch series addresses some issues in the AMD XGBE driver.

The following fixes are included in this driver update series:

- Improve KR auto-negotiation and training (2 patches)
  - Add pre and post auto-negotiation hooks
  - Use the pre and post auto-negotiation hooks to disable CDR tracking
    during auto-negotiation page exchange in KR mode
- Check for SFP tranceiver signal support and only use the signal if the
  SFP indicates that it is supported

This patch series is based on net.
====================

Signed-off-by: David S. Miller <[email protected]>
The same fix in Commit dbe1730 ("bridge: fix netconsole
setup over bridge") is also needed for team driver.

While at it, remove the unnecessary parameter *team from
team_port_enable_netpoll().

v1->v2:
  - fix it in a better way, as does bridge.

Fixes: 0fb52a2 ("team: cleanup netpoll clode")
Reported-by: João Avelino Bellomo Filho <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The current error handling for failed resource setup for xdp_ring
data is a break out of the loop and returning 0 indicated everything
was OK, when in fact it is not.  Fix this by exiting via the
error exit label err_setup_tx that will clean up the resources
correctly and return and error status.

Detected by CoverityScan, CID#1466879 ("Logically dead code")

Fixes: 21092e9 ("ixgbevf: Add support for XDP_TX action")
Signed-off-by: Colin Ian King <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
When Qav mode is enabled, queue 0 should be kept on Stream Reservation
mode. From the i210 datasheet, section 8.12.19:

"Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
Qav." ("QueueMode 1b" represents the Stream Reservation mode)

The solution is to give queue 0 the all the credits it might need, so
it has priority over queue 1.

A situation where this can happen is when cbs is "installed" only on
queue 1, leaving queue 0 alone. For example:

$ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
     	   map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
     	   hicredit 30 sendslope -980000 idleslope 20000 offload 1

Signed-off-by: Vinicius Costa Gomes <[email protected]>
Tested-by: Aaron Brown <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
ice_sched_add_nodes_to_layer is used recursively, and so we start
with num_nodes_added being 0. This way, in case of an error or if
num_nodes is NULL, the function just returns 0 to indicate that no
nodes were added.

Fixes: 5513b92 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Action type 5 defines large action generic values. Fix comment to
reflect that better.

Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
According to the hardware spec, checking the INTEVENT bit isn't a
reliable way to detect if an OICR interrupt has occurred. This is
because this bit can be cleared by the hardware/firmware before the
interrupt service routine has run. So instead, just check for OICR
events every time.

Fixes: 940b61a ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Ben Shelton <[email protected]>
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Updates to the bitfields in struct packet_sock are not atomic.
Serialize these read-modify-write cycles.

Move po->running into a separate variable. Its writes are protected by
po->bind_lock (except for one startup case at packet_create). Also
replace a textual precondition warning with lockdep annotation.

All others are set only in packet_setsockopt. Serialize these
updates by holding the socket lock. Analogous to other field updates,
also hold the lock when testing whether a ring is active (pg_vec).

Fixes: 8dc4194 ("[PACKET]: Add optional checksum computation for recvmsg")
Reported-by: DaeRyong Jeong <[email protected]>
Reported-by: Byoungyoung Lee <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
While adding support for ethtool::get_fecparam and set_fecparam, kernel
doc for these functions was missed, add those.

Fixes: 1a5f3da ("net: ethtool: add support for forward error correction modes")
Signed-off-by: Florian Fainelli <[email protected]>
Acked-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Associate an arbitrary ID with each ARFS filter, allowing to properly query
 for expiry.  The association is maintained in a hash table, which is
 protected by a spinlock.

v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
v2: fixed uninitialised variable (thanks davem and lkp-robot).

Fixes: 3af0f34 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
DMA_DIRECT_OPS is defined in lib/Kconfig, so don't duplicate it in
arch/riscv/Kconfig.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
So don't list it as generic-y.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Debian toolcahin defaults to PIE, and I guess that will also be the case
of most distributions. This causes the following build failure:

  AS      arch/riscv/kernel/vdso/getcpu.o
  AS      arch/riscv/kernel/vdso/flush_icache.o
  VDSOLD  arch/riscv/kernel/vdso/vdso.so.dbg
  OBJCOPY arch/riscv/kernel/vdso/vdso.so
  AS      arch/riscv/kernel/vdso/vdso.o
  VDSOLD  arch/riscv/kernel/vdso/vdso-dummy.o
  LD      arch/riscv/kernel/vdso/vdso-syms.o
riscv64-linux-gnu-ld: attempted static link of dynamic object `arch/riscv/kernel/vdso/vdso-dummy.o'
make[2]: *** [arch/riscv/kernel/vdso/Makefile:43: arch/riscv/kernel/vdso/vdso-syms.o] Error 1
make[1]: *** [scripts/Makefile.build:575: arch/riscv/kernel/vdso] Error 2
make: *** [Makefile:1018: arch/riscv/kernel] Error 2

While the root Makefile correctly passes "-fno-PIE" to build individual
object files, the RISC-V kernel also builds vdso-dummy.o as an
executable, which is therefore linked as PIE. Fix that by updating this
specific link rule to also include "-no-pie".

Signed-off-by: Aurelien Jarno <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
For the MAC read operation, the device can return up to two (LAN and WoL)
MAC addresses. Without access to adequate memory, the device will return
an error. Fixed this by allocating the right amount of memory. Also, logic
to detect and copy the LAN MAC address into the port_info structure has
been added. Note that the WoL MAC address is ignored currently as the WoL
feature isn't supported yet.

Fixes: dc49c77 ("ice: Get MAC/PHY/link info and scheduler topology")
Signed-off-by: Md Fahad Iqbal Polash <[email protected]>
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
…jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-04-24

This series contains fixes to ixgbevf, igb and ice drivers.

Colin Ian King fixes the return value on error for the new XDP support
that went into ixgbevf for 4.17.

Vinicius provides a fix for queue 0 for igb, which was not receiving all
the credits it needed when QAV mode was enabled.

Anirudh provides several fixes for the new ice driver, starting with
properly initializing num_nodes_added to zero.  Fixed up a code comment
to better reflect what is really going on in the code.  Fixed how to
detect if an OICR interrupt has occurred to a more reliable method.

Md Fahad fixes the ice driver to allocate the right amount of memory
when reading and storing the devices MAC addresses.  The device can have
up to 2 MAC addresses (LAN and WoL), while WoL is currently not
supported, we need to ensure it can be properly handled when support is
added.
====================

Signed-off-by: David S. Miller <[email protected]>
Pull networking fixes from David Miller:

 1) Fix rtnl deadlock in ipvs, from Julian Anastasov.

 2) s390 qeth fixes from Julian Wiedmann (control IO completion stalls,
    bad MAC address update sequence, request side races on command IO
    timeouts).

 3) Handle seq_file overflow properly in l2tp, from Guillaume Nault.

 4) Fix VLAN priority mappings in cpsw driver, from Ivan Khoronzhuk.

 5) Packet scheduler ife action fixes (malformed TLV lengths, etc.) from
    Alexander Aring.

 6) Fix out of bounds access in tcp md5 option parser, from Jann Horn.

 7) Missing netlink attribute policies in rtm_ipv6_policy table, from
    Eric Dumazet.

 8) Missing socket address length checks in l2tp and pppoe connect, from
    Guillaume Nault.

 9) Fix netconsole over team and bonding, from Xin Long.

10) Fix race with AF_PACKET socket state bitfields, from Willem de
    Bruijn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
  ice: Fix insufficient memory issue in ice_aq_manage_mac_read
  sfc: ARFS filter IDs
  net: ethtool: Add missing kernel doc for FEC parameters
  packet: fix bitfield update race
  ice: Do not check INTEVENT bit for OICR interrupts
  ice: Fix incorrect comment for action type
  ice: Fix initialization for num_nodes_added
  igb: Fix the transmission mode of queue 0 for Qav mode
  ixgbevf: ensure xdp_ring resources are free'd on error exit
  team: fix netconsole setup over team
  amd-xgbe: Only use the SFP supported transceiver signals
  amd-xgbe: Improve KR auto-negotiation and training
  amd-xgbe: Add pre/post auto-negotiation phy hooks
  pppoe: check sockaddr length in pppoe_connect()
  l2tp: check sockaddr length in pppol2tp_connect()
  net: phy: marvell: clear wol event before setting it
  ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
  bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
  tcp: don't read out-of-bounds opsize
  ibmvnic: Clean actual number of RX or TX pools
  ...
…nel/git/ebiederm/user-namespace

Pull userns bug fix from Eric Biederman:
 "Just a small fix to properly set the return code on error"

* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  commoncap: Handle memory allocation failure.
Too much to do with other projects.  I've enjoyed working with everyone
here, and hope to occasionally contribute on bcache.

Signed-off-by: Michael Lyle <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Since Michael had to step back, Coly has agreed to be the new
maintainer. Mark him as such.

Signed-off-by: Jens Axboe <[email protected]>
As it came up in discussion on the mailing list that the semantic
meaning of 'blk_mq_ctx' and 'blk_mq_hw_ctx' isn't completely
obvious to everyone, let's add some minimal kerneldoc for a
starter.

Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
This reverts commit 37c7c6c.

Turns out some drivers(most are FC drivers) may not use managed
IRQ affinity, and has their customized .map_queues meantime, so
still keep this code for avoiding regression.

Reported-by: Laurence Oberman <[email protected]>
Tested-by: Laurence Oberman <[email protected]>
Tested-by: Christian Borntraeger <[email protected]>
Tested-by: Stefan Haberland <[email protected]>
Cc: Ewan Milne <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
…ma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "A few small dma-mapping fixes for Linux 4.17-rc3:

   - don't loop to try GFP_DMA allocations if ZONE_DMA is not actually
     enabled (regression in 4.16)

   - don't try to do virt_to_page before we know we actuall have a valid
     page in dma_common_mmap

   - a comment fixup related to the above fix"

* tag 'dma-mapping-4.17-3' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: postpone cpu addr translation on mmap
  dma-coherent: clarify dma_mmap_from_dev_coherent documentation
  dma-direct: don't retry allocation for no-op GFP_DMA
…linux/kernel/git/palmer/riscv-linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains three small fixes related to the RISC-V port that I'd
  like to target for 4.17-rc3:

   - a Kconfig cleanup to select DMA_DIRECT_OPS instead of redefining it
     in arch/riscv

   - the removal of asm/handle_irq.h, which doesn't exist, from our arch
     header list

   - the addition of "-no-pie" the link rules for our VDSO-related
     files, which fixes the build on systems where PIE is enabled by
     default"

* tag 'riscv-for-linus-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISC-V: build vdso-dummy.o with -no-pie
  riscv: there is no <asm/handle_irq.h>
  riscv: select DMA_DIRECT_OPS instead of redefining it
Pull block updates from Jens Axboe:
 "I ended up sitting on this about a week longer than I wanted to, since
  we were hashing out details with a timeout change. I've now killed
  that patch, so we can flush the existing queue in due time.

  This contains:

   - Fix for an old regression, where entering the queue can be
     disturbed by a signal to the process. This can cause spurious EIO.
     Fix from Alan Jenkins.

   - cdrom information leak fix from Dan.

   - Trivial helper for testing queue FUA from Dave Chinner, part of his
     O_DIRECT FUA series.

   - Series of swim fixes from Finn that actually makes it work again.

   - Loop O_DIRECT corruption fix, which caused data corruption in
     production for us. From me.

   - BFQ crash fix from me.

   - bcache maintainer update. Michael no longer has the time to do it,
     Coly has stepped up to serve as the new maintainer.

   - blkcg locking fixes from Jiang Biao.

   - Revert of a change from this merge window from Ming, that causes an
     issue on some hardware.

   - Minor clarification doc addition from Linus Walleij"

* tag 'for-linus-20180425' of git://git.kernel.dk/linux-block: (22 commits)
  Revert "blk-mq: remove code for dealing with remapping queue"
  block: mq: Add some minor doc for core structs
  bcache: mark Coly Li as bcache maintainer
  MAINTAINERS: Remove me as maintainer of bcache
  blkcg: init root blkcg_gq under lock
  blkcg: small fix on comment in blkcg_init_queue
  blkcg: don't hold blkcg lock when deactivating policy
  block: add blk_queue_fua() helper function
  cdrom: information leak in cdrom_ioctl_media_changed()
  bfq-iosched: ensure to clear bic/bfqq pointers when preparing request
  blk-mq: start request gstate with gen 1
  block/swim: Select appropriate drive on device open
  block/swim: Fix IO error at end of medium
  block/swim: Check drive type
  block/swim: Rename macros to avoid inconsistent inverted logic
  block/swim: Don't log an error message for an invalid ioctl
  block/swim: Remove extra put_disk() call from error path
  block/swim: Fix array bounds check
  m68k/mac: Don't remap SWIM MMIO region
  loop: handle short DIO reads
  ...
…it/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Eight bug fixes, one spelling update and one tracepoint addition.

  The most serious is probably the mptsas write same fix because it
  means anyone using these controllers sees errors when modern
  filesystems try to issue discards"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: fix crash with iscsi target and dvd
  scsi: sd_zbc: Avoid that resetting a zone fails sporadically
  scsi: sd: Defer spinning up drive while SANITIZE is in progress
  scsi: megaraid_sas: Do not log an error if FW successfully initializes.
  scsi: ufs: add trace event for ufs upiu
  scsi: core: remove reference to scsi_show_extd_sense()
  scsi: mptsas: Disable WRITE SAME
  scsi: fnic: fix spelling mistake in fnic stats "Abord" -> "Abort"
  scsi: scsi_debug: IMMED related delay adjustments
  scsi: iscsi: respond to netlink with unicast when appropriate
…l/git/jack/linux-fs

Pull fsnotify fix from Jan Kara:
 "A fix of a fsnotify race causing panics / softlockups"

* tag 'for_v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Fix fsnotify_mark_connector race
@rahrawat rahrawat merged commit a58aabc into rahrawat:master Apr 26, 2018
rahrawat pushed a commit that referenced this pull request Dec 20, 2018
It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
torvalds#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
torvalds#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
torvalds#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
torvalds#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David Howells <[email protected]>
rahrawat pushed a commit that referenced this pull request Dec 20, 2018
Function graph tracing recurses into itself when stackleak is enabled,
causing the ftrace graph selftest to run for up to 90 seconds and
trigger the softlockup watchdog.

Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
200             mcount_get_lr_addr        x0    //     pointer to function's saved lr
(gdb) bt
\#0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
\#1  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#2  0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
\#3  0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507
\#4  0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286
\#5  ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
\#6  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#7  0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
\#8  0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650
\#9  0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205

Rework so we mark stackleak_track_stack as notrace

Co-developed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Anders Roxell <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
rahrawat pushed a commit that referenced this pull request Dec 20, 2018
The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.

Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in running state with a stacktrace similar to:

 PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
  #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
  #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
  #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
  #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
  #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
 torvalds#10 [ffff880621563e38] process_one_work at ffffffff81096f14

It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.

The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

Signed-off-by: Jiri Wiesner <[email protected]>
Reported-by: Per Sundstrom <[email protected]>
Acked-by: Peter Oskolkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.