System pressed for memory during file copy, causing oom-killer to be invoked #5626

Closed
jackietea opened this issue Jan 21, 2017 · 3 comments
Labels
Component: Memory Management · Status: Stale

Comments

@jackietea

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version Ubuntu 16.04.1 LTS
Linux Kernel 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Architecture x64
ZFS Version 0.6.5.6-0ubuntu15
SPL Version 0.6.5.6-0ubuntu4

Describe the problem you're observing

To test ZFS, I copy some files back and forth between ZFS and normal storage. Specifically, I use qemu-img to convert vmdk files to raw images, and then copy these raw images onto a zvol.
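For reference, the copy step looks roughly like this (a sketch; the pool, zvol, and file names here are examples, not my exact ones):

# convert the vmdk to a raw image (-p shows progress)
qemu-img convert -p -O raw /backups/vm01.vmdk /backups/vm01.raw
# create a zvol large enough to hold the image
zfs create -V 100G storage/vm01
# copy the raw image onto the zvol's block device
dd if=/backups/vm01.raw of=/dev/zvol/storage/vm01 bs=1M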

This works fine, but tonight I noticed that the OOM killer came to pay a visit. Apparently the system (8 GB RAM) was so pressed for memory that it killed the qemu-img process, and then went ahead and killed the much smaller snapd daemon (had the snapd daemon been just a bit smaller, it would have killed the accounts-daemon).

I am aware that ZFS loves RAM, and that people use it with 100+ GB. That's not the point of this issue, though. There are no VMs running, so those 8 GB are available for the restore and for ZFS to use. I'm surprised that ZFS basically runs the system into a wall, and that apparently anything else running on a ZFS machine is subject to getting oom-killed at any time.

Describe how to reproduce the problem

Installed ZFS on Ubuntu 16.04.

Creating the pool:

zpool create -o ashift=12 storage mirror /dev/disk/by-id/... /dev/disk/by-id/...
zpool add -o ashift=12 storage mirror /dev/disk/by-id/... /dev/disk/by-id/...

Adding a SLOG and L2ARC:

zpool add storage log /dev/disk/by-id/...
zpool add storage cache /dev/disk/by-id/...

I don't recall any other specific configuration being done (other than this being a xen dom0, which, however, has no VMs running). The system does not seem to be pressed for memory; this is during the copy process:

# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        138M        798M        820K        6.7G        6.5G
Swap:          462M         26M        436M
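
Note that free does not account for the ARC directly; on Linux the ARC is allocated from the kernel slab rather than the page cache. For what it's worth, the ARC's current and maximum size in bytes can be read from the SPL kstats:

# 'size' is the current ARC size, 'c_max' its configured maximum
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats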

Include any warning/errors/backtraces from the system logs

Jan 21 06:34:34 zfstest kernel: [29359.524576] kthreadd invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 21 06:34:34 zfstest kernel: [29359.524580] kthreadd cpuset=/ mems_allowed=0
Jan 21 06:34:34 zfstest kernel: [29359.524587] CPU: 0 PID: 2 Comm: kthreadd Tainted: P           O    4.4.0-59-generic #80-Ubuntu
Jan 21 06:34:34 zfstest kernel: [29359.524589] Hardware name: FUJITSU                          PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.09.2619.N1           12/13/2010
Jan 21 06:34:34 zfstest kernel: [29359.524591]  000000000000020f 000000005ec3c638 ffff88023ade3a98 ffffffff813f7583
Jan 21 06:34:34 zfstest kernel: [29359.524594]  ffff88023ade3c70 ffff8800bdc1b300 ffff88023ade3b08 ffffffff8120ad5e
Jan 21 06:34:34 zfstest kernel: [29359.524597]  ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 00000000000002e2
Jan 21 06:34:34 zfstest kernel: [29359.524600] Call Trace:
Jan 21 06:34:34 zfstest kernel: [29359.524608]  [<ffffffff813f7583>] dump_stack+0x63/0x90
Jan 21 06:34:34 zfstest kernel: [29359.524612]  [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
Jan 21 06:34:34 zfstest kernel: [29359.524618]  [<ffffffff81837f3a>] ? _raw_spin_unlock_irqrestore+0x1a/0x20
Jan 21 06:34:34 zfstest kernel: [29359.524622]  [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
Jan 21 06:34:34 zfstest kernel: [29359.524624]  [<ffffffff81192b49>] out_of_memory+0x219/0x460
Jan 21 06:34:34 zfstest kernel: [29359.524628]  [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
Jan 21 06:34:34 zfstest kernel: [29359.524632]  [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
Jan 21 06:34:34 zfstest kernel: [29359.524636]  [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
Jan 21 06:34:34 zfstest kernel: [29359.524640]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
Jan 21 06:34:34 zfstest kernel: [29359.524644]  [<ffffffff810b913a>] ? enqueue_task_fair+0xaa/0x8b0
Jan 21 06:34:34 zfstest kernel: [29359.524647]  [<ffffffff8101da2e>] ? xen_mc_flush+0x17e/0x1b0
Jan 21 06:34:34 zfstest kernel: [29359.524649]  [<ffffffff8101d214>] ? xen_load_sp0+0x84/0x180
Jan 21 06:34:34 zfstest kernel: [29359.524652]  [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
Jan 21 06:34:34 zfstest kernel: [29359.524655]  [<ffffffff810805a0>] _do_fork+0x80/0x360
Jan 21 06:34:34 zfstest kernel: [29359.524658]  [<ffffffff81833d46>] ? __schedule+0x3b6/0xa30
Jan 21 06:34:34 zfstest kernel: [29359.524660]  [<ffffffff810808a9>] kernel_thread+0x29/0x30
Jan 21 06:34:34 zfstest kernel: [29359.524662]  [<ffffffff810a1528>] kthreadd+0x158/0x1a0
Jan 21 06:34:34 zfstest kernel: [29359.524665]  [<ffffffff810a13d0>] ? kthread_create_on_cpu+0x60/0x60
Jan 21 06:34:34 zfstest kernel: [29359.524668]  [<ffffffff8183888f>] ret_from_fork+0x3f/0x70
Jan 21 06:34:34 zfstest kernel: [29359.524671]  [<ffffffff810a13d0>] ? kthread_create_on_cpu+0x60/0x60
Jan 21 06:34:34 zfstest kernel: [29359.524676] Mem-Info:
Jan 21 06:34:34 zfstest kernel: [29359.524681] active_anon:1708 inactive_anon:2769 isolated_anon:0
Jan 21 06:34:34 zfstest kernel: [29359.524681]  active_file:130048 inactive_file:1475779 isolated_file:0
Jan 21 06:34:34 zfstest kernel: [29359.524681]  unevictable:918 dirty:277001 writeback:7653 unstable:0
Jan 21 06:34:34 zfstest kernel: [29359.524681]  slab_reclaimable:73832 slab_unreclaimable:154465
Jan 21 06:34:34 zfstest kernel: [29359.524681]  mapped:5312 shmem:202 pagetables:777 bounce:0
Jan 21 06:34:34 zfstest kernel: [29359.524681]  free:15120 free_pcp:0 free_cma:0
Jan 21 06:34:34 zfstest kernel: [29359.524685] Node 0 DMA free:15868kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 21 06:34:34 zfstest kernel: [29359.524692] lowmem_reserve[]: 0 2865 7765 7765 7765
Jan 21 06:34:34 zfstest kernel: [29359.524696] Node 0 DMA32 free:28624kB min:4156kB low:5192kB high:6232kB active_anon:2240kB inactive_anon:4120kB active_file:199388kB inactive_file:2238220kB unevictable:2208kB isolated(anon):0kB isolated(file):0kB present:3119744kB managed:2973504kB mlocked:2208kB dirty:422540kB writeback:13692kB mapped:10400kB shmem:172kB slab_reclaimable:122284kB slab_unreclaimable:138388kB kernel_stack:3840kB pagetables:908kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:64 all_unreclaimable? no
Jan 21 06:34:34 zfstest kernel: [29359.524703] lowmem_reserve[]: 0 0 4899 4899 4899
Jan 21 06:34:34 zfstest kernel: [29359.524707] Node 0 Normal free:15988kB min:7104kB low:8880kB high:10656kB active_anon:4592kB inactive_anon:6956kB active_file:320804kB inactive_file:3664896kB unevictable:1464kB isolated(anon):0kB isolated(file):0kB present:5252880kB managed:5017212kB mlocked:1464kB dirty:685464kB writeback:16920kB mapped:10848kB shmem:636kB slab_reclaimable:173044kB slab_unreclaimable:479464kB kernel_stack:5440kB pagetables:2200kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 21 06:34:34 zfstest kernel: [29359.524712] lowmem_reserve[]: 0 0 0 0 0
Jan 21 06:34:34 zfstest kernel: [29359.524716] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
Jan 21 06:34:34 zfstest kernel: [29359.524729] Node 0 DMA32: 5466*4kB (UE) 855*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 28704kB
Jan 21 06:34:34 zfstest kernel: [29359.524737] Node 0 Normal: 2419*4kB (UMEH) 555*8kB (UMEH) 1*16kB (H) 3*32kB (H) 3*64kB (H) 3*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 16084kB
Jan 21 06:34:34 zfstest kernel: [29359.524746] 1606948 total pagecache pages
Jan 21 06:34:34 zfstest kernel: [29359.524748] 310 pages in swap cache
Jan 21 06:34:34 zfstest kernel: [29359.524749] Swap cache stats: add 9568, delete 9258, find 2337/4179
Jan 21 06:34:34 zfstest kernel: [29359.524750] Free swap  = 453576kB
Jan 21 06:34:34 zfstest kernel: [29359.524751] Total swap = 474108kB
Jan 21 06:34:34 zfstest kernel: [29359.524752] 2097151 pages RAM
Jan 21 06:34:34 zfstest kernel: [29359.524753] 0 pages HighMem/MovableOnly
Jan 21 06:34:34 zfstest kernel: [29359.524754] 95498 pages reserved
Jan 21 06:34:34 zfstest kernel: [29359.524754] 0 pages cma reserved
Jan 21 06:34:34 zfstest kernel: [29359.524755] 0 pages hwpoisoned
Jan 21 06:34:34 zfstest kernel: [29359.524756] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 21 06:34:34 zfstest kernel: [29359.524761] [  981]     0   981     8845      680      18       3       58             0 systemd-journal
Jan 21 06:34:34 zfstest kernel: [29359.524763] [ 1012]     0  1012    25742      205      17       3       16             0 lvmetad
Jan 21 06:34:34 zfstest kernel: [29359.524766] [ 1041]     0  1041    11222      477      23       3      331         -1000 systemd-udevd
Jan 21 06:34:34 zfstest kernel: [29359.524770] [ 3043]   100  3043    25081      434      19       3       51             0 systemd-timesyn
Jan 21 06:34:34 zfstest kernel: [29359.524772] [ 3242]   104  3242    65158      384      28       3      154             0 rsyslogd
Jan 21 06:34:34 zfstest kernel: [29359.524773] [ 3246]     0  3246     7137      361      17       3       43             0 systemd-logind
Jan 21 06:34:34 zfstest kernel: [29359.524775] [ 3248]   106  3248    10725      269      27       3       15          -900 dbus-daemon
Jan 21 06:34:34 zfstest kernel: [29359.524777] [ 3483]     0  3483     1100      318       8       3       26             0 acpid
Jan 21 06:34:34 zfstest kernel: [29359.524779] [ 3485]     0  3485    68216      393      35       4      156             0 accounts-daemon
Jan 21 06:34:34 zfstest kernel: [29359.524781] [ 3498]     0  3498     6517      463      18       3       48             0 cron
Jan 21 06:34:34 zfstest kernel: [29359.524783] [ 3500]     0  3500     7914      400      20       3       51             0 zed
Jan 21 06:34:34 zfstest kernel: [29359.524784] [ 3502]     0  3502     6511      336      19       3       40             0 atd
Jan 21 06:34:34 zfstest kernel: [29359.524786] [ 3510]     0  3510    40225      168      15       3        2             0 lxcfs
Jan 21 06:34:34 zfstest kernel: [29359.524788] [ 3515]     0  3515    69604      875      28       5      119             0 snapd
Jan 21 06:34:34 zfstest kernel: [29359.524790] [ 3593]     0  3593     3437      103      11       3       17             0 mdadm
Jan 21 06:34:34 zfstest kernel: [29359.524791] [ 3656]     0  3656    69272      197      38       3       64             0 polkitd
Jan 21 06:34:34 zfstest kernel: [29359.524793] [ 4019]     0  4019    16380      305      36       3      170         -1000 sshd
Jan 21 06:34:34 zfstest kernel: [29359.524795] [ 4056]     0  4056     1306       30       8       3        0             0 iscsid
Jan 21 06:34:34 zfstest kernel: [29359.524797] [ 4057]     0  4057     1431      881       8       3        0           -17 iscsid
Jan 21 06:34:34 zfstest kernel: [29359.524799] [ 4226]     0  4226     3211      439      12       3        0             0 agetty
Jan 21 06:34:34 zfstest kernel: [29359.524800] [ 4236]     0  4236     4766       10      15       3       28             0 xenwatchdogd
Jan 21 06:34:34 zfstest kernel: [29359.524802] [ 4237]     0  4237     3985      348      13       3        0             0 agetty
Jan 21 06:34:34 zfstest kernel: [29359.524804] [ 4249]   112  4249     5992      517      17       3        6             0 nrpe
Jan 21 06:34:34 zfstest kernel: [29359.524806] [ 4256]     0  4256     4891      381      13       3       46             0 irqbalance
Jan 21 06:34:34 zfstest kernel: [29359.524808] [ 4293]     0  4293     4807      397      15       3      103             0 xenstored
Jan 21 06:34:34 zfstest kernel: [29359.524809] [ 4299]     0  4299    22722        0      18       3       53             0 xenconsoled
Jan 21 06:34:34 zfstest kernel: [29359.524811] [ 4304]     0  4304    35206      221      37       3      401             0 qemu-system-i38
Jan 21 06:34:34 zfstest kernel: [29359.524813] [ 4808]     0  4808     9266      646      23       3      151             0 systemd
Jan 21 06:34:34 zfstest kernel: [29359.524815] [ 4812]     0  4812    15446        7      32       3      627             0 (sd-pam)
Jan 21 06:34:34 zfstest kernel: [29359.524824] [ 7193]     0  7193     6134      727      17       3        0             0 screen
Jan 21 06:34:34 zfstest kernel: [29359.524826] [ 7194]     0  7194     4621      592      14       3        0             0 bash
Jan 21 06:34:34 zfstest kernel: [29359.524828] [ 7225]     0  7225     2407      656      10       3        0             0 test_backups.zf
Jan 21 06:34:34 zfstest kernel: [29359.524830] [ 7229]     0  7229     2436      447      10       3        0             0 mount.ntfs
Jan 21 06:34:34 zfstest kernel: [29359.524832] [ 8150]     0  8150   134352     4269     116       4        0             0 qemu-img
Jan 21 06:34:34 zfstest kernel: [29359.524834] Out of memory: Kill process 8150 (qemu-img) score 2 or sacrifice child
Jan 21 06:34:34 zfstest kernel: [29359.524903] Killed process 8150 (qemu-img) total-vm:537408kB, anon-rss:4072kB, file-rss:13004kB
Jan 21 06:34:34 zfstest kernel: [29359.531024] test_backups.zf invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 21 06:34:34 zfstest kernel: [29359.531028] test_backups.zf cpuset=/ mems_allowed=0
Jan 21 06:34:34 zfstest kernel: [29359.531034] CPU: 0 PID: 8154 Comm: test_backups.zf Tainted: P           O    4.4.0-59-generic #80-Ubuntu
Jan 21 06:34:34 zfstest kernel: [29359.531036] Hardware name: FUJITSU                          PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.09.2619.N1           12/13/2010
Jan 21 06:34:34 zfstest kernel: [29359.531039]  000000000000020f 00000000f2c414b8 ffff880100027af0 ffffffff813f7583
Jan 21 06:34:34 zfstest kernel: [29359.531042]  ffff880100027cc8 ffff880237291980 ffff880100027b60 ffffffff8120ad5e
Jan 21 06:34:34 zfstest kernel: [29359.531046]  ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 00000000000002e2
Jan 21 06:34:34 zfstest kernel: [29359.531049] Call Trace:
Jan 21 06:34:34 zfstest kernel: [29359.531056]  [<ffffffff813f7583>] dump_stack+0x63/0x90
Jan 21 06:34:34 zfstest kernel: [29359.531060]  [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
Jan 21 06:34:34 zfstest kernel: [29359.531066]  [<ffffffff81837f3a>] ? _raw_spin_unlock_irqrestore+0x1a/0x20
Jan 21 06:34:34 zfstest kernel: [29359.531069]  [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
Jan 21 06:34:34 zfstest kernel: [29359.531072]  [<ffffffff81192b49>] out_of_memory+0x219/0x460
Jan 21 06:34:34 zfstest kernel: [29359.531076]  [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
Jan 21 06:34:34 zfstest kernel: [29359.531080]  [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
Jan 21 06:34:34 zfstest kernel: [29359.531083]  [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
Jan 21 06:34:34 zfstest kernel: [29359.531087]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
Jan 21 06:34:34 zfstest kernel: [29359.531091]  [<ffffffff811c1670>] ? handle_mm_fault+0xce0/0x1820
Jan 21 06:34:34 zfstest kernel: [29359.531095]  [<ffffffff81347bd3>] ? security_file_alloc+0x33/0x50
Jan 21 06:34:34 zfstest kernel: [29359.531098]  [<ffffffff810805a0>] _do_fork+0x80/0x360
Jan 21 06:34:34 zfstest kernel: [29359.531101]  [<ffffffff8109114f>] ? sigprocmask+0x6f/0xa0
Jan 21 06:34:34 zfstest kernel: [29359.531104]  [<ffffffff81080929>] SyS_clone+0x19/0x20
Jan 21 06:34:34 zfstest kernel: [29359.531107]  [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
Jan 21 06:34:34 zfstest kernel: [29359.531109] Mem-Info:
Jan 21 06:34:34 zfstest kernel: [29359.531114] active_anon:936 inactive_anon:2503 isolated_anon:0
Jan 21 06:34:34 zfstest kernel: [29359.531114]  active_file:130048 inactive_file:1474918 isolated_file:0
Jan 21 06:34:34 zfstest kernel: [29359.531114]  unevictable:918 dirty:277001 writeback:7653 unstable:0
Jan 21 06:34:34 zfstest kernel: [29359.531114]  slab_reclaimable:73832 slab_unreclaimable:154465
Jan 21 06:34:34 zfstest kernel: [29359.531114]  mapped:2586 shmem:202 pagetables:654 bounce:0
Jan 21 06:34:34 zfstest kernel: [29359.531114]  free:17138 free_pcp:186 free_cma:0
Jan 21 06:34:34 zfstest kernel: [29359.531118] Node 0 DMA free:15868kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 21 06:34:34 zfstest kernel: [29359.531124] lowmem_reserve[]: 0 2865 7765 7765 7765
Jan 21 06:34:34 zfstest kernel: [29359.531128] Node 0 DMA32 free:32124kB min:4156kB low:5192kB high:6232kB active_anon:1240kB inactive_anon:3520kB active_file:199388kB inactive_file:2236560kB unevictable:2208kB isolated(anon):0kB isolated(file):0kB present:3119744kB managed:2973504kB mlocked:2208kB dirty:422540kB writeback:13692kB mapped:5516kB shmem:172kB slab_reclaimable:122284kB slab_unreclaimable:138388kB kernel_stack:3840kB pagetables:760kB unstable:0kB bounce:0kB free_pcp:156kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:28 all_unreclaimable? no
Jan 21 06:34:34 zfstest kernel: [29359.531135] lowmem_reserve[]: 0 0 4899 4899 4899
Jan 21 06:34:34 zfstest kernel: [29359.531138] Node 0 Normal free:20792kB min:7104kB low:8880kB high:10656kB active_anon:2504kB inactive_anon:6492kB active_file:320804kB inactive_file:3663112kB unevictable:1464kB isolated(anon):0kB isolated(file):0kB present:5252880kB managed:5017212kB mlocked:1464kB dirty:685464kB writeback:16920kB mapped:4828kB shmem:636kB slab_reclaimable:173044kB slab_unreclaimable:479464kB kernel_stack:5440kB pagetables:1856kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 21 06:34:34 zfstest kernel: [29359.531145] lowmem_reserve[]: 0 0 0 0 0
Jan 21 06:34:34 zfstest kernel: [29359.531149] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
Jan 21 06:34:34 zfstest kernel: [29359.531163] Node 0 DMA32: 5902*4kB (UME) 1071*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32176kB
Jan 21 06:34:34 zfstest kernel: [29359.531172] Node 0 Normal: 3080*4kB (UMEH) 827*8kB (UMEH) 1*16kB (H) 3*32kB (H) 3*64kB (H) 3*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 20904kB
Jan 21 06:34:34 zfstest kernel: [29359.531184] 1606111 total pagecache pages
Jan 21 06:34:34 zfstest kernel: [29359.531186] 316 pages in swap cache
Jan 21 06:34:34 zfstest kernel: [29359.531188] Swap cache stats: add 9578, delete 9262, find 2341/4187
Jan 21 06:34:34 zfstest kernel: [29359.531189] Free swap  = 453592kB
Jan 21 06:34:34 zfstest kernel: [29359.531191] Total swap = 474108kB
Jan 21 06:34:34 zfstest kernel: [29359.531192] 2097151 pages RAM
Jan 21 06:34:34 zfstest kernel: [29359.531193] 0 pages HighMem/MovableOnly
Jan 21 06:34:34 zfstest kernel: [29359.531194] 95498 pages reserved
Jan 21 06:34:34 zfstest kernel: [29359.531195] 0 pages cma reserved
Jan 21 06:34:34 zfstest kernel: [29359.531196] 0 pages hwpoisoned
Jan 21 06:34:34 zfstest kernel: [29359.531198] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 21 06:34:34 zfstest kernel: [29359.531204] [  981]     0   981     8845      680      18       3       58             0 systemd-journal
Jan 21 06:34:34 zfstest kernel: [29359.531207] [ 1012]     0  1012    25742      205      17       3       16             0 lvmetad
Jan 21 06:34:34 zfstest kernel: [29359.531210] [ 1041]     0  1041    11222      477      23       3      331         -1000 systemd-udevd
Jan 21 06:34:34 zfstest kernel: [29359.531215] [ 3043]   100  3043    25081      434      19       3       51             0 systemd-timesyn
Jan 21 06:34:34 zfstest kernel: [29359.531218] [ 3242]   104  3242    65158      384      28       3      154             0 rsyslogd
Jan 21 06:34:34 zfstest kernel: [29359.531221] [ 3246]     0  3246     7137      361      17       3       43             0 systemd-logind
Jan 21 06:34:34 zfstest kernel: [29359.531223] [ 3248]   106  3248    10725      269      27       3       15          -900 dbus-daemon
Jan 21 06:34:34 zfstest kernel: [29359.531226] [ 3483]     0  3483     1100      318       8       3       26             0 acpid
Jan 21 06:34:34 zfstest kernel: [29359.531228] [ 3485]     0  3485    68216      393      35       4      156             0 accounts-daemon
Jan 21 06:34:34 zfstest kernel: [29359.531231] [ 3498]     0  3498     6517      463      18       3       48             0 cron
Jan 21 06:34:34 zfstest kernel: [29359.531233] [ 3500]     0  3500     7914      400      20       3       51             0 zed
Jan 21 06:34:34 zfstest kernel: [29359.531236] [ 3502]     0  3502     6511      336      19       3       40             0 atd
Jan 21 06:34:34 zfstest kernel: [29359.531238] [ 3510]     0  3510    40225      168      15       3        2             0 lxcfs
Jan 21 06:34:34 zfstest kernel: [29359.531241] [ 3515]     0  3515    69604      875      28       5      119             0 snapd
Jan 21 06:34:34 zfstest kernel: [29359.531243] [ 3593]     0  3593     3437      103      11       3       17             0 mdadm
Jan 21 06:34:34 zfstest kernel: [29359.531245] [ 3656]     0  3656    69272      197      38       3       64             0 polkitd
Jan 21 06:34:34 zfstest kernel: [29359.531248] [ 4019]     0  4019    16380      305      36       3      170         -1000 sshd
Jan 21 06:34:34 zfstest kernel: [29359.531250] [ 4056]     0  4056     1306       30       8       3        0             0 iscsid
Jan 21 06:34:34 zfstest kernel: [29359.531253] [ 4057]     0  4057     1431      881       8       3        0           -17 iscsid
Jan 21 06:34:34 zfstest kernel: [29359.531255] [ 4226]     0  4226     3211      439      12       3        0             0 agetty
Jan 21 06:34:34 zfstest kernel: [29359.531258] [ 4236]     0  4236     4766       10      15       3       28             0 xenwatchdogd
Jan 21 06:34:34 zfstest kernel: [29359.531260] [ 4237]     0  4237     3985      348      13       3        0             0 agetty
Jan 21 06:34:34 zfstest kernel: [29359.531262] [ 4249]   112  4249     5992      517      17       3        6             0 nrpe
Jan 21 06:34:34 zfstest kernel: [29359.531265] [ 4256]     0  4256     4891      381      13       3       46             0 irqbalance
Jan 21 06:34:34 zfstest kernel: [29359.531268] [ 4293]     0  4293     4807      397      15       3      103             0 xenstored
Jan 21 06:34:34 zfstest kernel: [29359.531270] [ 4299]     0  4299    22722        0      18       3       53             0 xenconsoled
Jan 21 06:34:34 zfstest kernel: [29359.531273] [ 4304]     0  4304    35206      221      37       3      401             0 qemu-system-i38
Jan 21 06:34:34 zfstest kernel: [29359.531276] [ 4808]     0  4808     9266      646      23       3      151             0 systemd
Jan 21 06:34:34 zfstest kernel: [29359.531278] [ 4812]     0  4812    15446        7      32       3      627             0 (sd-pam)
Jan 21 06:34:34 zfstest kernel: [29359.531290] [ 7193]     0  7193     6134      727      17       3        0             0 screen
Jan 21 06:34:34 zfstest kernel: [29359.531292] [ 7194]     0  7194     4621      592      14       3        0             0 bash
Jan 21 06:34:34 zfstest kernel: [29359.531295] [ 7225]     0  7225     2407      656      10       3        0             0 test_backups.zf
Jan 21 06:34:34 zfstest kernel: [29359.531298] [ 7229]     0  7229     2436      447      10       3        0             0 mount.ntfs
Jan 21 06:34:34 zfstest kernel: [29359.531301] [ 8154]     0  8154     2407       64      10       3        0             0 test_backups.zf
Jan 21 06:34:34 zfstest kernel: [29359.531303] [ 8155]     0  8155     2407       66       7       3        0             0 test_backups.zf
Jan 21 06:34:34 zfstest kernel: [29359.531305] Out of memory: Kill process 3515 (snapd) score 0 or sacrifice child
Jan 21 06:34:34 zfstest kernel: [29359.531407] Killed process 3515 (snapd) total-vm:278416kB, anon-rss:3500kB, file-rss:0kB
Jan 21 06:34:35 zfstest systemd[1]: snapd.service: Main process exited, code=killed, status=9/KILL

This only happened once; the remaining vmdk files were copied properly (without the oom-killer being invoked). Repeating the process also copied the vmdk in question without problems.

This was preceded by a few hung-task log messages, which happened hours earlier and didn't seem too related; included here just for completeness' sake:

Jan 21 04:27:19 zfstest kernel: [21724.708099] INFO: task txg_sync:7013 blocked for more than 120 seconds.
Jan 21 04:27:19 zfstest kernel: [21724.708128]       Tainted: P           O    4.4.0-59-generic #80-Ubuntu
Jan 21 04:27:19 zfstest kernel: [21724.708151] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 21 04:27:19 zfstest kernel: [21724.708181] txg_sync        D ffff8800be3cfaa8     0  7013      2 0x00000000
Jan 21 04:27:19 zfstest kernel: [21724.708184]  ffff8800be3cfaa8 ffff8800bdc1a04c ffff88023ae10000 ffff8800bd8d3300
Jan 21 04:27:19 zfstest kernel: [21724.708187]  ffff8800be3d0000 ffff880240256dc0 7fffffffffffffff ffff8800a3edd4e8
Jan 21 04:27:19 zfstest kernel: [21724.708189]  0000000000000001 ffff8800be3cfac0 ffffffff818343f5 0000000000000000
Jan 21 04:27:19 zfstest kernel: [21724.708191] Call Trace:
Jan 21 04:27:19 zfstest kernel: [21724.708199]  [<ffffffff818343f5>] schedule+0x35/0x80
Jan 21 04:27:19 zfstest kernel: [21724.708201]  [<ffffffff81837515>] schedule_timeout+0x1b5/0x270
Jan 21 04:27:19 zfstest kernel: [21724.708204]  [<ffffffff8101d214>] ? xen_load_sp0+0x84/0x180
Jan 21 04:27:19 zfstest kernel: [21724.708207]  [<ffffffff810caf11>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708209]  [<ffffffff810caf11>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708211]  [<ffffffff81833924>] io_schedule_timeout+0xa4/0x110
Jan 21 04:27:19 zfstest kernel: [21724.708221]  [<ffffffffc049dbec>] cv_wait_common+0xbc/0x140 [spl]
Jan 21 04:27:19 zfstest kernel: [21724.708224]  [<ffffffff810c4240>] ? wake_atomic_t_function+0x60/0x60
Jan 21 04:27:19 zfstest kernel: [21724.708229]  [<ffffffffc049dcc8>] __cv_wait_io+0x18/0x20 [spl]
Jan 21 04:27:19 zfstest kernel: [21724.708271]  [<ffffffffc06012fe>] zio_wait+0x10e/0x1f0 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708297]  [<ffffffffc058ac58>] dsl_pool_sync+0xb8/0x430 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708326]  [<ffffffffc05a65b6>] spa_sync+0x366/0xb30 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708330]  [<ffffffff810ac5d2>] ? default_wake_function+0x12/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708332]  [<ffffffff810caf11>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708362]  [<ffffffffc05b7a4a>] txg_sync_thread+0x3ba/0x630 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708365]  [<ffffffff810caf11>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708395]  [<ffffffffc05b7690>] ? txg_delay+0x180/0x180 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708400]  [<ffffffffc0498e31>] thread_generic_wrapper+0x71/0x80 [spl]
Jan 21 04:27:19 zfstest kernel: [21724.708404]  [<ffffffffc0498dc0>] ? __thread_exit+0x20/0x20 [spl]
Jan 21 04:27:19 zfstest kernel: [21724.708407]  [<ffffffff810a0c08>] kthread+0xd8/0xf0
Jan 21 04:27:19 zfstest kernel: [21724.708409]  [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
Jan 21 04:27:19 zfstest kernel: [21724.708411]  [<ffffffff8183888f>] ret_from_fork+0x3f/0x70
Jan 21 04:27:19 zfstest kernel: [21724.708413]  [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
Jan 21 04:27:19 zfstest kernel: [21724.708416] INFO: task kworker/u4:2:7883 blocked for more than 120 seconds.
Jan 21 04:27:19 zfstest kernel: [21724.708441]       Tainted: P           O    4.4.0-59-generic #80-Ubuntu
Jan 21 04:27:19 zfstest kernel: [21724.708465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 21 04:27:19 zfstest kernel: [21724.708494] kworker/u4:2    D ffff8800a050b5b8     0  7883      2 0x00000000
Jan 21 04:27:19 zfstest kernel: [21724.708500] Workqueue: writeback wb_workfn (flush-230:32)
Jan 21 04:27:19 zfstest kernel: [21724.708502]  ffff8800a050b5b8 ffff8800bd892370 ffff8800bd8d2640 ffff8800bdc1b300
Jan 21 04:27:19 zfstest kernel: [21724.708504]  ffff8800a050c000 ffff8800bd892220 ffff8800bd892248 ffff8800bd892370
Jan 21 04:27:19 zfstest kernel: [21724.708506]  0000000000000000 ffff8800a050b5d0 ffffffff818343f5 ffff8800bd892368
Jan 21 04:27:19 zfstest kernel: [21724.708508] Call Trace:
Jan 21 04:27:19 zfstest kernel: [21724.708510]  [<ffffffff818343f5>] schedule+0x35/0x80
Jan 21 04:27:19 zfstest kernel: [21724.708515]  [<ffffffffc049dc3b>] cv_wait_common+0x10b/0x140 [spl]
Jan 21 04:27:19 zfstest kernel: [21724.708518]  [<ffffffff810c4240>] ? wake_atomic_t_function+0x60/0x60
Jan 21 04:27:19 zfstest kernel: [21724.708523]  [<ffffffffc049dc85>] __cv_wait+0x15/0x20 [spl]
Jan 21 04:27:19 zfstest kernel: [21724.708554]  [<ffffffffc05b74a8>] txg_wait_open+0xb8/0x100 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708577]  [<ffffffffc05720ca>] dmu_tx_wait+0x38a/0x3a0 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708600]  [<ffffffffc0572172>] dmu_tx_assign+0x92/0x4e0 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708633]  [<ffffffffc060c77b>] zvol_request+0x1fb/0x670 [zfs]
Jan 21 04:27:19 zfstest kernel: [21724.708636]  [<ffffffff813c4009>] ? generic_make_request_checks+0x229/0x530
Jan 21 04:27:19 zfstest kernel: [21724.708638]  [<ffffffff813c7a82>] generic_make_request+0xf2/0x1d0
Jan 21 04:27:19 zfstest kernel: [21724.708640]  [<ffffffff813c7bd6>] submit_bio+0x76/0x170
Jan 21 04:27:19 zfstest kernel: [21724.708643]  [<ffffffff8124723f>] submit_bh_wbc+0x12f/0x160
Jan 21 04:27:19 zfstest kernel: [21724.708645]  [<ffffffff81249215>] __block_write_full_page.constprop.38+0x115/0x3a0
Jan 21 04:27:19 zfstest kernel: [21724.708647]  [<ffffffff81249800>] ? I_BDEV+0x20/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708649]  [<ffffffff81249800>] ? I_BDEV+0x20/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708650]  [<ffffffff812495e5>] block_write_full_page+0x145/0x170
Jan 21 04:27:19 zfstest kernel: [21724.708652]  [<ffffffff8124a3d8>] blkdev_writepage+0x18/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708655]  [<ffffffff8119a0b3>] __writepage+0x13/0x30
Jan 21 04:27:19 zfstest kernel: [21724.708657]  [<ffffffff8119abae>] write_cache_pages+0x1ee/0x510
Jan 21 04:27:19 zfstest kernel: [21724.708659]  [<ffffffff8119a0a0>] ? wb_position_ratio+0x1f0/0x1f0
Jan 21 04:27:19 zfstest kernel: [21724.708662]  [<ffffffff8119af21>] generic_writepages+0x51/0x80
Jan 21 04:27:19 zfstest kernel: [21724.708664]  [<ffffffff81191777>] ? mempool_free_slab+0x17/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708666]  [<ffffffff810caf11>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Jan 21 04:27:19 zfstest kernel: [21724.708669]  [<ffffffff8119d39e>] do_writepages+0x1e/0x30
Jan 21 04:27:19 zfstest kernel: [21724.708671]  [<ffffffff8123d7b5>] __writeback_single_inode+0x45/0x340
Jan 21 04:27:19 zfstest kernel: [21724.708673]  [<ffffffff8123dfc2>] writeback_sb_inodes+0x262/0x600
Jan 21 04:27:19 zfstest kernel: [21724.708675]  [<ffffffff8123e3ec>] __writeback_inodes_wb+0x8c/0xc0
Jan 21 04:27:19 zfstest kernel: [21724.708677]  [<ffffffff8123e743>] wb_writeback+0x253/0x310
Jan 21 04:27:19 zfstest kernel: [21724.708679]  [<ffffffff8123ef1d>] wb_workfn+0x24d/0x400
Jan 21 04:27:19 zfstest kernel: [21724.708682]  [<ffffffff8109a575>] process_one_work+0x165/0x480
Jan 21 04:27:19 zfstest kernel: [21724.708684]  [<ffffffff8109a8db>] worker_thread+0x4b/0x4c0
Jan 21 04:27:19 zfstest kernel: [21724.708686]  [<ffffffff8109a890>] ? process_one_work+0x480/0x480
Jan 21 04:27:19 zfstest kernel: [21724.708688]  [<ffffffff8109a890>] ? process_one_work+0x480/0x480
Jan 21 04:27:19 zfstest kernel: [21724.708689]  [<ffffffff810a0c08>] kthread+0xd8/0xf0
Jan 21 04:27:19 zfstest kernel: [21724.708691]  [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
Jan 21 04:27:19 zfstest kernel: [21724.708693]  [<ffffffff8183888f>] ret_from_fork+0x3f/0x70
Jan 21 04:27:19 zfstest kernel: [21724.708695]  [<ffffffff810a0b30>] ? kthread_create_on_node+0x1e0/0x1e0
@behlendorf added the Component: Memory Management label Jan 23, 2017
@behlendorf
Contributor

If you haven't done so already, as a workaround I'd recommend trying the xen tunings described in the FAQ. Both xen and zfs put an above-average amount of stress on the Linux VM. The good news is that we've done a huge amount of work to improve things on the zfs side; those improvements will first appear in the 0.7.0 major release.

Pre-releases containing the needed improvements are available, and if you're in a position where you can test a pre-release, we'd love independent confirmation. Thanks.
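
For example, capping the ARC is one of the usual knobs (an illustrative sketch only; the 2 GiB value is arbitrary, and the FAQ's xen-specific recommendations may differ):

# /etc/modprobe.d/zfs.conf: cap the ARC at 2 GiB (value in bytes);
# applied when the zfs module loads, so reboot or reload the module
options zfs zfs_arc_max=2147483648

The same limit can also be applied to a running system by writing the value to /sys/module/zfs/parameters/zfs_arc_max.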

@alandoyle commented Jan 31, 2017

Looking at your system info, I see you're running kernel 4.4.0-59 on Ubuntu 16.04, which has a confirmed OOM-killer bug; see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842

Looks like there are three options:

  1. Roll back to kernel 4.4.0-57.
  2. Install the beta kernel: linux-headers-4.4.0-62_4.4.0-62.83_all.deb from https://launchpad.net/ubuntu/xenial/amd64/linux-headers-4.4.0-62/4.4.0-62.83, plus linux-headers-4.4.0-62-generic_4.4.0-62.83_amd64.deb, linux-image-4.4.0-62-generic_4.4.0-62.83_amd64.deb, and linux-image-extra-4.4.0-62-generic_4.4.0-62.83_amd64.deb from http://people.canonical.com/~cascardo/lp1655842/, as per the Launchpad bug report.
  3. Wait about 3 weeks until it's officially fixed with an updated kernel.

If you do go with option 2, put the 4 .deb files in the same directory and run:

sudo dpkg -i *.deb
sudo update-grub2
reboot

and you're good to go. This is the route I've taken, as the OOM killer was also nuking all my KVM machines!

UPDATE: This kernel is now available in the proposed repository (https://wiki.ubuntu.com/Testing/EnableProposed). FYI, I've been running it for 2 days now on an 8 GB system with a 4 TB RAIDZ1 ZFS pool and a few KVM and LXD virtual machines, and all the OOM problems are gone.
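
For anyone else going this route, enabling proposed boils down to adding the xenial-proposed pocket and installing the fixed kernel from it (a rough sketch; the wiki page above also recommends pinning so apt doesn't upgrade everything else from proposed):

# add the proposed pocket (the .list file name is just an example)
echo 'deb http://archive.ubuntu.com/ubuntu/ xenial-proposed restricted main multiverse universe' | sudo tee /etc/apt/sources.list.d/xenial-proposed.list
sudo apt-get update
# pull only the fixed kernel packages from proposed
sudo apt-get install linux-image-4.4.0-62-generic linux-image-extra-4.4.0-62-generic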

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale bot added the Status: Stale label Aug 25, 2020
@stale bot closed this as completed Nov 23, 2020