ASSERT(zio->io_vd == NULL && zio->io_bp != NULL); #602

Closed
behlendorf opened this issue Mar 14, 2012 · 5 comments

@behlendorf
Contributor

Observed under RHEL 6.2 during module load with full ZFS debugging enabled. The assertion may be incorrect; it's not clear why zio->io_vd must be NULL here.

SPLError: 24490:0:(zio.c:2893:zio_done()) ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL) failed

PID: 24490  TASK: ffff88004b72aa80  CPU: 7   COMMAND: "z_rd_int/7"
 #0 [ffff88006b4a9c20] schedule at ffffffff814ecac0
 #1 [ffff88006b4a9ce8] spl_debug_bug at ffffffffa0665ced [spl]
 #2 [ffff88006b4a9d18] zio_done at ffffffffa0820808 [zfs]
 #3 [ffff88006b4a9da8] zio_execute at ffffffffa081c913 [zfs]
 #4 [ffff88006b4a9e08] taskq_thread at ffffffffa066f9e6 [spl]
 #5 [ffff88006b4a9ee8] kthread at ffffffff81090726
 #6 [ffff88006b4a9f48] kernel_thread at ffffffff8100c14a

p *(struct zio *)0xffff88005526e7a0
$8 = {
  io_bookmark = {
    zb_objset = 21, 
    zb_object = 0, 
    zb_level = -2, 
    zb_blkid = 1
  }, 
  io_prop = {
    zp_checksum = ZIO_CHECKSUM_INHERIT, 
    zp_compress = ZIO_COMPRESS_INHERIT, 
    zp_type = DMU_OT_NONE, 
    zp_level = 0 '\000', 
    zp_copies = 0 '\000', 
    zp_dedup = 0 '\000', 
    zp_dedup_verify = 0 '\000'
  }, 
  io_type = ZIO_TYPE_READ, 
  io_child_type = ZIO_CHILD_VDEV, 
  io_cmd = 0, 
  io_priority = 0 '\000', 
  io_reexecute = 0 '\000', 
  io_state = "\001", 
  io_txg = 401665, 
  io_spa = 0xffff8800517a2000, 
  io_bp = 0xffff88005526e800, 
  io_bp_override = 0x0, 
  io_bp_copy = {
    blk_dva = {{
        dva_word = {8589934600, 3941687}
      }, {
        dva_word = {0, 0}
      }, {
        dva_word = {0, 0}
      }}, 
    blk_prop = 9225915215840215047, 
    blk_pad = {0, 0}, 
    blk_phys_birth = 0, 
    blk_birth = 401665, 
    blk_fill = 0, 
    blk_cksum = {
      zc_word = {17434539745624380358, 3610462034644144359, 21, 1}
    }
  }, 
  io_parent_list = {
  io_walk_link = 0x0, 
  io_logical = 0xffff88005526e7a0, 
  io_transform_stack = 0x0, 
  io_ready = 0, 
  io_done = 0xffffffffa07d1820 , 
  io_private = 0xffff880030117690, 
  io_prev_space_delta = 0, 
  io_bp_orig = {
    blk_dva = {{
        dva_word = {8589934600, 3941687}
      }, {
        dva_word = {0, 0}
      }, {
        dva_word = {0, 0}
      }}, 
    blk_prop = 9225915215840215047, 
    blk_pad = {0, 0}, 
    blk_phys_birth = 0, 
    blk_birth = 401665, 
    blk_fill = 0, 
    blk_cksum = {
      zc_word = {17434539745624380358, 3610462034644144359, 21, 1}
    }
  }, 
  io_data = 0xffffc90006020000, 
  io_orig_data = 0xffffc90006020000, 
  io_size = 4096, 
  io_orig_size = 4096, 
  io_vd = 0xffff880056b96000, 
  io_vsd = 0x0, 
  io_vsd_ops = 0x0, 
  io_offset = 2022338048, 
  io_deadline = 73703526, 
  io_offset_node = {
    avl_child = {0x0, 0x0}, 
    avl_pcb = 1
  }, 
  io_deadline_node = {
    avl_child = {0x0, 0x0}, 
    avl_pcb = 1
  }, 
  io_vdev_tree = 0xffff880056b966a8, 
  io_flags = 394448, 
  io_stage = ZIO_STAGE_DONE, 
  io_pipeline = 2031616, 
  io_orig_flags = 262352, 
  io_orig_stage = ZIO_STAGE_READY, 
  io_orig_pipeline = 2031616, 
  io_delay = 5, 
  io_error = 52, 
  io_child_error = {0, 0, 0, 0}, 
  io_children = {{0, 0}, {0, 0}, {0, 0}, {0, 0}}, 
  io_child_count = 0, 
  io_parent_count = 1, 
  io_stall = 0x0, 
  io_gang_leader = 0x0, 
  io_gang_tree = 0x0, 
  io_executor = 0xffff88004b72aa80, 
  io_waiter = 0x0, 
  io_lock = {
    m = {
      count = {
        counter = 1
      }, 
      wait_lock = {
        raw_lock = {
          slock = 0
        }
      }, 
      wait_list = {
        next = 0xffff88005526eab0, 
        prev = 0xffff88005526eab0
      }, 
      owner = 0x0
    }
  }, 
  io_cv = {
    cv_magic = 879052276, 
    cv_name = 0xffff8800101cb420 "&zio->io_cv", 
    cv_name_size = 12, 
    cv_event = {
      lock = {
        raw_lock = {
          slock = 0
        }
      }, 
      task_list = {
        next = 0xffff88005526eae8, 
        prev = 0xffff88005526eae8
      }
    }, 
    cv_destroy = {
      lock = {
        raw_lock = {
          slock = 0
        }
      }, 
      task_list = {
        next = 0xffff88005526eb00, 
        prev = 0xffff88005526eb00
      }
    }, 
    cv_waiters = {
      counter = 0
    }, 
    cv_mutex = 0x0
  }, 
  io_cksum_report = 0x0, 
  io_ena = 0, 
  io_tqent = {
    tqent_lock = {
      raw_lock = {
        slock = 0
      }
    }, 
    tqent_list = {
      next = 0xffff88005526eb38, 
      prev = 0xffff88005526eb38
    }, 
    tqent_id = 40, 
    tqent_func = 0xffffffffa081c820 , 
    tqent_arg = 0xffff88005526e7a0, 
    tqent_flags = 1
  }
}
@ryao
Contributor

ryao commented Mar 15, 2012

How did you enable "full ZFS debugging"? Is it more involved than passing --debug to ./configure?

@behlendorf
Contributor Author

Passing --enable-debug to configure gets you the bulk of the debugging. Several other debug checks can be enabled by setting specific zfs_flags in module/zfs/zfs_debug.c, although I wouldn't recommend it for anything other than a debug build. The debugging is fairly heavyweight in places and I'm sure it will impact performance.

@ryao
Contributor

ryao commented Mar 16, 2012

I am comparing this issue to issue #604 and things don't quite make sense to me. If I understand ./include/sys/zio.h and /usr/src/linux/include/asm-generic/errno.h correctly, io_error = 52 indicates a checksum error. It could be set in any of the following locations, assuming that I did not miss any:

/# grep -rn " = ECKSUM" ./module/
./module/zfs/vdev_raidz.c:1681: rc->rc_error = ECKSUM;
./module/zfs/vdev_raidz.c:1805: rc->rc_error = ECKSUM;
./module/zfs/vdev_raidz.c:2081: zio->io_error = ECKSUM;
./module/zfs/zil.c:216: error = ECKSUM;
./module/zfs/zil.c:230: error = ECKSUM;
./module/zfs/dmu_send.c:1400: ra.err = ECKSUM;

What doesn't make sense is how a checksum failure could occur on a 6-device mirrored vdev like the one I have in issue #604 without being healed automatically. There is no history of any checksum failures in my pool, and a scrub found nothing wrong. Furthermore, I perused the FreeBSD code: they have had this assertion in their code for more than 3 years, and nothing there appears to have triggered it.

What did you do to get a dump of the zio struct contents? I am not certain if this issue is the same as issue #604. I would like to get a look at zio->io_error to be certain.

@behlendorf
Contributor Author

Checksum failures are not particularly uncommon at import time if the pool was not exported cleanly. They're somewhat expected as ZFS attempts to locate the correct vdev label. As for why this issue isn't observed under FreeBSD, it's my understanding that FreeBSD, Solaris, and Illumos all ship their kernels with assertions disabled, just like the default ZoL build. Unless you go out of your way to rebuild their kernel, you'll never see this.

behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 21, 2012
This patch was slightly flawed and allowed for zio->io_logical
to potentially not be reinitialized for a new zio.  This could
lead to assertion failures in specific cases when debugging is
enabled (--enable-debug) and I/O errors are encountered.  It
may also have caused problems when issuing logical I/Os.

Since we want to make sure this workaround can be easily removed
in the future (when we have the real fix), I'm reverting this
change and applying a new version of the patch which includes
the zio->io_logical fix.

This reverts commit 2c6d0b1.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#602
Issue openzfs#604
@behlendorf
Contributor Author

Closing issue; I'm convinced this failure is explained and fixed by the zio->io_logical fix.
