Summary: Replacing faulty disk in ZFS pool

From: Andreas Höschler <ahoesch_at_smartsoft.de>
Date: Fri Aug 07 2009 - 07:03:42 EDT

Dear managers,

Thanks to

Jordan Schwartz <jordan247@gmail.com>
DRoss-Smith@reviewjournal.com

and especially to

Cindy.Swearingen@Sun.COM

from the zfs-discuss@opensolaris.org list.

The bottom line is that both of my approaches (see below) should work:
the first temporarily turns the degraded mirror into a three-way mirror
before the faulty disk is detached, while the second brings in a hot
spare that becomes a permanent member of the mirror once the faulted
disk is detached. Cindy even suggested a third one:

	zpool offline tank c1t6d0
	<physically replace c1t6d0 with a new one>
	zpool replace tank c1t6d0
	zpool online tank c1t6d0
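
(Whichever route one takes, the result can be checked after the resilver
finishes, e.g. with

	zpool status -x tank

which reports "pool 'tank' is healthy" once everything is back in order.
This is just an illustrative check, not part of Cindy's suggestion.)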

I finally took the following route:

	zpool add tank spare c1t15d0
	zpool replace tank c1t6d0 c1t15d0

This gave me

  scrub: resilver in progress for 0h0m, 7.93% done, 0h10m to go
config:
         NAME           STATE     READ WRITE CKSUM
         tank           DEGRADED     0     0     0
           mirror       ONLINE       0     0     0
             c1t2d0     ONLINE       0     0     0
             c1t3d0     ONLINE       0     0     0
           mirror       ONLINE       0     0     0
             c1t5d0     ONLINE       0     0     0
             c1t4d0     ONLINE       0     0     0
           mirror       DEGRADED     0     0     0
             spare      DEGRADED     0     0     0
               c1t6d0   FAULTED      0    19     0  too many errors
               c1t15d0  ONLINE       0     0     0
             c1t7d0     ONLINE       0     0     0
         spares
           c1t15d0      INUSE     currently in use
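
(While the resilver runs, the faulted disk and the spare sit side by side
under the mirror, and the spare is listed as INUSE. To wait for completion
unattended, a small shell loop such as

	while zpool status tank | grep "resilver in progress" > /dev/null
	do
	        sleep 60
	done

would do; an illustrative snippet, not something from the original thread.)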

After the resilvering process completed, I did

	zpool detach tank c1t6d0

This gave me

  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h22m with 0 errors on Thu Aug  6 22:55:37 2009
config:

         NAME         STATE     READ WRITE CKSUM
         tank         ONLINE       0     0     0
           mirror     ONLINE       0     0     0
             c1t2d0   ONLINE       0     0     0
             c1t3d0   ONLINE       0     0     0
           mirror     ONLINE       0     0     0
             c1t5d0   ONLINE       0     0     0
             c1t4d0   ONLINE       0     0     0
           mirror     ONLINE       0     0     0
             c1t15d0  ONLINE       0     0     0
             c1t7d0   ONLINE       0     0     0

errors: No known data errors

and thus a calm night! :-)

A replacement disk is on its way!
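
(Once it arrives and is cabled in, the freed slot could be turned back into
a hot spare with something along the lines of

	zpool add tank spare c1t6d0

assuming the new disk comes up under the old device name; the actual name
depends on what the controller assigns.)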

Thanks a lot,

  Andreas



Original Question:
============================================
> Dear managers,
>
> one of our servers (X4240) shows a faulty disk:
>
> ------------------------------------------------------------------------
> -bash-3.00# zpool status
>   pool: rpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror      ONLINE       0     0     0
>             c1t0d0s0  ONLINE       0     0     0
>             c1t1d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: tank
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        DEGRADED     0     0     0
>           mirror    ONLINE       0     0     0
>             c1t2d0  ONLINE       0     0     0
>             c1t3d0  ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             c1t5d0  ONLINE       0     0     0
>             c1t4d0  ONLINE       0     0     0
>           mirror    DEGRADED     0     0     0
>             c1t6d0  FAULTED      0    19     0  too many errors
>             c1t7d0  ONLINE       0     0     0
>
> errors: No known data errors
> ------------------------------------------------------------------------
> I derived the following possible approaches to solve the problem:
>
> 1) A way to reestablish redundancy would be to use the command
>
>        zpool attach tank c1t7d0 c1t15d0
>
> to attach c1t15d0 to the virtual device "c1t6d0 + c1t7d0", making it a
> three-way mirror. The faulty disk would still be part of the virtual
> device.
>
> We could then detach the faulty disk with the command
>
>        zpool detach tank c1t6d0
>
> 2) Another approach would be to add a spare disk to tank
>
>        zpool add tank spare c1t15d0
>
> and then use replace to swap out the faulty disk:
>
>        zpool replace tank c1t6d0 c1t15d0
>
> In theory this is easy, but since I have never done it before and since
> this is a production server, I would appreciate it if someone with more
> experience could look over my plan before I issue these commands.
>
> What is the difference between the two approaches? Which one do you
> recommend? And is that really all that has to be done, or am I missing
> a step? I mean, can c1t6d0 be physically replaced after issuing "zpool
> detach tank c1t6d0" or "zpool replace tank c1t6d0 c1t15d0"? I also
> found the command
>
>        zpool offline tank  ...
>
> but am not sure whether this should be used in my case. Hints are
> greatly appreciated!