
Rebuilding a Linux Software RAID With Mdadm

First off, let me say that no one in their right mind should ever use a software RAID. Never! At least not in a production environment. You can do it at home all you want, or if you really hate yourself and want to support this stuff yourself and deal with the headaches. In a real environment, if you need RAID, pony up the money and get a real hardware RAID controller. Like I like to say: if you want to play with the big boys, you need big-boy toys.

In any case, I ran across a DIY setup of Openfiler used as an iSCSI target. While Openfiler looks like a great system, I would never use it in a production environment unless the company had purchased support from Openfiler, or unless, of course, the system was non-critical. I never want someone breathing down my neck because their systems are down and there is something wrong inside our DIY SAN… Oh, and this Openfiler system also had not been updated in quite a few years.

Inside the system there were 4 SATA disks, each 1TB in size. Each disk carried 4 partitions, and each set of matching partitions across the disks made up one MD device (more on this later). One disk had failed (as was visible from smartctl), and my md3 had gone bonkers and “forgot” about all the other disks in its RAID5 array…

kernel: Buffer I/O error on device md3, logical block 0

The above errors were also followed by a lot of errors complaining that /dev/sdc was resetting and acting funky. Unfortunately I didn't save the logs about /dev/sdc. After seeing the errors in dmesg and looking at the output of mdadm --misc --detail /dev/mdX for each array, I noted that the four drives were /dev/sda, sdb, sdc, and sdd, and that the dead one was sdc. Just for kicks I looked at the SMART output of all the disks like so:

smartctl -a /dev/sdX |less
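
If you would rather not page through each disk by hand, a quick shell loop does the same check in one shot. The -H flag just prints the overall health verdict; the device names below assume the same four disks as on this box:

# quick health verdict for every disk in the box
for d in sda sdb sdc sdd; do
    echo "== /dev/$d =="
    smartctl -H /dev/$d
done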

And when I got to sdc I found:

smartctl -a /dev/sdc |less

 === START OF READ SMART DATA SECTION ===
 SMART overall-health self-assessment test result: FAILED
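
The very top of smartctl's output (the info section) has the drive's model and serial number. If you just want those lines without paging through the whole report, something like this does the trick:

# print the drive's identity info and keep only the model and serial lines
smartctl -i /dev/sdc | grep -iE 'model|serial'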

Armed with the model and serial number, you know exactly which physical disk to pull. Now you can remove the offending drive and replace it with a spare. After the swap and, of course, a reboot (because this software RAID setup is not hot-swappable), let's do some diagnostics:

[root@NAS ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md2 : active raid5 sdd2[3] sdb2[2] sda2[0]
      2047488 blocks level 5, 256k chunk, algorithm 2 [4/3] [U_UU]


md1 : active raid5 sdd3[3] sdb3[2] sda3[0]
      2047488 blocks level 5, 256k chunk, algorithm 2 [4/3] [U_UU]


md3 : inactive sdc4[0]
      975289984 blocks


md0 : active raid1 sdd1[3] sdb1[2] sda1[0]
      104320 blocks [4/3] [U_UU]


unused devices: <none>
[root@NAS ~]# mdadm --misc --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Jun 22 13:02:20 2009
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent


    Update Time : Wed Jul 25 04:02:41 2012
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0


           UUID : e0068cb2:4b28ece0:acb7027f:d0e9fe04
         Events : 0.300


    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
       2       8       17        2      active sync   /dev/sdb1
       3       8       49        3      active sync   /dev/sdd1
[root@NAS ~]# mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Mon Jun 22 13:02:17 2009
     Raid Level : raid5
     Array Size : 2047488 (1999.84 MiB 2096.63 MB)
  Used Dev Size : 682496 (666.61 MiB 698.88 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent


    Update Time : Tue Jul 24 17:49:28 2012
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0


         Layout : left-symmetric
     Chunk Size : 256K


           UUID : 2aa930c4:81a86e80:779bb96e:10f00225
         Events : 0.12


    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed
       2       8       19        2      active sync   /dev/sdb3
       3       8       51        3      active sync   /dev/sdd3
[root@NAS ~]# mdadm --misc --detail /dev/md2
/dev/md2:
        Version : 00.90.03
  Creation Time : Mon Jun 22 13:02:17 2009
     Raid Level : raid5
     Array Size : 2047488 (1999.84 MiB 2096.63 MB)
  Used Dev Size : 682496 (666.61 MiB 698.88 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent


    Update Time : Wed Jul 25 09:02:51 2012
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0


         Layout : left-symmetric
     Chunk Size : 256K


           UUID : e2155b4d:3c6de8f8:1daf70df:e11f16e4
         Events : 0.397292


    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed
       2       8       18        2      active sync   /dev/sdb2
       3       8       50        3      active sync   /dev/sdd2
[root@NAS ~]# mdadm --misc --detail /dev/md3
/dev/md3:
        Version : 00.90.03
  Creation Time : Fri Apr 10 14:13:18 2009
     Raid Level : raid5
  Used Dev Size : 975289984 (930.11 GiB 998.70 GB)
   Raid Devices : 4
  Total Devices : 1
Preferred Minor : 3
    Persistence : Superblock is persistent


    Update Time : Sun Jun 24 06:22:15 2012
          State : active, degraded, Not Started
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0


         Layout : left-symmetric
     Chunk Size : 64K


           UUID : 3b78b4b1:737b20b7:bde21cba:08717914
         Events : 0.93039


    Number   Major   Minor   RaidDevice State
       0       8       36        0      active sync   /dev/sdc4
       1       0        0        1      removed
       2       0        0        2      removed
       3       0        0        3      removed

As you can see, all the other arrays lost just their /dev/sdcX members, while /dev/md3 lost all of its other drives and thinks /dev/sdc4 is its only member. Let me just add here that the new sdc came out of another Openfiler NAS that had a motherboard failure, and that NAS had exactly the same partition layout as ours, so the replacement disk needed no re-partitioning (see the note right after the next block if yours is blank). So we need to fix up our md3 RAID5… but first we need to add the new disk to the healthy(ish) arrays:

[root@NAS ~]# mdadm /dev/md0 --add /dev/sdc1
 mdadm: added /dev/sdc1

 [root@NAS ~]# mdadm /dev/md1 --add /dev/sdc3
 mdadm: added /dev/sdc3

 [root@NAS ~]# mdadm /dev/md2 --add /dev/sdc2
 mdadm: added /dev/sdc2
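
A quick side note: these --add commands only worked right away because the replacement disk already had the correct partitions on it (it came out of an identical NAS, as mentioned above). If you are swapping in a blank disk, replicate the partition layout from a surviving disk first. On an MBR-partitioned system of this vintage, sfdisk's dump/restore is the usual way to do it; double-check the device names before running anything like this:

# dump the partition table from a healthy disk (sda) and write it to the new disk (sdc)
sfdisk -d /dev/sda | sfdisk /dev/sdc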

Now you can see that md0-2 are looking good:

[root@NAS ~]# cat /proc/mdstat
 [...]
 md2 : active raid5 sdc2[1] sdd2[3] sdb2[2] sda2[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md1 : active raid5 sdc3[1] sdd3[3] sdb3[2] sda3[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md0 : active raid1 sdc1[1] sdd1[3] sdb1[2] sda1[0]
    104320 blocks [4/4] [UUUU]

Note the four U's: [UUUU]. Before, these arrays showed [U_UU], which means one member of the array was missing.
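
If you prefer mdadm's own view over the mdstat flags, the detail output for each of the small arrays should now report a clean state; a loop like this pulls out just that line:

# show the array state reported by mdadm for each of the small arrays
for md in md0 md1 md2; do
    echo "== /dev/$md =="
    mdadm --misc --detail /dev/$md | grep 'State :'
done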

Now to fix up our md3 RAID5.

Step 1) Stop md3.

mdadm --stop /dev/md3
 mdadm: stopped /dev/md3

Step 2) Re-assemble the RAID from the disks that I know have the proper data on them…
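
One sanity check worth doing before any forced assembly: compare the event counters in each member's superblock with mdadm --examine. Members whose event counts are close together should be safe to assemble together; one that is far behind is stale and better left out:

# compare the superblock event counters of the members we are about to assemble
mdadm --examine /dev/sda4 /dev/sdb4 /dev/sdd4 | grep -E '^/dev/|Events'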

[root@NAS ~]# mdadm --assemble --run --force /dev/md3 /dev/sda4 /dev/sdb4 /dev/sdd4
 mdadm: /dev/md3 has been started with 3 drives (out of 4).
 [root@NAS ~]# cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
 md3 : active raid5 sda4[0] sdd4[3] sdb4[1]
    2925869952 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]


 md2 : active raid5 sdc2[1] sdd2[3] sdb2[2] sda2[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md1 : active raid5 sdc3[1] sdd3[3] sdb3[2] sda3[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md0 : active raid1 sdc1[1] sdd1[3] sdb1[2] sda1[0]
    104320 blocks [4/4] [UUUU]


 unused devices: <none>

Look at that! It's back online. Step 3) Add the foreign disk back into our RAID. Note in the output below that the array starts rebuilding as soon as the disk is added.
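
One caveat here: the foreign partition still carries a stale md superblock from whatever array it last belonged to. The --add below went through cleanly anyway, but if mdadm ever complains about an existing superblock, wiping it first is the usual fix. Be absolutely sure you point this at the partition you intend to sacrifice:

# wipe the stale md metadata from the foreign partition before re-adding it
mdadm --zero-superblock /dev/sdc4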

[root@NAS ~]# mdadm /dev/md3 --add /dev/sdc4
 mdadm: added /dev/sdc4
 [root@NAS ~]# cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
 md3 : active raid5 sdc4[4] sda4[0] sdd4[3] sdb4[1]
    2925869952 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
    [>....................]  recovery =  0.0% (86652/975289984) finish=187.4min speed=86652K/sec


 md2 : active raid5 sdc2[1] sdd2[3] sdb2[2] sda2[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md1 : active raid5 sdc3[1] sdd3[3] sdb3[2] sda3[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md0 : active raid1 sdc1[1] sdd1[3] sdb1[2] sda1[0]
    104320 blocks [4/4] [UUUU]


 unused devices: <none>
 [root@NAS ~]# cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
 md3 : active raid5 sdc4[4] sda4[0] sdd4[3] sdb4[1]
    2925869952 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
    [>....................]  recovery =  1.9% (18889604/975289984) finish=326.1min speed=48868K/sec


 md2 : active raid5 sdc2[1] sdd2[3] sdb2[2] sda2[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md1 : active raid5 sdc3[1] sdd3[3] sdb3[2] sda3[0]
    2047488 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]


 md0 : active raid1 sdc1[1] sdd1[3] sdb1[2] sda1[0]
    104320 blocks [4/4] [UUUU]


 unused devices: <none>
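
While it chews through nearly a terabyte per disk, you can keep an eye on the rebuild without retyping the command; watch is available just about everywhere:

# refresh the rebuild status every 30 seconds
watch -n 30 cat /proc/mdstat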

That's all. Now we are just waiting for the rebuild to complete to see if all the data is there. I will update this when the rebuild is done. UPDATE: We still lost data, but we were able to recover some of it. You may have better luck than I did.

See also

This man’s wonderful write-up on a failed RAID1. Without this article I would’ve been completely in the dark. http://aplawrence.com/Linux/rebuildraid.html
