17 March 2009

FreeBSD + gmirror = defence against hard drive failures

this is probably one of the most awesome tests I've ever done

The PowerEdge 1800, beautiful, isn't it?

A few days ago I bought a used Dell PowerEdge 1800 server (yes, another one after the 6650). This new server will replace the one this site currently runs on. It has one 2.8 GHz 64-bit Intel Dual Xeon processor, 2 GB of DDR2 RAM and three 73 GB UltraSCSI 360 hard drives (10,000 rpm) attached to an Adaptec 39160 SCSI card.

Last weekend I installed FreeBSD (what else?) 7.1 (amd64) on it, using the first SCSI disk (da0). A minimal install was enough; then I followed the usual process and updated it to 7-STABLE (creating my own customized kernel configuration file).

The PowerEdge 1800, without the front bezel

Once the box was completely up to date (I even got a synced version of the ports tree using cvsup), I set up a gmirror following this chapter from the FreeBSD Handbook. Gmirror is part of the new GEOM framework, which lets admins manage things like software RAID easily (I recommend taking some time to read that chapter of the handbook; it is really worth it).
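
For readers who have not opened the handbook yet, the procedure can be sketched roughly as follows. This is only an outline under assumptions: the mirror name gm0 and the disks da0 and da1 match the ones used later in this post, but the round-robin balance algorithm and the debugflags value are recalled from the handbook of that era, and the helper names (run, mirror_create, mirror_attach) are made up for illustration. Check the handbook for your release before running anything. By default the script only prints the commands.

```sh
#!/bin/sh
# Sketch of a gmirror RAID-1 setup on an already installed FreeBSD system.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper. Prints the command in dry-run mode, runs it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# mirror_create NAME BOOT_DISK: steps to perform before the reboot.
mirror_create() {
    # Allow metadata to be written to the disk the system is booted from
    # (value assumed from the handbook of that era).
    run sysctl kern.geom.debugflags=17
    # Create the mirror on top of the boot disk.
    run gmirror label -vb round-robin "$1" "/dev/$2"
    # Load the geom_mirror kernel module right away.
    run gmirror load
    # Not automated here: add geom_mirror_load="YES" to /boot/loader.conf,
    # point /etc/fstab at the /dev/mirror/NAME devices, then reboot.
}

# mirror_attach NAME SECOND_DISK: steps to perform after the reboot.
mirror_attach() {
    # Add the second disk; gmirror starts synchronizing it.
    run gmirror insert "$1" "/dev/$2"
    # Show the state of the mirror and its components.
    run gmirror status
}

mirror_create gm0 da0
mirror_attach gm0 da1
```

Splitting the work into two functions mirrors the fact that a reboot sits between creating the mirror and attaching the second disk.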

One of the shiny features of gmirror is that it allows us to create a RAID-1 mirror of our main hard drive (that is, the drive FreeBSD boots from). It is supposed to handle hard drive failures, so if one of the disks in the mirror fails, the system should keep running as if nothing had happened.

So, there I was, with FreeBSD and a complete gmirror (I had to wait until the mirroring process was complete) on 2 hot-plug SCSI drives... could there be a better environment to test gmirror and see how it copes with a hard drive failure? :D

unplugging the disk from a live system

The test was pretty easy to perform. I just picked one of the disks (da0, the one the gmirror was built from) and pulled it out of the hot-plug SCSI card...

...

And the system kept running smoothly! The only notice that the disk had been unplugged appeared in the /var/log/messages log file:

Mar 15 12:51:20 nidhogg kernel: ahc0: Someone reset channel A
Mar 15 12:51:31 nidhogg kernel: (da1:ahc0:0:1:0): WRITE(10). CDB: 2a 0 8 8b b9 39 0 0 1 0
Mar 15 12:51:31 nidhogg kernel: (da1:ahc0:0:1:0): CAM Status: SCSI Status Error
Mar 15 12:51:31 nidhogg kernel: (da1:ahc0:0:1:0): SCSI Status: Check Condition
Mar 15 12:51:31 nidhogg kernel: (da1:ahc0:0:1:0): UNIT ATTENTION asc:29,2
Mar 15 12:51:31 nidhogg kernel: (da1:ahc0:0:1:0): SCSI bus reset occurred
Mar 15 12:51:31 nidhogg kernel: (da1:ahc0:0:1:0): Retrying Command (per Sense Data)
Mar 15 12:51:31 nidhogg kernel: (da0:ahc0:0:0:0): lost device
Mar 15 12:51:31 nidhogg kernel: (da0:ahc0:0:0:0): Invalidating pack
Mar 15 12:51:31 nidhogg kernel: GEOM_MIRROR: Cannot write metadata on da0 (device=gm0, error=6).
Mar 15 12:51:31 nidhogg kernel: GEOM_MIRROR: Cannot update metadata on disk da0 (error=6).
Mar 15 12:51:31 nidhogg kernel: GEOM_MIRROR: Device gm0: provider da0 disconnected.
Mar 15 12:51:31 nidhogg kernel: (da0:ahc0:0:0:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
Mar 15 12:51:31 nidhogg kernel: (da0:ahc0:0:0:0): removing device entry

The system is warning us about some problems on the ahc0 controller (the Adaptec SCSI card), as it seems that one of the disks attached to it isn't there anymore. Then GEOM warns us too, about a failure when trying to write metadata to one of the disks in the mirror (but it kept running fine, using only the other disk).
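
On a busy box these lines can get buried among other kernel messages, so a small filter may help. The following is a minimal sketch: the function name mirror_events is made up for this example, and the two patterns are simply strings visible in the log excerpt above, so other failure modes may well log different messages.

```sh
#!/bin/sh
# mirror_events [LOGFILE]: print the kernel messages that signal a disk
# dropping out of a gmirror. Hypothetical helper; the patterns come from
# the log excerpt in this post. Defaults to /var/log/messages.
mirror_events() {
    grep -E 'GEOM_MIRROR|lost device' "${1:-/var/log/messages}"
}

# Only look at the real log when it exists and is readable.
if [ -r /var/log/messages ]; then
    mirror_events /var/log/messages || echo "no mirror events logged"
fi
```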

The first part of the test was completely successful, but I still had to check whether the server was able to reboot without the first hard drive attached to it, and indeed it did! I rebooted the box without the first SCSI disk and everything booted up fine; the system was up and running in a matter of seconds.

plugging back the disk into a live system

To end my tests, I plugged the drive back into the SCSI hot-plug card, and some more information showed up in /var/log/messages:

Mar 15 12:56:25 nidhogg kernel: ahc0: Someone reset channel A
Mar 15 12:56:30 nidhogg kernel: (da0:ahc0:0:1:0): WRITE(10). CDB: 2a 0 8 8b b9 39 0 0 1 0
Mar 15 12:56:30 nidhogg kernel: (da0:ahc0:0:1:0): CAM Status: SCSI Status Error
Mar 15 12:56:30 nidhogg kernel: (da0:ahc0:0:1:0): SCSI Status: Check Condition
Mar 15 12:56:30 nidhogg kernel: (da0:ahc0:0:1:0): UNIT ATTENTION asc:29,2
Mar 15 12:56:30 nidhogg kernel: (da0:ahc0:0:1:0): SCSI bus reset occurred
Mar 15 12:56:30 nidhogg kernel: (da0:ahc0:0:1:0): Retrying Command (per Sense Data)

And the drive was back!

One thing to watch out for: the gmirror status command showed that the disk wasn't recognized by gmirror, so I had to tell the mirror to forget its disconnected components and then re-add the disk to it:

gmirror forget gm0
gmirror insert gm0 /dev/da0

This led the mirror to re-sync the da0 device with the current mirror (which is understandable, as the system kept running and writing data to da1 while da0 wasn't connected at all).
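
The re-sync takes a while, and it can be followed from a script instead of re-running gmirror status by hand. The sketch below assumes the usual three-column layout of the gmirror status output (Name, Status, Components) and a mirror called gm0 as in this post. The function names mirror_state and wait_for_mirror and the 30-second default polling interval are illustrative choices, not something gmirror itself provides.

```sh
#!/bin/sh
# mirror_state NAME: read `gmirror status` output on stdin and print the
# Status column (for example COMPLETE or DEGRADED) of mirror/NAME.
mirror_state() {
    awk -v dev="mirror/$1" '$1 == dev { print $2 }'
}

# wait_for_mirror NAME [SECONDS]: poll until the mirror reports COMPLETE.
# Returns 1 if the mirror does not show up in the status output at all.
wait_for_mirror() {
    while :; do
        state="$(gmirror status | mirror_state "$1")"
        case "$state" in
            COMPLETE) echo "mirror/$1 is COMPLETE"; return 0 ;;
            "")       echo "mirror/$1 not found" >&2; return 1 ;;
        esac
        sleep "${2:-30}"
    done
}
```

Calling wait_for_mirror gm0 right after the gmirror insert would block until the re-sync has finished, which is handy before trying another pull-the-disk test.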

So, a successful test! Now I have a server that tolerates hard drive failures.

Just to end this post, two comments:

1- gmirror comes with some commands (like status or list) that are useful for getting information about the mirror itself, but another tool for gathering information (this time about disk usage/activity) is systat. The command:

systat -iostat 2

will show you information about your disks' usage in a top-like interactive way:

          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
da0   MB/sXXXXXXXXXXXXXXXXXXX
      tps|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX613.69
da1   MB/sXXXXXXXXXXXXXXXXXXX
      tps|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX609.70

(check the man page of systat to learn more about it)

2- I did not run stress tests or performance benchmarks on the mirror (that will be the next test), but it would be nice to check whether there are any performance issues when using the mirror instead of a single disk (and, if there are, to measure whether the penalty is acceptable given that the system will keep running if one of the drives crashes).

Posted by wu at 08:19
Comments
Re: FreeBSD + gmirror = defence against hard drive failures

Must I set up the RAID-1 hardware configuration first?

Posted by: Hans at June 10, 2010 03:28
Re: FreeBSD + gmirror = defence against hard drive failures

Hi Hans, I don't know exactly what you meant by that question, but I recommend taking a look at this chapter in the FreeBSD Handbook:

http://www.freebsd.org/doc/en/books/handbook/geom-mirror.html

It has a lot of information on how to configure gmirror correctly.

Posted by: Wu at June 10, 2010 16:29