View Full Version : ARGH


Dr. Z
19-01-2007, 01:06
I got a call at about 6pm tonight - the shock radio webserver is no longer serving ANY pages (Apache running on Gentoo). Despite the fact that I am one of only two people the chief eng can contact who know about Linux, I so far have no root access (or any shell access at all) to the box - so I get on MSN to someone who does and start the process of finding out what's wrong.

It accepted an SSH session but the box was apparently shutting down ... hmm. Wait a few minutes and SSH back in - everything is royally buggered, services up, down, left, right, who knows what. dmesg indicated a problem with the fileserver. I get my coat on....

Get to the studio, walk into the server room to find that the webserver is halfway down (refusing logins but still multi-user) with tty1 just showing its progress through shutting down...

The fileserver isn't much healthier - RAID5 in degraded mode. Now, if we were running something GOOD this wouldn't be too much of an issue, but we have a crap RocketRAID SATA controller with FreeNAS on top. It's obviously spat at least one disk, but I hear from the station manager that "it stopped working yesterday, then came back so I thought it was fine" - now FreeNAS is totally unresponsive, the web interface is down, the box won't reboot and god only knows what. The thing is, FreeNAS appears to be buggered but it can see the individual drives (ad2, ad4 and ad6) and THEN an array ar0, rather than just seeing one array. Is it hardware raid? The raid card thinks so. Is it software raid? FreeNAS thinks so.

The webserver died because, for some reason, something critical is in a folder on the fileserver. What's the point in that? WHY? It introduces so many ways to bring the web server down it's unfunny. The webserver isn't the only thing that runs on that box either, so it's not like it crashing only takes out our web presence!

The data on this array has NEVER been backed up and is not only irreplaceable but required to be kept by law. The penalty for losing this stuff is a fairly hefty fine and/or imprisonment. Quite why they felt like using GOD DAMN MAXTOR DRIVES for this I don't quite know. F****** ridiculous.

Going to rebuild the array tomorrow hopefully (!) and promptly make a backup. Thankfully it's "only" 400GB :/

Just needed a bit of a rant...

Davey_Pitch
19-01-2007, 02:11
Given that it's *Nix based I only understood half of that post, but I get the general gist that things went FUBAR big time. Hope you can recover everything as quickly as possible, though I'm getting the impression it won't be a quick or easy job.

Daz
19-01-2007, 10:44
Is it hardware raid? The raid card thinks so. Is it software raid? FreeNAS thinks so.
It's probably a so-called "Fakeraid" card, which implements RAID functions at a driver or software level. Most onboard SATA controllers that tout RAID functionality are actually like this - basically no better than OS level RAID, since your CPU does the work.

Depending on how that box has been set up, you might need to get familiar with dmraid (http://www.linuxmanpages.com/man8/dmraid.8.php). If the array is handled by dmraid (and device mapper), then you should see something other than 'control' in /dev/mapper, if it even exists.
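
The /dev/mapper check described above could be scripted roughly like this. It is only a minimal sketch, assuming a Linux box with Python available; the helper name `mapped_devices` is made up for illustration and is not part of dmraid or any other tool.

```python
import os

def mapped_devices(mapper_dir="/dev/mapper"):
    """Return device-mapper node names other than 'control'.

    Returns None if the directory does not exist at all.
    (Hypothetical helper, named here purely for illustration.)
    """
    if not os.path.isdir(mapper_dir):
        return None
    return sorted(name for name in os.listdir(mapper_dir) if name != "control")

if __name__ == "__main__":
    found = mapped_devices()
    if found is None:
        print("/dev/mapper does not exist, so device mapper is probably not in use")
    elif not found:
        print("only 'control' present, so nothing is currently mapped")
    else:
        print("mapped devices:", ", ".join(found))
```

An empty result only says that nothing is mapped right now; it does not by itself rule out a fakeraid set that has simply not been activated.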

Either way, good luck. Touch wood (/touches wood), I've never really had to do much recovery-wise on a Linux box. Most *nix admins I've worked with (if it isn't myself anyway) keep good backup procedures, so it's always been restoring rather than recovering.

Justsomebloke
19-01-2007, 10:47
Although I read your O.P. yesterday and today I still have no idea what the **** you are on about.
That doesn't stop me feeling your pain though so good luck with sorting it all out today.

Dr. Z
19-01-2007, 17:05
It's probably a so-called "Fakeraid" card, which implements RAID functions at a driver or software level. Most onboard SATA controllers that tout RAID functionality are actually like this - basically no better than OS level RAID, since your CPU does the work.

Depending on how that box has been set up, you might need to get familiar with dmraid (http://www.linuxmanpages.com/man8/dmraid.8.php). If the array is handled by dmraid (and device mapper), then you should see something other than 'control' in /dev/mapper, if it even exists.

The RAID card has a rebuild function, so I would take it from that that it's hardware raid? Thing is, if it isn't and it is indeed software raid, the fact the OS just won't boot might be a problem when it comes to getting it back? :p

Tomorrow I will get the money for a new drive so I will try it and see.

I'm touching your wood, Daz, as I only have fake stuff to hand here :p

lostkat
19-01-2007, 17:37
I'm touching your wood, Daz, as I only have fake stuff to hand here :p
:shocked: Does Dee know about this? ;)

Daz
19-01-2007, 17:39
The RAID card has a rebuild function, so I would take it from that that it's hardware raid?
Not necessarily. Copying bit for bit takes no logic, though if it is indeed a RAID 5 array then it's possible we're dealing with the real deal. Google suggests it could be (http://www.google.co.uk/search?q=rocketraid+fakeraid&ie=utf-8&oe=utf-8&rls=org.mozilla:en-GB:official&client=firefox-a) fakeraid. Don't suppose you know the chipset number off the top of your head?
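
On the RAID 5 point: a mirror rebuild is a straight copy, but a RAID 5 rebuild has to recompute the missing blocks from the surviving data and parity. A minimal sketch of the idea, assuming plain XOR parity over a single stripe (the block contents are made up, and a real RAID 5 also rotates parity across the disks, which this ignores):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a three-disk array: two data blocks plus one parity block.
data_a = b"shock"
data_b = b"radio"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b dies; recompute its block from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b)  # b'radio'
```

This is also why a degraded RAID 5 has no safety margin left: lose a second block from the same stripe and there is nothing left to XOR against.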

It could of course just be traditional Linux software raid, in which case you'll see /dev/md0, /dev/md1 etc. (multi-disk devices). This would be easier in terms of not worrying about dmraid (it can be a bitch, believe me), but I'm not sure how it'll help in recovering a RAID-5. It should be well documented though if it is this particular flavour.
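
If it does turn out to be Linux md, the kernel reports array state in /proc/mdstat, where a status like [3/2] [_UU] means three members expected and two active. A rough sketch of picking out degraded arrays from that text; the sample contents below are invented for illustration and `degraded_arrays` is a hypothetical helper, not part of mdadm:

```python
import re

# Illustrative /proc/mdstat contents, made up for this example.
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 sdc1[2] sdb1[1]
      781422592 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of an array entry; its member counts are on a later line.
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result

print(degraded_arrays(SAMPLE_MDSTAT))  # {'md0': (3, 2)}
```

On a live box the same function could be fed the contents of /proc/mdstat read from a live CD, assuming that environment has md support loaded.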

Assuming fakeraid, if it's entirely handled by the driver (unlikely, most 2.4 kernels and certainly 2.6 see straight through them to the raw devices) then who knows what you'll have to do.

Thing is, if it isn't and it is indeed software raid, the fact the OS just won't boot might be a problem when it comes to getting it back? :p
That might actually help you - boot into some sort of live CD environment with support for whatever software RAID it might be and see what you're left with. If a hardware controller died you'd need another one identical to it - so in that way software has 'an' advantage.

I'm touching your wood, Daz
Blimey, bit friendly aint ya ;D

Daz
19-01-2007, 17:48
I'm reading there are some proprietary binary drivers kicking about for RocketRAID cards, and you may well be using them. Could do with knowing the chipset model really to find out what we're dealing with :)

HighPoint RocketRAID 1540/1542/1544/1640 & 454 (HPT374 chipset), RocketRAID 1520 (HPT372, HPT372N, or HPT372A chipset), and Rocket 1520 (HPT302N chipset - non-RAID) PCI cards - fakeraid. Supported by drivers/ide's hpt36x driver, by at latest 2.4.21-pre5. No libata driver exists for these, but Alan Cox is working on one (as of 2006-01). Note: Some recent HighPoint cards use Marvell 88SX50xx chips (for which see separate driver info). Problematic proprietary Linux i386 binary drivers for HighPoint fakeraid (release 2.0 of which is reported to malfunction or even fail to compile on later 2.6 kernels ranging, at least, from 2.6.8 through 2.6.14) are available, but, as usual, you're better off using Linux's own open-source "md" software-RAID driver. (Warning: You'll need to load the proprietary driver only into kernels lacking the conflicting drivers/ide hpt36x driver, in the presence of which your system will seize up, at boot time.)
Should find it on this site anyway:

http://linuxmafia.com/faq/Hardware/sata.html

Dr. Z
20-01-2007, 20:05
It's a 1640.

I now have a drive to replace the failed one so I am good to go for the new suggestions!