Help, my disk array has one dead member

Kevin O'Gorman
I need a quick bit of advice.

I have a stripe array under mdadm, and one of the disks is dead or dying.  Nevertheless, there's a bunch of stuff that is NOT gone, and I'd like to capture all I can before I even power this thing off.
I'm missing some inodes, as well as some file and directory blocks.

It's all hobby stuff, and it's all rebuildable, but I'd like to capture all the clues I can.

I'm looking for a way to copy the whole directory structure, just omitting any file or directory that reports an error in the process.  I'll do it with find(1) if I have to, but I'm wondering if there's a utility that's easily adapted to this situation.  Partial directories are okay.  Partial files are not.
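
A minimal shell sketch of the find(1) approach mentioned above, for illustration only. It copies each regular file and, if a copy fails, deletes the partial result and reports the path on stderr. The function name `copy_readable` and the paths in the usage note are made up; directories that cannot be read are reported by find itself and skipped.

```shell
# copy_readable SRC DST
# Copy every regular file under SRC into DST, keeping the directory layout.
# A file whose copy fails is deleted from DST, so no partial files survive.
copy_readable() (
    mkdir -p "$2" && dst=$(cd "$2" && pwd) || exit 1   # absolute path to DST
    cd "$1" || exit 1
    find . -type f -exec sh -c '
        dst=$1; shift
        for rel do
            if ! { mkdir -p "$dst/$(dirname "$rel")" &&
                   cp -p "$rel" "$dst/$rel"; } 2>/dev/null; then
                rm -f "$dst/$rel"                  # drop the partial copy
                printf "SKIPPED %s\n" "$rel" >&2   # and say which file it was
            fi
        done' sh "$dst" {} +
)
```

Something like `copy_readable /mnt/raid /mnt/rescue 2>skipped.txt` (hypothetical paths, with the destination outside the damaged filesystem) would leave a list of everything that could not be copied in skipped.txt.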

--
Kevin O'Gorman
#define QUESTION ((bb) || (!bb))   /* Shakespeare */

Please consider the environment before printing this email.


--
ubuntu-users mailing list
[hidden email]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users

Re: Help, my disk array has one dead member

Karl Auer
On Tue, 2017-03-21 at 19:06 -0700, Kevin O'Gorman wrote:
> I have a stripe array under mdadm, and one of the disks is dead or
> dying.
> [...]
> I'm looking for a way to copy the whole directory structure, just
> omitting any file or directory that reports an error in the process.

Get an external drive (USB3.0 or eSATA if you can support that) that is
as large as or larger than the RAID's capacity. Format it with a
filesystem that handles very large files, such as ext4, because you
will be creating one huge file (FAT32, for instance, cannot hold a file
of 4 GiB or more).

If you have plenty of network attached storage that allows huge files
you can put the destination file there instead - adjust the following
instructions accordingly.

Then use dd to take an image of the failing array; use the option to
skip errors rather than fail on them.

   dd if=/dev/XXX \
      of=/ext/drive/bigfile.dat \
      conv=noerror,sync,fsync \
      bs=10M

...where XXX is the SOURCE (your RAID device) and
/ext/drive/bigfile.dat is the DESTINATION - a file on the external
drive (probably /media/whatever). It may take quite a long time.

Do NOT confuse "if" and "of" in the dd command, or you will be most
royally screwed.

If you possibly can, do all this while booted off a live CD, so you can
copy the source without it being active.

Use (e.g.) /dev/sda as the source to get the whole drive, or /dev/sda1
to get just one partition (those are NOT the actual device names you
will need - use the correct device names for your RAID or partitions on
it).

The end result will be a huge file on the external drive. The huge file
will be a block-for-block copy of the failing RAID (or partition if you
went that way). You can mount this on a loopback and look around it at
your leisure. It will be as good as or better than the original, with
no risk of further loss.
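
As a rough sketch of the image-then-loop-mount idea described above (not a tested recipe for this particular array): the first function wraps dd with conv=noerror,sync, so a block that cannot be read is zero-filled rather than dropped and everything after it keeps its offset in the image. The second attaches the finished image read-only. The function names, the 64K block size and the paths in the usage note are assumptions.

```shell
# image_device SRC IMG
# Block-for-block copy of SRC into the file IMG that keeps going past errors.
image_device() {
    # noerror: continue after a failed read.
    # sync:    pad a failed or short block with zeros up to the block size,
    #          so later data stays at the right offset (the final block of
    #          the image may therefore be padded too).
    dd if="$1" of="$2" bs=64K conv=noerror,sync status=none
}

# mount_image_ro IMG MOUNTPOINT
# Attach the image through a loop device, read-only, to browse it safely.
mount_image_ro() {
    sudo mkdir -p "$2" && sudo mount -o loop,ro "$1" "$2"
}
```

Usage would be along the lines of `image_device /dev/md0 /media/ext/raid.img` followed by `mount_image_ro /media/ext/raid.img /mnt/rescue`, where /dev/md0 and both paths are placeholders. A smaller block size loses less data around each bad spot but makes the copy slower.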

Regards, K.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer ([hidden email])
http://www.biplane.com.au/kauer
http://twitter.com/kauer389

GPG fingerprint: A52E F6B9 708B 51C4 85E6 1634 0571 ADF9 3C1C 6A3A
Old fingerprint: E00D 64ED 9C6A 8605 21E0 0ED0 EE64 2BEE CBCB C38B




Re: Help, my disk array has one dead member

Liam Proven
On 22 March 2017 at 04:19, Karl Auer <[hidden email]> wrote:
>    dd if=/dev/XXX \


No no no!

Do not use ``dd''. DD is not the tool for data recovery, because it
stops on the first error, and if you try again it will stop at the
same point. What the OP needs is a tool that continues on errors, and
logs where the error occurs so that it doesn't have to retry next
time. This tool is ``ddrescue''.

GNU ddrescue (packaged on Debian and Ubuntu as gddrescue) replaces the older dd_rescue command.

https://en.wikipedia.org/wiki/Ddrescue

There's a howto here:

https://help.ubuntu.com/community/DataRecovery
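
For a flavour of what that looks like in practice, here is a minimal two-pass sketch. It assumes GNU ddrescue is installed; the function name and the device and file names in the usage note are hypothetical. The third argument is the map file, which is what lets an interrupted run resume instead of starting over.

```shell
# rescue_image SRC IMG MAP
# Two-pass rescue of SRC into the file IMG, tracking progress in MAP.
rescue_image() {
    # Pass 1: copy everything that reads easily and skip the slow
    # scraping of bad areas (-n, --no-scrape).
    ddrescue -n "$1" "$2" "$3" &&
    # Pass 2: return to the areas marked bad in MAP and retry them
    # up to three times (-r3, --retry-passes=3).
    ddrescue -r3 "$1" "$2" "$3"
}
```

For example: `rescue_image /dev/md0 /media/ext/raid.img /media/ext/raid.map`. If the run is interrupted, calling it again with the same three arguments picks up from the map file.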



--
Liam Proven • Profile: https://about.me/liamproven
Email: [hidden email] • Google Mail/Talk/Plus: [hidden email]
Twitter/Facebook/Flickr: lproven • Skype/LinkedIn/AIM/Yahoo: liamproven
UK: +44 7939-087884 • ČR/WhatsApp/Telegram/Signal: +420 702 829 053


Re: Help, my disk array has one dead member

Ralf Mardorf-2
On Wed, 22 Mar 2017 14:50:42 +0100, Liam Proven wrote:
>Do not use ``dd''. DD is not the tool for data recovery, because it
>stops on the first error

I'm not a dd expert, but...

...Karl wrote:
>use the option to skip errors rather than fail on them.

So dd might not necessarily stop on errors.

$ man dd | grep "continue after read errors" -B1
       noerror
              continue after read errors

Regards,
Ralf



Re: Help, my disk array has one dead member

Xen
In reply to this post by Liam Proven
Liam Proven wrote on 22-03-2017 14:50:

> On 22 March 2017 at 04:19, Karl Auer <[hidden email]> wrote:
>>    dd if=/dev/XXX \
>
>
> No no no!
>
> Do not use ``dd''. DD is not the tool for data recovery, because it
> stops on the first error, and if you try again it will stop at the
> same point. What the OP needs is a tool that continues on errors, and
> logs where the error occurs so that it doesn't have to retry next
> time. This tool is ``ddrescue''.

That's why he had the noerror parameter.

Karl Auer's advice was spot on if you ask me.

Not saying ddrescue wouldn't work, but the command as constructed would
skip erroring sectors.


Re: Help, my disk array has one dead member

Liam Proven
In reply to this post by Ralf Mardorf-2
On 22 March 2017 at 15:07, Ralf Mardorf <[hidden email]> wrote:

> I'm not a dd expert, but...
>
> ...Karl wrote:
>>use the option to skip errors rather than fail on them.
>
> So dd might not necessarily stop on errors.
>
> $ man dd | grep "continue after read errors" -B1
>        noerror
>               continue after read errors


Sure, yes, I did note that, but what if you need to stop it and
restart? No logging or anything.

dd_rescue was designed to replace this _eighteen years ago_ in 1999,
and gddrescue came out to replace dd_rescue with a more flexible
alternative in 2004.

DD hasn't been the tool for this since the end of last century.


Re: Help, my disk array has one dead member

Karl Auer
On Wed, 2017-03-22 at 16:25 +0100, Liam Proven wrote:
> DD hasn't been the tool for this since the end of last century.

Ah well, I'm nothing if not old-fashioned.

dd will work, and with the noerror parameter will not stop on errored reads.

That said, Liam is quite right: gddrescue is the better choice. If you
read the doco you will see why.

   sudo apt-get install gddrescue ddrescueview

The main point for people to take from this discussion is that if you
have a failing filesystem, you should take a copy of it as soon as you
can, then work on the copy. That way you reduce the likelihood of
further data loss.

Regards, K.


Re: Help, my disk array has one dead member

Bruce Ferrell
In reply to this post by Kevin O'Gorman
On 03/21/2017 07:06 PM, Kevin O'Gorman wrote:

> I need a quick bit of advice.
>
> I have a stripe array under mdadm, and one of the disks is dead or dying.  Nevertheless, there's a bunch of stuff that is NOT gone, and I'd like to capture all I can before I
> even power this thing off.
> I'm missing some inodes, as well as some file and directory blocks.
>
> It's all hobby stuff, and it's all rebuildable, but I'd like to capture all the clues I can.
>
> I'm looking for a way to copy the whole directory structure, just omitting any file or directory that reports an error in the process.  I'll do it with find(1) if I have to, but
> I'm wondering if there's a utility that's easily adapted to this situation.  Partial directories are okay.  Partial files are not.
>
> --
> Kevin O'Gorman
> #define QUESTION ((bb) || (!bb))   /* Shakespeare */
>
> Please consider the environment before printing this email.
>
>
>
Kevin,

Yes, it's a good idea to make a copy of the file system before doing irrevocable things,  but depending on how the RAID was set up it's either needless (this is why we use a RAID)
or irrelevant.

I can say that with good authority because I work daily on appliance systems that use md RAIDs, which often fail drives.

Simple raid0 is destroyed with a member drive or partition out so no copy is needed/possible.  raid1, your data is intact on one (or more) of the mirrors and a copy is nice, but
not necessary (tar will work on the file system mounted to the md).

There are some decent how-tos that I use when needed:

http://www.tjansson.dk/2013/12/replacing-a-failed-disk-in-a-mdadm-raid/

http://www.ducea.com/2009/03/08/mdadm-cheat-sheet/

I use the cheat sheet a lot.




Re: Help, my disk array has one dead member

Xen
Bruce Ferrell wrote on 22-03-2017 22:36:

> Simple raid0 is destroyed with a member drive or partition out so no
> copy is needed/possible.

He said that one of the drives was damaged, but still readable. The
other was fine.


Re: Help, my disk array has one dead member

Bruce Ferrell
On 3/22/17 3:37 PM, Xen wrote:
> Bruce Ferrell wrote on 22-03-2017 22:36:
>
>> Simple raid0 is destroyed with a member drive or partition out so no
>> copy is needed/possible.
>
> He said that one of the drives was damaged, but still readable. The
> other was fine.
>
Yes and he also said it's an md raid.  As I said, the devil is in the
details;  If it's raid1 (or higher) and one of the elements is damaged,
the others cover for that with speed degradation.

raid0, do NOT use any dd variant on the physical disk.  md has internal
data structures on the physical disks and doesn't transplant well to new
drives.

You *might* be able to get away with dd on the md device, but that is
active and managed by the md subsystem.  I was taught NEVER use dd on
active devices... Unless you want trash data.

Use tar and you may have to be selective about what you copy but then
again maybe sector relocation can help you get past the damage for tar.

Where this gets REALLY dicey: damaged data may have been mirrored back
to the good drive from the bad one.  NASTY!

Bottom line, one size never fits all...  poke, prod (gently) and use
troubleshooting steps to make a determination of what's needed to
recover and NEVER blindly follow "just do this..." instructions.



Re: Help, my disk array has one dead member

Karl Auer
On Wed, 2017-03-22 at 16:12 -0700, Bruce Ferrell wrote:
> Bottom line, one size never fits all...  poke, prod (gently) and use 
> trouble shooting steps to make a determination of what's needed to 
> recover and NEVER blindly follow "just do this..." instructions

Sometimes people have no choice.

Not sure what you are on about.

The OP gave quite a good sitrep, made it clear that he was not in
danger of losing critical data, and was given good, clear and above all
harmless advice: Take a copy of the RAID, recover data from the copy.

Had the OP said that this was critical data I would have offered quite
different advice: Turn the thing off and take it to a professional data
recovery service.

Regards, K.


Re: Help, my disk array has one dead member

Xen
In reply to this post by Bruce Ferrell
Bruce Ferrell wrote on 23-03-2017 0:12:

> On 3/22/17 3:37 PM, Xen wrote:
>> Bruce Ferrell wrote on 22-03-2017 22:36:
>>
>>> Simple raid0 is destroyed with a member drive or partition out so no
>>> copy is needed/possible.
>>
>> He said that one of the drives was damaged, but still readable. The
>> other was fine.
>>
> Yes and he also said it's an md raid.  As I said, the devil is in the
> details;  If it's raid1 (or higher) and one of the elements is
> damaged, the others cover for that with speed degradation.

He said it was raid0.

> raid0, do NOT use any dd variant on the physical disk.  md has
> internal data structures on the physical disks and doesn't transplant
> well to new drives.

He didn't try to do that because there's no point in that if it has a
stripe, right.

> You *might* be able to get away with dd on the md device, but that is
> active and managed by the md subsystem.  I was taught NEVER use dd on
> active devices... Unless you want trash data.

I don't see a reason against it. The block volume is managed from the
outside by the filesystem. The filesystem manages the logical block
space. It has no interference with the md subsystem.

In place of the filesystem, you can also use dd on the block device. It
makes no difference.

You also can't trash data on any device just by reading, usually.

> Use tar and you may have to be selective about what you copy but then
> again maybe sector relocation can help you get past the damage for
> tar.

Maybe, but filesystem recovery is a step after you secure the data, I
think.

> Where this get's REALLY dicey; damaged data may have been mirrored
> back to the good drive from the bad one.  NASTY!

It was a raid0. He said it was a stripe.

> Bottom line, one size never fits all...  poke, prod (gently) and use
> trouble shooting steps to make a determination of what's needed to
> recover and NEVER blindly follow "just do this..." instructions

It wasn't a blanket just do it instruction. It was geared to his
particular use case.

Anyway, sorry for responding again still.


Re: Help, my disk array has one dead member

Kevin O'Gorman
On Wed, Mar 22, 2017 at 7:17 PM, Xen <[hidden email]> wrote:
> [...]


I pretty much agree with Xen.  The root directory on the RAID and some of the subdirectories are still there, and many of the files in them are small, so a lot of them are still readable: they simply don't occupy the broken part at all.  (BTW: I don't know if the whole drive is gone, and I kind of doubt it because blkid reports all three.  Also BTW, mdadm on seeing the damage remounts the filesystem read-only, so there is no activity on it.)

Unfortunately, I have no place to direct a copy of the entire RAID, so ddrescue is not an option.  I did get the three drives that I'm going to put in the other system, but my machine cannot support all 6 and a boot drive at the same time.  All drive bays are full; all SATA ports are in use.


Re: Help, my disk array has one dead member

Bruce Ferrell
In reply to this post by Xen
On 03/22/2017 07:17 PM, Xen wrote:

> [...]
> Maybe, but filesystem recovery is a step after you secure the data, I think.
>
The data is NOT secure if you can't restore it (made a backup to read-only media; why is that bad?).  It doesn't matter if you successfully read every block but can't put it back
and have it understandable.

I understood it was a stripe.  both raid1 and raid0 are striped. As I said, the devil is in the details.

Which is why I watched the back and forth half the day before responding to any of it, and why I gave him steps, theory of operation and pointers to methods so he could
make his own decisions as to next steps.

This stuff is what I DO all day long and have for over 30 years.




Re: Help, my disk array has one dead member

Bruce Ferrell
In reply to this post by Kevin O'Gorman
On 03/22/2017 07:41 PM, Kevin O'Gorman wrote:

> [...]
>
> I pretty much agree with Xen.  It's just that because the root directory on the raid, and some of the subdirectories as well are still there, and many of the files in them are so
> small, it happens that  a lot of these small files are still readable, because they don't happen to occupy the broken part at all.  (BTW: I don't know if the whole drive is gone,
> and I kind of doubt it because blkid reports all three.  Also BTW, mdadm on seeing the damage remounts the filesystem read-only, so there is no activity on it.)
>
> Unfortunately, I have no place to direct a copy of the entire RAID, so ddrescue is not an option.  I did get the three drives that I'm going to put in the other system, but my
> machine cannot support all 6 and a boot drive at the same time.  All drive bays are full; all SATA ports are in use.
>
I've done the dd thing on simple disks; the wisdom is it's ok between same type disks (make/model/etc).  The wisdom of raid is NO, full stop.  Not without serious lab facilities
(recovery houses).  Your description doesn't seem to warrant that last option, and you stated as much.

If mdadm mounts it, the md daemon is talking to it...  The filesystems are inactive because it's read only, but the physical device is active and capable of change in the midst of
a possible dd.  tar, rsync etc are safe on the filesystem because it's ro but the physical volume is rw.

I've had to deal with exactly your situation Kevin.  I pulled off the stuff I needed in chunks in tar files to hop over the bad stuff.  No matter what you choose to do, it's not
painless or fast.
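
One way to read "in chunks" is one tar archive per top-level directory, so that a failure in one subtree does not abort the rest. The sketch below only illustrates that idea and is not the exact procedure used back then; the function name and paths are hypothetical, and the output directory is assumed to live outside the damaged filesystem.

```shell
# tar_chunks SRC OUTDIR
# Write one archive per top-level entry of SRC into OUTDIR, and keep going
# when an entry cannot be archived cleanly.
tar_chunks() (
    mkdir -p "$2" && out=$(cd "$2" && pwd) || exit 1   # absolute OUTDIR
    cd "$1" || exit 1
    for entry in * .[!.]*; do
        [ -e "$entry" ] || continue        # skip glob patterns that matched nothing
        # tar's own complaints (unreadable files and so on) go to errors.log
        tar -cf "$out/$entry.tar" "./$entry" 2>>"$out/errors.log" ||
            echo "FAILED $entry" >>"$out/errors.log"
    done
)
```

For example, `tar_chunks /mnt/raid /media/ext/chunks`. Anything tar complained about ends up in errors.log, which tells you which archives contain files you should not trust.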

I was just trying to save you some time down a rat hole that I've been down.

I'm out




Re: Help, my disk array has one dead member

Karl Auer
In reply to this post by Kevin O'Gorman
On Wed, 2017-03-22 at 19:41 -0700, Kevin O'Gorman wrote:
> Unfortunately, I have no place to direct a copy of the entire RAID,
> so ddrescue is not an option.  I did get the three drives that I'm
> going to put in the other system

Then you DO have somewhere to put the data :-)

You could set up your new system and send the dd over the network to
it. Easy peasy.
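
A minimal sketch of the over-the-network variant, assuming ssh access to the new machine. The function name, host name and remote path are invented for the example. The function reads the source past errors and pipes the bytes into whatever command follows, so the same helper also works with a local command.

```shell
# stream_image SRC CMD...
# Read SRC block by block, zero-filling unreadable blocks, and pipe the
# result into CMD (for instance an ssh invocation that writes a file).
stream_image() (
    src=$1; shift
    dd if="$src" bs=64K conv=noerror,sync status=none | "$@"
)

# Over the network (hypothetical host and path):
#   stream_image /dev/md0 ssh user@newbox 'cat > /srv/rescue/raid.img'
```

Unlike ddrescue, this keeps no record of which blocks failed, so an interrupted transfer has to start again from the beginning.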

Or you could get a USB enclosure, stick one of the new drives in it,
boot the dud system to LiveCD and do the dd to the USB-connected drive.
You can set your new system up with only two drives in the RAID to
start with and add the third when you have extracted the files you want
off it.

There are other variations on those themes.

If you are REALLY stuck, you could take a USB stick and use ddrescue in chunks - grab the first N blocks onto the USB stick, copy them to your new system, go back and get the next N blocks, copy to your new system, then get the next N blocks and so on until it's all been copied. Could take a while if the RAID is big and the USB stick is small. Definitely suggest writing down your start and end blocks, it wouldn't do to lose track :-)
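
To make the bookkeeping concrete, here is a small sketch that prints one ddrescue command per chunk instead of running anything, so the plan can be reviewed and kept as the written record of start positions and sizes. It assumes GNU ddrescue's -i (input position), -o (output position) and -s (size) options, all in bytes; the function name and the chunkN file names are made up.

```shell
# chunk_plan SRC TOTAL_BYTES CHUNK_BYTES
# Print the ddrescue command for each chunk; nothing is executed.
chunk_plan() (
    src=$1 total=$2 chunk=$3 off=0 n=0
    while [ "$off" -lt "$total" ]; do
        size=$chunk
        # The last chunk is usually shorter than the rest.
        [ $((off + size)) -gt "$total" ] && size=$((total - off))
        # -i: where to start reading in SRC
        # -o 0: write from the start of the chunk file
        # -s: how many bytes this chunk covers
        echo "ddrescue -i $off -o 0 -s $size $src chunk$n.img chunk$n.map"
        off=$((off + size)); n=$((n + 1))
    done
)
```

For instance, `chunk_plan /dev/md0 2500 1000` prints three commands, the last one covering the 500 bytes from position 2000. A chunk that ends in an unreadable area may come out shorter than requested, so on the new system it is safer to place each chunk at its recorded offset than to rely on plain concatenation.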

Regards, K.



Re: Help, my disk array has one dead member

Karl Auer
In reply to this post by Bruce Ferrell
On Wed, 2017-03-22 at 19:57 -0700, Bruce Ferrell wrote:
> I understood it was a stripe.  both raid1 and raid0 are striped.

Ah no, they aren't. RAID0 is striped, RAID1 is not.

> which is why I gave him steps and theory of
> operation and pointers to methods so he could be 
> enabled to make his own decisions as to next steps.

Which is great, but you expressed it as if everybody else was wrong or
had missed the point, so people got annoyed instead of listening to
your good info.

Not for you perhaps, but for others, this is a great intro to RAID
variants:

http://www.thegeekstuff.com/2010/08/raid-levels-tutorial

Regards, K.


Re: Help, my disk array has one dead member

Karl Auer
In reply to this post by Bruce Ferrell
On Wed, 2017-03-22 at 20:11 -0700, Bruce Ferrell wrote:
> I've done the dd thing on simple disks; the wisdom is it's ok between
> same type disks (make/model/etc).  The wisdom of raid is NO, full
> stop.

Ummmm - you may have missed the rest of the "dd thing". No-one is
suggesting copying the dd image back onto a disk of any description.

What was suggested was reading the virtual disk (not any physical disk)
block for block into a file on a different device. The result is a disk
image. That image can then be mounted on a loopback device and
inspected as if it were a real disk; files and directories can be
copied off it and the data recovered in that way. Data that was DOA is
still DOA, but as the RAID is still basically readable, dd will get you
all that can be got (barring the use of, as you rightly say, "serious
lab facilities").
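
To make the block-for-block read concrete, here is a minimal Python sketch of the idea, roughly what dd does with conv=noerror,sync. It is an illustration only, not the exact procedure given earlier in the thread: the function name image_device, the 64 KiB block size and the read_block parameter are all invented for this example. Blocks that cannot be read are written out as zeros so that everything else keeps its original offset in the image.

```python
import os


def image_device(src_path, dst_path, block_size=64 * 1024, read_block=os.pread):
    """Copy src_path to dst_path block by block.

    Blocks that raise an I/O error are replaced by zeros so that later data
    keeps its original offset. Returns the offsets of the failed blocks.
    """
    bad_offsets = []
    fd = os.open(src_path, os.O_RDONLY)  # read-only: the source is never written
    try:
        # Seeking to the end gives the size of a regular file or, on Linux,
        # of a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as out:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError:
                    # Unreadable block: remember where it was and carry on.
                    bad_offsets.append(offset)
                    data = b""
                # Pad failed or short reads with zeros to keep offsets aligned.
                out.write(data.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(fd)
    return bad_offsets
```

A call would look something like image_device("/dev/md0", "/mnt/external/raid.img"), where both paths are placeholders. The returned list of offsets shows which parts of the image are zero-filled holes rather than real data.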

> filesystems are inactive because it's read only, but the physical
> device is active and capable of change in the midst of 
> a possible dd.  tar, rsync etc are safe on the filesystem because
> it's ro but the physical volume is rw.

Yes, that's an issue. Hence my suggestion that the RAID should be
accessed read-only and from a LiveCD. While the RAID may still be
moving stuff around at the physical level (unlikely since it has
faulted itself into read-only mode anyway) the virtual disk - the thing
the OS and thus dd sees - will not be changing and can safely be read.

Happy to be corrected if my understanding here is wrong.

Regards, K.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer ([hidden email])
http://www.biplane.com.au/kauer
http://twitter.com/kauer389

GPG fingerprint: A52E F6B9 708B 51C4 85E6 1634 0571 ADF9 3C1C 6A3A
Old fingerprint: E00D 64ED 9C6A 8605 21E0 0ED0 EE64 2BEE CBCB C38B




Re: Help, my disk array has one dead member

Kevin O'Gorman
In reply to this post by Karl Auer
Hmmm.  I thought I had sent this, but just discovered it was still a draft.  Sorry for the time warp.


On Wed, Mar 22, 2017 at 4:43 PM, Karl Auer <[hidden email]> wrote:
> On Wed, 2017-03-22 at 16:12 -0700, Bruce Ferrell wrote:
> > Bottom line, one size never fits all...  poke, prod (gently) and use
> > trouble shooting steps to make a determination of what's needed to
> > recover and NEVER blindly follow "just do this..." instructions
>
> Sometimes people have no choice.
>
> Not sure what you are on about.
>
> The OP gave quite a good sitrep, made it clear that he was not in
> danger of losing critical data, and was given good, clear and above all
> harmless advice: Take a copy of the RAID, recover data from the copy.
>
> Had the OP said that this was critical data I would have offered quite
> different advice: Turn the thing off and take it to a professional data
> recovery service.
>
> Regards, K.



The OP checks in (Wednesday is my busiest day).

First off, this is raid-0, and comprises three 4-TB drives.  It's just stripes so I have a larger partition than any one of my drives.  No redundancy.  Sounds horrible, but remember this is hobby data only.  Not only that, but there are enough backups of critical bits for me to feel okay about re-creating anything lost.  And I have physical limitations in my machines that keep me from adding more drives, which I might otherwise do to have some redundancy.
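
For anyone wondering why one dead member of a three-disk RAID-0 leaves a scattering of missing inodes, files and directory blocks rather than one clean hole, a rough model of the striping helps. The Python sketch below assumes simple round-robin striping across equal-sized members with a 512 KiB chunk. The chunk size is an assumption, since the actual value for this array is not given, and metadata offsets on the member disks are ignored. The names locate and disks_touched are invented for the illustration.

```python
CHUNK = 512 * 1024  # assumed chunk size in bytes; the real array may differ
DISKS = 3           # three member drives, as described above


def locate(offset, chunk=CHUNK, disks=DISKS):
    """Map a byte offset on the array to (member disk, offset on that disk)."""
    chunk_index = offset // chunk
    disk = chunk_index % disks  # chunks are dealt out round-robin
    disk_offset = (chunk_index // disks) * chunk + offset % chunk
    return disk, disk_offset


def disks_touched(start, length, chunk=CHUNK, disks=DISKS):
    """Return the set of member disks holding any byte of a contiguous extent."""
    first = start // chunk
    last = (start + length - 1) // chunk
    return {c % disks for c in range(first, last + 1)}
```

In this simplified model, an extent that fits inside one chunk lives entirely on one drive, so small files either survive whole or vanish whole. Anything spanning three or more consecutive chunks touches every drive, so it cannot be complete once one member is gone.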

Thing is, I want some specifics about what little bits were lost.  Just filenames will do.  This is mostly lots and lots of little bitty files.  The few big ones are the ones I back up a lot.

I may already have enough, or nearly so.  I have copied all the files I could in the root directory of the raid, and captured names of the ones I could not read.  Of course there are some directories I could not read, and anything in them is lost too.  But there are some directories that are readable.  So I'd like to do the same thing in those.  There are enough of them that I'd like to automate it, including taking note of what was unreadable.  Filenames themselves will tell me a lot.
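
One way this could be automated is sketched below in Python. It walks the readable part of the tree, copies only files that can be read completely, and collects the names of everything that raised an error. This is a sketch under stated assumptions, not a tested recovery tool: the function name salvage_tree and the example paths are made up, and it relies only on os.walk and shutil.copy2 from the standard library.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every fully readable file under src_root into dst_root.

    Returns the paths of files and directories that could not be read.
    Partial directories are kept; partial files are deleted.
    """
    unreadable = []

    def note_dir_error(err):
        # os.walk calls this when a directory cannot be listed.
        unreadable.append(err.filename)

    for dirpath, dirnames, filenames in os.walk(src_root, onerror=note_dir_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies data plus timestamps and mode
            except OSError:
                unreadable.append(src)
                # Never leave a half-copied file behind.
                if os.path.lexists(dst):
                    os.remove(dst)
    return unreadable
```

It might be used as salvage_tree("/mnt/raid", "/mnt/rescue"), with both paths being placeholders, and the returned list written to a text file as the record of what was lost.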

I'm not going to image the broken RAID, because I'm just not going to spend the time dealing with tiny fragments.  I'm going to recover what complete files I can.  I'm not sure any data files will be useful, but some scripts may be recovered, and since it's my hobby I'll feel better about it if I go that far.  I have a working tar file of the main sqlite database as of 3 days ago, and the only other stuff that really matters is the little scripts I use to automate some of the chores.  I don't want fragments that I'm not sure were current -- I'd rather rewrite from memory or from scratch.  Then I have a day or so of work to redo, and I have a log book that tells me what that was.  I should do what recovery I can, use the new disks to replace that raid entirely, and order a new set.  When I'm back up and running I'll test the bejabbers out of the raid's drives before I put any back in service.

And background: I've lost data before, like my PhD research back around 1999.  Eeek!  So twice I've gone to data recovery services and paid a few thousand bucks to get data back.  Because of this, nowadays I stay pretty current with backups of crucial stuff.  But my raid is so big I just have to take my chances.  I didn't have a great destination for storing backups.  This is changing, and I now have another big machine, on which I'm installing a RAID that will mirror the things on this one.  But I'm going to use the drives I got for that as the replacement RAID in the old machine.


--
Kevin O'Gorman
#define QUESTION ((bb) || (!bb))   /* Shakespeare */

Please consider the environment before printing this email.



Re: Help, my disk array has one dead member

Karl Auer
On Wed, 2017-03-22 at 22:14 -0700, Kevin O'Gorman wrote:
> And background: I've lost data before, like my PhD research back
> around 1999.  Eeek!

In 1993 (?) I was doing PC support at the Australian National
University. Saw three or four lost souls traipse in with a few floppies
or a hard disk containing the only copy of their thesis or their PhD
data, asking plaintively if we could help. We rarely could, and there
was no way they could pay the big bucks that were needed for
professional data recovery. Sometimes they had printouts that were only
a few weeks or months old...

Easy to laugh, but for those people the loss was a palpable tragedy,
and one of the main reasons I am completely anal about backups - mine
and my customers'.

And I am still angry when I think that their supervisors, or their
Faculties, didn't tell them how to back up, and just let them lose
literally years of work.

It is *amazing* how many people have literally never considered the
idea that they might one day just lose all their data. How can anyone
in this day and age not have thought about their laptop, tablet, phone
or desktop being lost or stolen, or just failing one day?

Regards, K.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer ([hidden email])
http://www.biplane.com.au/kauer
http://twitter.com/kauer389

GPG fingerprint: A52E F6B9 708B 51C4 85E6 1634 0571 ADF9 3C1C 6A3A
Old fingerprint: E00D 64ED 9C6A 8605 21E0 0ED0 EE64 2BEE CBCB C38B


