Hard drive dying, trying to save a VM
I have a large VirtualBox VM on an external HDD. The HDD fails the S.M.A.R.T. test. The VM still works fine, but any regular attempt to copy the VM files over to a healthy drive fails ... there is clearly already something corrupt in the VM's virtual HDD, although it is not (apparently? yet?) affecting the functionality of the actual VM.
Any suggestions on how to save the VM? Linux Mint Guest OS, Pop_OS (Ubuntu) Host. The VM is nearly 800 GB. Both regular copy and rsync fail.
Thanks,
Eric
PS: (and perhaps I should have led with this, but...) is it okay to ask these kinds of specific, technical, "help me with my tech-stuff" questions here on Tildes?
Update to the update ... moved update info into a comment ... will keep my progress updated in that primary comment.
Danke, y gracias to all
The urgent first step is to clone your failing drive to a good one. S.M.A.R.T. detectable failures on mechanical drives don't get better on their own; there's a good chance your drive will fail completely.
Cloning software can also perform multiple read attempts on failing sectors, potentially recovering some data.
Clonezilla is FOSS and fairly easy to use. The most critical task is verifying that you've identified your drives correctly before you start the clone. I haven't copied a blank new drive over the data myself, but I watched someone almost do that before I screamed at him.
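For the "identify your drives correctly" step, here is a minimal sketch of one way to list what the Linux host sees before cloning. It assumes a Linux host that exposes `/sys/block`, where each device's `size` file counts 512-byte sectors. The function name `list_block_devices` is made up for illustration and is not part of Clonezilla.

```python
from pathlib import Path


def list_block_devices(sys_block="/sys/block"):
    """Return (name, size_in_bytes, model) for each block device under sys_block."""
    devices = []
    for dev in sorted(Path(sys_block).iterdir()):
        size_file = dev / "size"
        if not size_file.is_file():
            continue
        # The sysfs "size" file reports the device size in 512-byte sectors.
        sectors = int(size_file.read_text().strip())
        # Not every device exposes a model string (e.g. loop devices).
        model_file = dev / "device" / "model"
        model = model_file.read_text().strip() if model_file.is_file() else "unknown"
        devices.append((dev.name, sectors * 512, model))
    return devices
```

Calling `list_block_devices()` and comparing the sizes and model strings against the labels on the physical drives is a cheap sanity check before picking the clone source and target.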
The VM data on your cloned drive may be readable where the old drive wasn't.
If it isn't, and your missing data is important to you, but not so important that you'd pay thousands for professional data recovery, then get yourself another clean drive, and a copy of SpinRite. I've successfully recovered drives from punctured RAID arrays with that tool, and it's completely worth the price for technicians. Be very sure that you have a complete clone of the original drive before proceeding. SpinRite performs so many reads that failing drives may never boot again after you use it - treat it as a destructive tool. Essentially, SpinRite makes another clone of your failing drive, but uses some statistical sampling trickery to attempt reconstruction of data from bad sectors. There are more sophisticated (and expensive) professional tools for recovering data from worn drives, but I haven't found anything that provides more utility per dollar when required.
I've also used TestDisk and PhotoRec, but never had much luck if SpinRite failed.
Am former data recovery tech (are you one too, @patience_limited?). That's very solid advice. The only thing I would add is:
If the data in that VM is actually critical and/or valuable, stop fiddling with the drive immediately @Eric_the_Cerise, as by continuing to do so you are risking causing potentially irreparable data loss. Unplug it, pack it up safely/securely, and take it to a professional in that case... preferably one with direct access to a cleanroom, since otherwise they may just wind up shipping the drive to another company who does have that facility, but take a cut for themselves first as a middle man.
p.s. If you do keep trying to work on it yourself though, here is a bit more advice from some older comments of mine:
I didn't work on data recovery full time, as your important addenda indicate! What I know is what I picked up on an as-needed basis from years of SMB support in Florida, where the combination of environmental problems like poor power quality or overheating, and low availability of reliable technologists, meant plenty of opportunities to recover unbacked-up data. 🙄
Don't even get me started on recovering data from malware damage - back in the oughts and early '10's, that was at least a quarter of my typical day.
Roger that. And neat, I spent some of my formative years living in Florida (Boca, Ft.L and Miami). :)
Oof, yeah. Malware can be a bitch. But for me it's "Don't even get me started on recovering data from broken thumb drives!" :P Those things are a bloody nightmare, especially because of the scale involved, and the fact that there are practically no standards to their design, so every one is completely different. And given how fragile they generally are, the amount of people that store critically important data on them (with no backup) is insane to me! Oh, the horror stories I could tell (e.g. RIP PhD theses). :(
Trying to figure out a convenient way to update everyone who replied, w/o repeating myself 5 times ... so, I'm just updating my original post, as needed, and mentioning people by name, as needed.
Just FYI, mentions don't work in topic text yet (https://gitlab.com/tildes/tildes/issues/195), so including them there won't actually notify those people. You could always just make a new top-level comment for updating and including them though.
Did not know that. Thanks. I replied to everyone, too, so not an issue this time, but still, thanks for the tip.
Based on @Eric_the_Cerise's details, I just realized that there's an addendum-to-the-addendum here.
Don't try to boot the virtual machine again or mount its drives until you've completed the clone, even if the virtual machine metadata is stored on a different drive. Windows systems do an enormous amount of read/write housekeeping on boot and shutdown, as does NTFS. As @cfabbro said, you want zero activity on the failing drive until you can copy it in toto.
SpinRite is pretty industry standard for a reason. As @patience_limited mentioned, it's considered a destructive (aka "last resort") tool because of how many read cycles it does, but it can (and often does) work where other imaging/recovery software fails. I have personally had it recover data on multiple occasions when, even after a platter and/or PCB swap, every other piece of software failed, even the device manufacturer specific software. Now with that said, for 99% of cases SpinRite is overkill IMO, and something like ddrescue will work just fine. But that is not true in every case.
p.s. If I had to guess why this person is slagging off SpinRite, it's probably just because they haven't actually worked on enough drives to encounter that 1% case where everything else they tried fails, and then SpinRite comes to the rescue.
That's a bit strong, there are plenty of people in the industry who think SpinRite is bollocks.
There are more situations where SpinRite will cause harm than help, and anyone recommending SpinRite needs to be much clearer about the fact that SpinRite is something that will destroy the drive, and may well destroy the data, but if you've tried everything else, and cannot afford professional recovery, you may as well try it.
https://serverfault.com/questions/51681/does-spinrite-do-what-it-claims-to-do
I explicitly called it a "destructive (aka "last resort") tool" ... how much clearer can I make it? And patience_limited did the same in their comment.
I have used SpinRite in a professional capacity on many occasions, and I know many of my former colleagues have too. Again though, I only ever used it as a last resort, after other less destructive imaging/recovery software failed to produce results, or the images that were produced were still missing critical data that a client desperately needed. But it was never used first, and anyone who does that (like the top serverfault commenter) is making a serious mistake IMO. And yes, sometimes SpinRite will take a month (or longer) to finally produce a workable result, but in my experience with using it, the bottom line is that it often does eventually work in a lot of cases when no other software does.
Danke. Community here is small, I know, but also seems very text-oriented, and looks like a good guess. I've already received much excellent, detailed advice on the issue.
OP here ...
To clarify, none of this is mission-critical, but it would save me a lot of inconvenience. I have a physical PC, used to be my primary home machine. I virtualized the entire thing into VirtualBox. I still have the working physical machine ... and if necessary, I can virtualize it again ... but man, that was a PITA; sure would like to avoid doing that again.
To @ali, yes, I think I could just pull the important data out of the VM while it's running, but I can do that with the physical machine, too. The point is, I want a portable/virtual copy of my old PC, after which I can get rid of the physical machine.
Per @patience_limited and @cfabbro, I will try Clonezilla-ing the failing HDD tomorrow, and report back afterwards.
Thanks, all.
I wish you the best of fortune with this - please feel free to reach out via PM if you have questions or difficulties. I can't promise prompt response, but I'm hoping for your success, especially in the face of how annoying it is to fight with virtualization of desktop hardware. I've felt your pain!
Any updates? Did you manage to get your data off the drive and out of the VM?
Project's on hold for a few days. It's just so big ... I'm having trouble making room for juggling the files and images. Waiting on a new HDD for now.
Heh, yeah one of the hardest parts of doing data recovery at home is often just finding space for the image and recovered data. :P Once you get your new HDD, feel free to PM me if you have any questions or need any assistance.
Danke, y gracias.
Once I get the VM stable and working on a reliable HDD, the next task is to convert it to a dynamic virtual drive and shrink it ... probably 50% or more of it is empty space.
Did you try backing up the important files out of the vm onto another hard drive?
I've used ddrescue before, with great success. Even if it can't read the entire VM image, it'll copy what it can and skip over the parts that it can't.
After that copy, it's possible the VM image will be damaged enough that it won't boot. You may still be able to boot the VM into a Linux liveCD, mount the disk image read-only, then copy files off through the VM and onto more reliable storage.
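To illustrate the "copy what it can and skip over the parts that it can't" idea, here is a rough Python sketch. It is not ddrescue and has none of its retry passes or map-file logic; it is only meant to show the principle on a regular file such as a VM disk image. The name `salvage_copy` and the 64 KiB block size are assumptions chosen for the example.

```python
import os


def salvage_copy(src_path, dst_path, block_size=64 * 1024):
    """Copy a regular file block by block, zero-filling blocks that fail to read.

    Returns a list of (offset, length) tuples describing the skipped regions.
    """
    bad_regions = []
    size = os.path.getsize(src_path)
    # buffering=0 gives raw reads, so one bad block does not poison a larger buffer.
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(length)
            except OSError:
                # The drive reported a read error for this block.
                data = b""
            if len(data) < length:
                # Treat the missing part as unreadable: log it and pad with zeros
                # so everything after it stays at the correct offset.
                missing = length - len(data)
                bad_regions.append((offset + len(data), missing))
                data += b"\x00" * missing
            dst.write(data)
            offset += length
    return bad_regions
```

The returned list says which byte ranges of the copy are zero-filled rather than real data, which helps when judging whether the damaged image is worth trying to boot. For an actual failing drive, a purpose-built tool like ddrescue is the better choice, since a naive loop like this one can hammer bad sectors.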