14 votes

Hard drive dying, trying to save a VM

I have a large VirtualBox VM on an external HDD. The HDD fails the S.M.A.R.T. test. The VM still works fine, but any regular attempt to copy the VM files over to a healthy drive fails ... there is clearly already something corrupt in the VM's virtual HDD, although it is not (apparently? yet?) affecting the functionality of the actual VM.

Any suggestions on how to save the VM? Linux Mint Guest OS, Pop_OS (Ubuntu) Host. The VM is nearly 800 GB. Both regular copy and rsync fail.

Thanks,
Eric

PS: (and perhaps I should have led with this, but...) is it okay to ask these kinds of specific, technical, "help me with my tech-stuff" questions here on Tildes?


Update to the update ... moved update info into a comment ... will keep my progress updated in that primary comment.

Danke, y gracias to all

25 comments

  1. [14]
    patience_limited
    (edited )

    The urgent first step is to clone your failing drive to a good one. S.M.A.R.T.-detectable failures on mechanical drives don't get better on their own; there's a good chance your drive will fail completely.

    Cloning software can also perform multiple read attempts on failing sectors, potentially recovering some data.

    Clonezilla is FOSS and fairly easy to use. The most critical task is verifying that you've identified your drives correctly before you start the clone. I haven't copied a blank new drive over the data myself, but I watched someone almost do that before I screamed at him.
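
As a rough illustration of the "identify your drives first" step, here is a minimal Python sketch that lists whole disks with their size, model, and serial number so the failing source and the blank target can be told apart before anything is cloned. It assumes a Linux host with `lsblk` from util-linux available; the helper names `list_disks` and `find_by_serial` are hypothetical and not part of Clonezilla.

```python
import json
import subprocess


def list_disks(lsblk_json=None):
    """Return (name, size, model, serial) tuples for whole disks.

    If no JSON text is passed in, ask lsblk for it (assumes util-linux lsblk).
    """
    if lsblk_json is None:
        lsblk_json = subprocess.run(
            ["lsblk", "--json", "--nodeps", "-o", "NAME,SIZE,MODEL,SERIAL"],
            check=True, capture_output=True, text=True,
        ).stdout
    devices = json.loads(lsblk_json)["blockdevices"]
    return [(d["name"], d.get("size"), d.get("model"), d.get("serial"))
            for d in devices]


def find_by_serial(disks, serial):
    """Return the single disk whose serial matches, or raise if ambiguous."""
    matches = [d for d in disks if d[3] == serial]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one disk with serial {serial!r}")
    return matches[0]
```

Matching on the serial number printed on the drive label is one way to avoid relying on names like `sda` or `sdb`, which can change between boots.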

    The VM data on your cloned drive may be readable where the old drive wasn't.

    If it isn't, and your missing data is important to you, but not so important that you'd pay thousands for professional data recovery, then get yourself another clean drive, and a copy of SpinRite. I've successfully recovered drives from punctured RAID arrays with that tool, and it's completely worth the price for technicians. Be very sure that you have a complete clone of the original drive before proceeding. SpinRite performs so many reads that failing drives may never boot again after you use it - treat it as a destructive tool. Essentially, SpinRite makes another clone of your failing drive, but uses some statistical sampling trickery to attempt reconstruction of data from bad sectors. There are more sophisticated (and expensive) professional tools for recovering data from worn drives, but I haven't found anything that provides more utility per dollar when required.

    I've also used TestDisk and PhotoRec, but never had much luck if SpinRite failed.

    18 votes
    1. [7]
      cfabbro
      (edited )

      Am former data recovery tech (are you one too, @patience_limited?). That's very solid advice. The only thing I would add is:

      If the data in that VM is actually critical and/or valuable, stop fiddling with the drive immediately @Eric_the_Cerise, as by continuing to do so you are risking potentially irreparable data loss. Unplug it, pack it up safely/securely, and take it to a professional in that case... preferably one with direct access to a cleanroom, since otherwise they will likely just wind up shipping the drive to another company that does have that facility, but take a cut for themselves first as a middle man.

      p.s. If you do keep trying to work on it yourself though, here is a bit more advice from some older comments of mine:

      Rule #1 of Data Recovery: never ever ever write to the drive you are trying to recover data from. This is why Data Recovery companies generally attempt to image a drive first before doing anything else to it (unless there is risk of exacerbating physical damage to the drive by doing so), then continue their work using that image rather than the old drive, and finally only ever recover the data to another totally new drive.

      I realize that [imaging a drive] is not always possible to do for home users, and unless the data is worth more than the cost of the additional drives it would take to store the images, it isn't worth it anyways. But it is the only really surefire way to make sure you don't lose any additional data during a recovery.
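
For anyone following the image-first workflow above, a minimal Python sketch of one sanity check: confirming that a working copy is byte-identical to the image it was made from, so all further recovery attempts can run against the copy. The function names and the idea of keeping a separate working copy are illustrative assumptions, not a description of any particular recovery tool.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # read-only: never open the image for writing
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(image_path, working_copy_path):
    """True if the working copy has the same SHA-256 as the original image."""
    return sha256_of(image_path) == sha256_of(working_copy_path)
```

This only makes sense on image files stored on healthy drives; hashing the failing drive itself would add exactly the kind of extra read load the advice above warns against.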

      7 votes
      1. [2]
        patience_limited
        (edited )

        I didn't work on data recovery full time, as your important addenda indicate! What I know is what I picked up on an as-needed basis from years of SMB support in Florida, where the combination of environmental problems like poor power quality or overheating, and low availability of reliable technologists, meant plenty of opportunities to recover unbacked-up data. 🙄

        Don't even get me started on recovering data from malware damage - back in the oughts and early '10s, that was at least a quarter of my typical day.

        3 votes
        1. cfabbro
          (edited )

          Roger that. And neat, I spent some of my formative years living in Florida (Boca, Ft.L and Miami). :)

          Don't even get me started on recovering data from malware damage

          Oof, yeah. Malware can be a bitch. But for me it's "Don't even get me started on recovering data from broken thumb drives!" :P Those things are a bloody nightmare, especially because of the scale involved, and the fact that there are practically no standards to their design, so every one is completely different. And given how fragile they generally are, the amount of people that store critically important data on them (with no backup) is insane to me! Oh, the horror stories I could tell (e.g. RIP PhD theses). :(

          5 votes
      2. [3]
        Eric_the_Cerise

        Trying to figure out a convenient way to update everyone who replied, w/o repeating myself 5 times ... so, I'm just updating my original post, as needed, and mentioning people by name, as needed.

        2 votes
        1. [2]
          cfabbro

          Just FYI, mentions don't work in topic text yet (https://gitlab.com/tildes/tildes/issues/195), so including them there won't actually notify those people. You could always just make a new top-level comment for updating and including them though.

          1 vote
          1. Eric_the_Cerise

            Did not know that. Thanks. I replied to everyone, too, so not an issue this time, but still, thanks for the tip.

            2 votes
      3. patience_limited

        Based on @Eric_the_Cerise's details, I just realized that there's an addendum-to-the-addendum here.

        Don't try to boot the virtual machine again or mount its drives until you've completed the clone, even if the virtual machine metadata is stored on a different drive. Windows systems do an enormous amount of read/write housekeeping on boot and shutdown, as does NTFS. As @cfabbro said, you want zero activity on the failing drive until you can copy it in toto.

        2 votes
    2. [5]
      vivaria

      Saw this comment on HN disparaging SpinRite and providing other suggestions. I'm not sure if it's valuable (I have no experience in data recovery) but I'm curious what you and/or @cfabbro might think of it:

      https://news.ycombinator.com/item?id=21941776

      3 votes
      1. [4]
        cfabbro
        (edited )

        SpinRite is pretty industry standard for a reason. As @patience_limited mentioned, it's considered a destructive (aka "last resort") tool because of how many read cycles it does, but it can (and often does) work where other imaging/recovery software fails. I have personally had it recover data on multiple occasions when, even after a platter and/or PCB swap, every other piece of software failed, even the device manufacturer specific software. Now with that said, for 99% of cases SpinRite is overkill IMO, and something like ddrescue will work just fine. But that is not true in every case.

        p.s. If I had to guess why this person is slagging off SpinRite, it's probably just because they haven't actually worked on enough drives to encounter that 1% case where everything else they tried fails, and then SpinRite comes to the rescue.

        3 votes
        1. vivaria

          Neat, good to know!

          FWIW this thread got me off my arse and backing things up, even if I wasn't the intended audience. I still don't have an automated approach going forward but at least I have something.

          3 votes
        2. [2]
          DanBC

          SpinRite is pretty industry standard for a reason

          That's a bit strong, there are plenty of people in the industry who think SpinRite is bollocks.

          There are more situations where SpinRite will cause harm than help, and anyone recommending SpinRite needs to be much clearer about the fact that it is something that will destroy the drive, and may well destroy the data, but if you've tried everything else, and cannot afford professional recovery, you may as well try it.

          https://serverfault.com/questions/51681/does-spinrite-do-what-it-claims-to-do

          2 votes
          1. cfabbro
            (edited )

            I explicitly called it a "destructive (aka "last resort") tool" ... how much clearer can I make it? And patience_limited did the same in their comment.

            but if you've tried everything else, and cannot afford professional recovery

            I have used SpinRite in a professional capacity on many occasions, and I know many of my former colleagues have too. Again though, I only ever used it as a last resort, after other less destructive imaging/recovery software failed to produce results, or the images that were produced were still missing critical data that a client desperately needed. But it was never used first, and anyone who does that (like the top serverfault commenter) is making a serious mistake IMO. And yes, sometimes SpinRite will take a month (or longer) to finally produce a workable result, but in my experience, the bottom line is that it often does eventually work when no other software does.

            4 votes
    3. Eric_the_Cerise

      Trying to figure out a convenient way to update everyone who replied, w/o repeating myself 5 times ... so, I'm just updating my original post, as needed, and mentioning people by name, as needed.

      1 vote
  2. [2]
    Bauke

    is it okay to ask these kinds of specific, technical, "help me with my tech-stuff" questions here on Tildes?

    It's fine, and there have been other tech support topics before, but it's hard to say whether you'll get anywhere: Tildes is still relatively small, so there might just not be anyone here that can help. I'd say a better place for this kind of thing would be one of the StackExchange communities or a dedicated data recovery forum; at least there you'll get a more focused audience.

    6 votes
    1. Eric_the_Cerise

      Danke. Community here is small, I know, but also seems very text-oriented, and looks like a good guess. I've already received much excellent, detailed advice on the issue.

      2 votes
  3. [6]
    Eric_the_Cerise

    OP here ...

    To clarify, none of this is mission-critical, but it would save me a lot of inconvenience. I have a physical PC that used to be my primary home machine. I virtualized the entire thing into VirtualBox. I still have the working physical machine ... and if necessary, I can virtualize it again ... but man, that was a PITA; sure would like to avoid doing that again.

    To @ali, yes, I think I could just pull the important data out of the VM while it's running, but I can do that with the physical machine, too. The point is, I want a portable/virtual copy of my old PC, after which I can get rid of the physical machine.

    Per @patience_limited and @cfabbro, I will try Clonezilla-ing the failing HDD tomorrow, and report back afterwards.

    Thanks, all.

    5 votes
    1. patience_limited

      I wish you the best of fortune with this - please feel free to reach out via PM if you have questions or difficulties. I can't promise prompt response, but I'm hoping for your success, especially in the face of how annoying it is to fight with virtualization of desktop hardware. I've felt your pain!

      3 votes
    2. [4]
      cfabbro

      Any updates? Did you manage to get your data off the drive and out of the VM?

      1. [3]
        Eric_the_Cerise

        Project's on hold for a few days. It's just so big ... I'm having trouble making room for juggling the files and images. Waiting on a new HDD for now.

        1 vote
        1. [2]
          cfabbro

          Heh, yeah one of the hardest parts of doing data recovery at home is often just finding space for the image and recovered data. :P Once you get your new HDD, feel free to PM me if you have any questions or need any assistance.

          1. Eric_the_Cerise

            Danke, y gracias.

            Once I get the VM stable and working on a reliable HDD, the next task is to convert it to a dynamic virtual drive and shrink it ... probably 50% or more of it is empty space.
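
A hedged sketch of what that convert-and-shrink step might look like when scripted from Python on the host. It assumes the virtual disk is in VDI format, that free space inside the guest has already been zeroed (for example with a tool such as `zerofree`), and that `VBoxManage` is on the PATH. The file names are placeholders, and `shrink_commands` and `run_all` are hypothetical helpers.

```python
import subprocess


def shrink_commands(fixed_vdi, dynamic_vdi):
    """Build the VBoxManage calls: clone to a dynamic disk, then compact it."""
    return [
        # "Standard" is VirtualBox's name for a dynamically allocated image.
        ["VBoxManage", "clonemedium", "disk", fixed_vdi, dynamic_vdi,
         "--variant", "Standard"],
        # Compacting releases zeroed blocks; as far as I know it is VDI-only.
        ["VBoxManage", "modifymedium", "disk", dynamic_vdi, "--compact"],
    ]


def run_all(commands, dry_run=True):
    """Print the commands by default; only execute when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Cloning to a new file rather than converting in place keeps the original image untouched until the new one has been booted and checked, which costs extra disk space but fits the rest of the advice in this thread.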

            1 vote
  4. [2]
    ali

    Did you try backing up the important files out of the vm onto another hard drive?

    4 votes
    1. Eric_the_Cerise

      Trying to figure out a convenient way to update everyone who replied, w/o repeating myself 5 times ... so, I'm just updating my original post, as needed, and mentioning people by name, as needed.

      1 vote
  5. spit-evil-olive-tips

    I've used ddrescue before, with great success. Even if it can't read the entire VM image, it'll copy what it can and skip over the parts that it can't.

    After that copy, it's possible the VM image will be damaged enough that it won't boot. You may still be able to boot the VM into a Linux liveCD, mount the disk image read-only, then copy files off through the VM and onto more reliable storage.
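
To make the "copy what it can and skip over the parts that it can't" idea concrete, here is a minimal Python sketch of that strategy. It is not how ddrescue itself is implemented (the real tool also keeps a mapfile, retries, and makes multiple passes); `rescue_copy` is a hypothetical helper, and the block size is an arbitrary assumption.

```python
import io


def rescue_copy(src, dst, total_size, block_size=64 * 1024):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    src and dst are seekable binary file objects. Returns a list of
    (offset, length) ranges that could not be read.
    """
    bad_ranges = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
        except OSError:
            # Unreadable block: note it and write zeros so offsets stay aligned.
            bad_ranges.append((offset, length))
            dst.seek(offset)
            dst.write(b"\0" * length)
            offset += length
            continue
        if not data:
            break  # source ended earlier than total_size
        dst.seek(offset)
        dst.write(data)
        offset += len(data)
    return bad_ranges
```

The returned list of bad ranges plays roughly the role of a log of what is missing, which helps in judging whether the damaged regions fall inside the VM image at all or only in unused space.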

    3 votes