17 votes

[SOLVED] Recovering data in a very old, possibly corrupted tar archive?

Hi all,

There is a tar.gz whose contents I would like to access. The file itself is quite old, last being updated ~20 years ago if I had to guess (I am not sure if this is relevant). The file contains legacy scientific code that I would like for archival purposes and can be found via the "code site" link here. However:

  • When I download the file and run tar -xvzf radpack.tar.gz I get an error: tar: Error opening archive: Unrecognized archive format.
  • Likewise, if I try to gunzip it I get gunzip: radpack.tar.gz: not in gzip format.
  • Running file radpack.tar.gz only yields radpack.tar.gz: data, indicating that file cannot identify the format at all.
  • head radpack.tar.gz outputs a string of unintelligible unicode.

These are the different approaches I have come across after searching for this problem, and to me the results are good evidence that the file has been corrupted in some way, which may very well be the case. However, for archival and historical purposes it would be great if I could access the contents, so I am compelled to look for other solutions. Are there other options I can try here? Is there some way to confirm that the file is in fact corrupted beyond recovery? Any help on this point would be greatly appreciated. I posted this on Stack Exchange as well but figured the smart folks here might know.

EDIT:

Just to be clear this has been solved — a functioning copy of the archive was found.

10 comments

  1. [6]
    brandt

    I think this might be the same package, but working: https://web.archive.org/web/20010615115135/http://bubba.ucdavis.edu/~knox/radpack/radpack.tar.gz

    Note that this one is only a tar file despite the .gz extension.
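A minimal sketch of checking the content rather than trusting the extension, assuming a POSIX ustar or GNU tar archive (very old v7 tar files carry no magic string and would show up as "unknown" here). The `sniff` helper and the in-memory demo archive are illustrative only, not part of the thread.

```python
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"
USTAR_OFFSET = 257  # the "ustar" magic sits at this offset in the first 512-byte header


def sniff(data: bytes) -> str:
    """Guess the container from the bytes themselves, ignoring the file name."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    if data[USTAR_OFFSET:USTAR_OFFSET + 5] == b"ustar":
        return "tar"
    return "unknown"


# Demo on a small in-memory archive (hypothetical content).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"hello"
    info = tarfile.TarInfo(name="README")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(sniff(buf.getvalue()))
```

If a file named .tar.gz sniffs as plain tar, it can be extracted with tar -xvf, leaving out the z flag.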

    22 votes
    1. [5]
      gpl

      Amazing! I can't thank you enough. I had tried using the Wayback Machine on the site that I linked in my post to see if some older snapshot had a link pointing to a non-corrupted file, but I had not come across the site you found here. Thank goodness for the Internet Archive, and thank goodness for @brandt.

      11 votes
      1. brandt

        Glad I could help.

        It was a fun rabbit hole to go down. As you saw, the file magic header was 0x1F 0xEF instead of what you'd expect from gzip (1F 8B). But there were a bunch of 0x1f archive formats. So the search took me on a tour of the many ancient archive formats that might have still been kicking around back in 1999–2001 when this file was created. I didn't find any that matched, but it was a fun search anyhow.
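For a sense of what that search looks like, here is a small lookup sketch. The signature table is an assumption based on the magic numbers that gzip's own source (gzip.h) recognizes, as far as I recall them; it is not an exhaustive list of formats from that era, and `identify` is a hypothetical helper.

```python
# Two-byte signatures that start with 0x1F (recalled from gzip.h, not exhaustive).
KNOWN_1F_MAGICS = {
    b"\x1f\x8b": "gzip",
    b"\x1f\x9e": "gzip 0.5 (old format)",
    b"\x1f\x9d": "compress (.Z, LZW)",
    b"\x1f\xa0": "SCO compress -H (LZH)",
    b"\x1f\x1e": "pack (.z, Huffman)",
}


def identify(first_two: bytes) -> str:
    """Return the format name for a two-byte signature, or 'no match'."""
    return KNOWN_1F_MAGICS.get(bytes(first_two[:2]), "no match")


print(identify(b"\x1f\x8b"))
print(identify(b"\x1f\xef"))  # the signature seen in the bad file
```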

        Incidentally, I've concluded that the file really is just corrupt. The closest I could come to finding something with that signature was when you read a gzipped archive as UTF-8 text. In UTF-8, the 1F is a valid character but 8B is not. So the latter gets substituted with replacement character EF BF BD making the first few bytes 1F EF BF BD. However, in our bad file the third byte is 08, which is exactly what we'd expect from a regular gzip file. The rest of the header also looks normal. In the bad file's trailer, the uncompressed size matches our good file, but the uncompressed CRC32 checksum does not.
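Both checks in that paragraph can be reproduced in a few lines. This is a sketch, not the exact procedure used above: `utf8_mangle`, `read_trailer` and `trailer_matches` are hypothetical helper names, and the demo payload is made up. The trailer layout (CRC32 followed by the uncompressed size modulo 2^32, both little-endian, in the last 8 bytes) comes from the gzip format specification.

```python
import gzip
import struct
import zlib


def utf8_mangle(raw: bytes) -> bytes:
    """Simulate a tool that read binary data as UTF-8 text and wrote it back out."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def read_trailer(gz: bytes) -> tuple:
    """Return (crc32, isize) from the last 8 bytes of a gzip member."""
    crc32, isize = struct.unpack("<II", gz[-8:])
    return crc32, isize


def trailer_matches(gz: bytes, reference: bytes) -> dict:
    """Compare a gzip trailer against a known-good uncompressed copy."""
    crc32, isize = read_trailer(gz)
    return {
        "crc32": crc32 == (zlib.crc32(reference) & 0xFFFFFFFF),
        "size": isize == len(reference) % 2**32,
    }


# A text round trip turns 1f 8b 08 into 1f ef bf bd 08, so the third byte
# would be bf rather than the 08 seen in the bad file.
print(utf8_mangle(b"\x1f\x8b\x08").hex(" "))

# Demo with a made-up payload standing in for the good tar file.
reference = b"cmb" * 1000
archive = gzip.compress(reference)
print(trailer_matches(archive, reference))
```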

        So I don't have a good explanation for how it became corrupt, but given the nature of the contents, I'm going with cosmic rays. :)

        19 votes
      2. [2]
        Pistos

        If your problem is solved, perhaps you could update your post to indicate this.

        4 votes
      3. Pavouk106

        Not pressuring you into anything at all, but maybe consider a donation to the Internet Archive if it helped you in a situation like this...?

        I still haven't donated anything to them myself. I should, as it will surely save my ass one day.

        3 votes
  2. [2]
    tesseractcat

    Depending on how much time you want to dedicate to the task one strategy would be to open up the file in a hex editor and compare it to the GZIP specification to see how exactly it's corrupted. Depending on the type of corruption it might be possible to fix it, or to extract some of the compressed data and decompress it separately.
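The same comparison can be scripted instead of done by eye in a hex editor. A minimal sketch, assuming only the 10-byte fixed header defined in RFC 1952 (ID1, ID2, CM, FLG, MTIME, XFL, OS); `header_problems` is a hypothetical helper, and apart from the first three bytes reported in this thread the sample header below is made up.

```python
import struct


def header_problems(data: bytes) -> list:
    """List deviations from the fixed 10-byte gzip header described in RFC 1952."""
    if len(data) < 10:
        return ["file shorter than the 10-byte fixed header"]
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    problems = []
    if (id1, id2) != (0x1F, 0x8B):
        problems.append(f"magic is {id1:02x} {id2:02x}, expected 1f 8b")
    if cm != 8:
        problems.append(f"compression method is {cm}, expected 8 (deflate)")
    if flg & 0xE0:
        problems.append("reserved flag bits are set")
    return problems


# Sample header: first three bytes as reported in the thread, the rest zero-filled.
sample = bytes.fromhex("1fef0800") + bytes(6)
print(header_problems(sample))
```

An empty list only means the fixed header looks sane; damage further into the compressed stream would not show up here.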

    5 votes
    1. gpl

      Good tip! Doing this, I discovered that the gzip header was corrupted, which I edited back to the correct 1f 8b. It is now recognized as a gzip file, and I can get some information about it (including that it was last edited Fri Jul 16 21:32:28 1999, which I find believable). I am way out of my element but I am going to keep digging here. Thanks.
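For anyone who would rather script that edit than use a hex editor, a rough sketch follows. It assumes the only header damage is the two magic bytes; `patch_magic` and `header_mtime` are hypothetical helpers, and the timestamp in the demo is a made-up value, not the one from the real file. The MTIME field is a little-endian Unix timestamp at bytes 4 to 7, with 0 meaning "not recorded".

```python
import struct
from datetime import datetime, timezone


def patch_magic(data: bytes) -> bytes:
    """Return a copy of the data with the first two bytes set to the gzip magic."""
    return b"\x1f\x8b" + data[2:]


def header_mtime(data: bytes):
    """Read the MTIME field of a gzip header; None if the field is zero."""
    (seconds,) = struct.unpack("<I", data[4:8])
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Demo header with a made-up timestamp (midnight UTC on 1999-07-16).
stamp = int(datetime(1999, 7, 16, tzinfo=timezone.utc).timestamp())
damaged = b"\x1f\xef\x08\x00" + struct.pack("<I", stamp) + b"\x00\x03"
fixed = patch_magic(damaged)
print(fixed[:2].hex(" "), header_mtime(fixed))
```

On a real file it is safer to read the bytes, patch them in memory, and write the result to a new file so the original stays untouched.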

      10 votes
  3. [2]
    Pistos

    If the contents are not sensitive, you could consider providing the file, so others can tinker with it and try to crack it open for you.

    In addition to file, consider exiftool.

    It might be fruitful to get directly in touch with the developers and other very-interested people, by way of their mailing list: https://www.gnu.org/software/gzip/ They would know details such as the compression algorithm, file format, etc. and if it might be possible to recover data if the original gz file just had a few bytes lopped off.

    3 votes
    1. gpl

      Whoops! I meant to include a link. I have edited the OP to point towards a download link for anyone interested. I have gotten the computer to recognize it as a gzip file, but unfortunately a simple decompression now fails due to a data stream error. Not promising, but I have nothing else to do tonight so I'll keep trying!
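When a plain decompression aborts with a data stream error, one option is to feed the stream in small pieces and keep whatever comes out before the failure. A minimal sketch using Python's zlib module, assuming the header has already been repaired; `salvage` is a hypothetical helper and the payload is made up. Note that bytes decoded between the damaged spot and the point where zlib notices the error may be garbage, so the tail of the output should be treated with suspicion.

```python
import gzip
import hashlib
import zlib


def salvage(gz: bytes, chunk_size: int = 256) -> bytes:
    """Decompress as much of a gzip stream as possible, stopping at the first error."""
    inflater = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    recovered = bytearray()
    for start in range(0, len(gz), chunk_size):
        try:
            recovered += inflater.decompress(gz[start:start + chunk_size])
        except zlib.error:
            break  # keep what was decoded before the stream went bad
        if inflater.eof:
            break  # reached the end of the gzip member cleanly
    return bytes(recovered)


# Demo: a made-up, poorly compressible payload, then a truncated copy of its archive.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2000))
gz = gzip.compress(payload)
partial = salvage(gz[: len(gz) // 2])
print(len(partial), "of", len(payload), "bytes recovered")
```

Small chunks matter here: if a chunk raises an error, the output it would have produced is lost, so a smaller chunk size loses less.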

      For those interested, this file contains some historically interesting cosmic microwave background data that as far as I can tell can no longer be found on the web.

      6 votes