File structure difference between NAS and cloud storage

I have a NAS with a ton of photos and documents that have remained untouched for around 6 years. I uploaded all of it to OneDrive, tidied it up, and have mostly used OneDrive since. But I also kept sending things to the NAS, so the two have diverged.

I'm thinking about ways of restructuring/sorting my NAS to match my OneDrive so that I can then sync the two. I thought about making a python script that would just match on file names and move them to the correct location.

Figured before I did, I'd ask if anyone had other suggestions.
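A minimal sketch of the filename-matching script described above. It assumes the OneDrive tree is available as a local folder (for example through the OneDrive sync client) and that file names weren't changed during the tidy-up; the function names and paths are hypothetical. It only moves a file when its name occurs exactly once in each tree, leaves NAS-only files where they are, and defaults to a dry run that just prints the plan.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def index_by_name(root: Path) -> dict[str, list[Path]]:
    """Map each file name under root to its path(s), relative to root."""
    index: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            index[path.name].append(path.relative_to(root))
    return index


def plan_moves(nas_root: Path, cloud_root: Path) -> tuple[list[tuple[Path, Path]], list[Path]]:
    """Plan moves that give each NAS file the relative path it has in the cloud tree."""
    cloud_index = index_by_name(cloud_root)
    moves: list[tuple[Path, Path]] = []
    ambiguous: list[Path] = []
    for name, nas_paths in index_by_name(nas_root).items():
        cloud_paths = cloud_index.get(name, [])
        if not cloud_paths:
            continue  # NAS-only file: leave it where it is
        if len(nas_paths) == 1 and len(cloud_paths) == 1:
            if nas_paths[0] != cloud_paths[0]:
                moves.append((nas_root / nas_paths[0], nas_root / cloud_paths[0]))
        else:
            # The name occurs in several folders on one side: don't guess.
            ambiguous.extend(nas_root / p for p in nas_paths)
    return moves, ambiguous


def apply_moves(moves: list[tuple[Path, Path]], dry_run: bool = True) -> None:
    """Print the planned moves; only touch the disk when dry_run is False."""
    for src, dst in moves:
        if dst.exists():
            print(f"SKIP (target exists): {src} -> {dst}")
        elif dry_run:
            print(f"WOULD MOVE: {src} -> {dst}")
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            print(f"MOVED: {src} -> {dst}")
```

For example, moves, ambiguous = plan_moves(Path("/volume1/photos"), Path.home() / "OneDrive") followed by apply_moves(moves) prints the plan; pass dry_run=False only after reviewing it and taking a backup. Anything in ambiguous (camera names like IMG_0001.jpg repeat a lot) needs a tie-breaker such as a checksum.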

  1.
    zatamzzar
    Once you have everything in the right tree structure, you could use rclone or rsync to keep everything synchronized.
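    A sketch of the sync step with rclone, which has a OneDrive backend. It assumes an rclone remote was already set up with rclone config; the remote name onedrive: and both paths are hypothetical. Note that rclone sync makes the destination identical to the source, deleting files that exist only at the destination, which is why this defaults to --dry-run.

```python
import shlex
import subprocess


def rclone_sync_cmd(source: str, dest: str, dry_run: bool = True) -> list[str]:
    """Build an `rclone sync` command that makes dest match source."""
    cmd = ["rclone", "sync", source, dest]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing anything
    return cmd


def run(cmd: list[str]) -> int:
    """Echo the command, run it, and return rclone's exit code."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

    With OneDrive as the master copy, run(rclone_sync_cmd("onedrive:Photos", "/volume1/photos")) would show what a OneDrive-to-NAS sync would do; rerun with dry_run=False once the output looks right.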

    1.
      valar
      That's my goal. The question is how to get to the right structure: there are too many files and directories to do it manually.

      1.
        Bwerf
        What kind of info do you have in the files? Python can read EXIF data and timestamps, at least. I'd check a couple of files to see what I had to work with.

        Do you have the "right" structure in the cloud version?

        File name is probably enough as long as you haven't renamed some files. If you have, you can calculate hashes of the files and compare those.

        I'd probably also do a diff of the two tree structures to get an overview of what I had to work with.
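        A minimal sketch of that tree diff, comparing two local folders by relative path only (not by content). It assumes the OneDrive side is available as a local folder; the function names are hypothetical.

```python
from pathlib import Path


def relative_files(root: Path) -> set[Path]:
    """All file paths under root, relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def diff_trees(a: Path, b: Path) -> dict[str, list[Path]]:
    """Split files into only-in-a, only-in-b and in-both, by relative path."""
    files_a, files_b = relative_files(a), relative_files(b)
    return {
        "only_in_a": sorted(files_a - files_b),
        "only_in_b": sorted(files_b - files_a),
        "in_both": sorted(files_a & files_b),
    }
```

        Printing len() of each list gives the quick overview; a file listed under in_both only shares a path, so its contents may still differ.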

        1. valar
          Most but not all files are photos.

          I think file name is the way to go, to be honest.

  2. whbboyd
    Are corresponding files bit-identical?

    If so, you can likely get a listing of files, paths and checksums from OneDrive efficiently. You could then compute the corresponding checksums for each file on your NAS (a Unix one-liner such as find /nas-root -type f -print0 | xargs -0 sha256sum will do it) and rename files on the basis of matching checksums. Run the checksumming locally on the NAS if you possibly can: doing it over Wi-Fi, or even over a decently fast wired connection, will take forever. You'll also get an easy list of files present on only one side or the other, and of duplicates, if that's of interest to you.
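    A sketch of the same checksum approach in Python. It assumes both trees can be read as local folders and hashes both sides with SHA-256 the same way (the hashes OneDrive itself reports may use a different algorithm); the function names are hypothetical. A rename is only proposed when a checksum occurs exactly once on each side.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path) -> dict[str, list[Path]]:
    """Map each checksum to the relative path(s) of files under root with that content."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_hash[sha256_of(path)].append(path.relative_to(root))
    return dict(by_hash)


def compare(nas: dict[str, list[Path]], cloud: dict[str, list[Path]]) -> dict:
    """Proposed renames, one-sided files and NAS duplicates, matched by checksum."""
    renames = [
        (nas_paths[0], cloud[digest][0])
        for digest, nas_paths in nas.items()
        if digest in cloud
        and len(nas_paths) == 1
        and len(cloud[digest]) == 1
        and nas_paths[0] != cloud[digest][0]
    ]
    return {
        "renames": renames,  # (current NAS path, path of the same content in the cloud tree)
        "only_nas": [p for d, ps in nas.items() if d not in cloud for p in ps],
        "only_cloud": [p for d, ps in cloud.items() if d not in nas for p in ps],
        "duplicates": {d: ps for d, ps in nas.items() if len(ps) > 1},
    }
```

    Running hash_tree on the NAS itself keeps the slow part local; the resulting renames are relative paths, so they can be reviewed before any file is moved.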

  3. cstby
    How about this?

    1. Check what date you first uploaded everything to OneDrive.
    2. Find all files on your NAS that were created or modified on or after that date.
    3. Copy those files to OneDrive.
    4. Sync all files from OneDrive to your NAS.
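    Steps 2 and 3 might be sketched like this, assuming the OneDrive folder is synced locally so that copying into it uploads the files; names and paths are hypothetical. It filters on modification time because Python's os.stat on Linux doesn't report a creation time (st_ctime there is the inode-change time), and it copies with shutil.copy2 so timestamps are preserved. Existing files at the destination are skipped rather than overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path


def modified_since(root: Path, cutoff: datetime) -> list[Path]:
    """Files under root whose modification time is on or after cutoff."""
    threshold = cutoff.timestamp()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= threshold
    )


def copy_tree_subset(files: list[Path], src_root: Path, dst_root: Path) -> list[Path]:
    """Copy files to the same relative path under dst_root; return the ones skipped."""
    skipped = []
    for src in files:
        dst = dst_root / src.relative_to(src_root)
        if dst.exists():
            skipped.append(src)  # never overwrite something already in OneDrive
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
    return skipped
```

    For example, copy_tree_subset(modified_since(nas, datetime(2018, 5, 1)), nas, onedrive_folder), with the date being whenever the first upload happened.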
  4. mxuribe
    Hi @valar, I'm going to assume that you are NOT in any hurry to complete this... With that assumption out of the way, can I suggest the following?

    • Before beginning, make sure you back up everything... and by backups, I mean follow the sage old advice of 3 backups, ideally on 2 different media (or maybe 2 different machines, etc.), with 1 backup offsite.
    • Now, choose which space, either the NAS or OneDrive, should be considered the main/master source of record. I would choose the NAS: because it's local, researching or viewing files, as well as applying any changes (including running any scripts), will be much faster than reviewing and changing the cloud files on OneDrive.
    • Then, start thinking, only at a high level, about how you wish to organize the new hierarchy of ALL of your files... The intent is to establish a rough hierarchy; over time, you will arrive at a direction for how to organize things, and then, of course, keep things in sync between the NAS and OneDrive.
    • For example, take all image and video files, from whatever directories they currently live in, and move them under "media", and have a separate folder for files that are NOT media files. This gives you 2 massive folders.
    • From there, start slowly creating sub-hierarchies that make sense for you.
    • Then, stop for the day... Sure, you can run another backup (to ensure that nothing is lost), but stop fiddling with hierarchies for the day. Move on to something else, and plan to come back to this work another day.
    • My approach is slow, but if you caught my nuance, basically this gives you a bit of a zen habit, sort of like slowly tending to a garden; but you're curating your local digital garden.
    • At some point, you will decide on the exact hierarchy you wish to use to organize all your files, and then treat this evolved hierarchy as the main/master source of truth... Over time (at least I hope, as it did for me) this will give you calm and give your brain something soothing to do on a daily, weekly, or whatever cadence suits you... but it also gives you an evolved organization for how your files are stored.
    • When you feel that you have made some progress - this is entirely based on how you feel you have made progress - then, look to the other place (for example if you started on NAS side, then now look to the onedrive files, etc.)...and then start aligning that second place to be the same hierarchy as the original/first place.
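    The first big split (media vs. everything else) could be scripted roughly like this. It's a sketch: the extension list is an assumption to adjust, the function name is hypothetical, it copies rather than moves (in keeping with the backup-first advice), and it keeps each file's original sub-path under media/ or other/ so that nothing collides. dst_root should live outside src_root.

```python
import shutil
from pathlib import Path

# Assumed list of media extensions; extend it to match what's actually on the NAS.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".heic", ".cr2", ".nef",
                    ".mp4", ".mov", ".avi", ".mkv"}


def split_media(src_root: Path, dst_root: Path, dry_run: bool = True) -> dict[str, int]:
    """Copy src_root into dst_root/media and dst_root/other, keeping sub-paths."""
    counts = {"media": 0, "other": 0}
    for path in sorted(src_root.rglob("*")):
        if not path.is_file():
            continue
        bucket = "media" if path.suffix.lower() in MEDIA_EXTENSIONS else "other"
        target = dst_root / bucket / path.relative_to(src_root)
        counts[bucket] += 1
        if dry_run:
            print(f"WOULD COPY {path} -> {target}")
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # keeps modification times
    return counts
```

    The returned counts give a quick sanity check before running it for real with dry_run=False.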

    Beyond the zen aspect, you'll stumble upon duplicates and can decide how you wish to handle them: are they true duplicates, or different versions of the same file, etc.? Without any rush, things become easier to deal with. After you have a more established direction for your hierarchy, then, if you wish, you can start thinking about whether it's more efficient to script things... but if there's no rush, you can also just take your time. Again, use this as an exercise both for tidying things up and as a zen thing. Finally, assuming you keep backups along the way, nothing is theoretically lost. I hope this helps!

    EDIT: Made minor spelling corrections.
