File structure difference between NAS and cloud storage
I have a NAS with a ton of photos and documents that have remained untouched for around 6 years. I uploaded all that stuff to OneDrive. Tidied it up and kept using OneDrive mostly. But I also sent stuff to the NAS. They have diverged.
I'm thinking about ways of restructuring/sorting my NAS to match my OneDrive so that I can then sync the two. I thought about making a python script that would just match on file names and move them to the correct location.
Figured before I did I'd ask if anyone else had any other suggestions
Once you have everything in the right tree structure, you could use rclone or rsync to keep everything synchronized.
That's my goal. The question is how to get to the right structure. There are too many files and directories to do it manually.
What kind of info do you have in the files? Python can read EXIF data and timestamps at least. I'd check a couple of files to see what I had to work with.
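To get a feel for what is there before deciding on a strategy, something like the following might help. It is only a rough sketch using the standard library: it counts files per extension and modification year. The function name `summarize` is made up for illustration, and it looks at filesystem timestamps only; reading EXIF data would need a third-party library, which is not shown here.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def summarize(root):
    """Count files per (lowercased extension, modification year) under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # st_mtime is the last-modified time; it may not be the date a photo was taken.
            year = datetime.fromtimestamp(path.stat().st_mtime).year
            counts[(path.suffix.lower(), year)] += 1
    return counts
```

Calling `summarize("/nas-root")` and printing the result sorted by count gives a quick overview of which file types and years dominate.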
Do you have the "right" structure in the cloud version?
File name is probably enough as long as you haven't renamed any files. If you have, you can calculate hashes for the files and compare those.
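A minimal sketch of the file-name approach, assuming a local copy of the tidy OneDrive tree is available to compare against (that local copy, and all function names here, are assumptions for illustration). It plans the moves first and only touches files in a separate step, so the plan can be reviewed. Names that occur more than once on either side are reported rather than guessed at.

```python
from collections import defaultdict
from pathlib import Path


def index_by_name(root):
    """Map each file name to the list of relative paths that carry it."""
    root = Path(root)
    index = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            index[path.name].append(path.relative_to(root))
    return index


def plan_moves(nas_root, reference_root):
    """Return (moves, ambiguous, unmatched) without touching any file.

    moves     -- (current relative path, target relative path) pairs
    ambiguous -- names that occur more than once on either side
    unmatched -- NAS files whose name is absent from the reference tree
    """
    nas = index_by_name(nas_root)
    ref = index_by_name(reference_root)
    moves, ambiguous, unmatched = [], [], []
    for name, nas_paths in sorted(nas.items()):
        ref_paths = ref.get(name, [])
        if not ref_paths:
            unmatched.extend(nas_paths)
        elif len(nas_paths) > 1 or len(ref_paths) > 1:
            ambiguous.append(name)
        elif nas_paths[0] != ref_paths[0]:
            moves.append((nas_paths[0], ref_paths[0]))
    return moves, ambiguous, unmatched


def apply_moves(nas_root, moves):
    """Carry out planned moves, refusing to overwrite existing files."""
    nas_root = Path(nas_root)
    for src, dst in moves:
        target = nas_root / dst
        if target.exists():
            print(f"skip (target exists): {dst}")
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        (nas_root / src).rename(target)
```

Camera-generated names such as IMG_0001.JPG tend to repeat across folders, so the `ambiguous` list is where comparing hashes would come in.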
I'd probably do a diff of the tree structures as well, to get an overview of what I had to work with.
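One way to get that overview, sketched under the assumption that both trees are reachable as local paths (the function names are made up): collect the relative path of every file on each side and compare the two sets.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under root, relative to root, with '/' separators."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def diff_trees(left_root, right_root):
    """Compare two trees by relative path only (file contents are not checked)."""
    left = relative_files(left_root)
    right = relative_files(right_root)
    return {
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
        "common": sorted(left & right),
    }
```

If `only_left` and `only_right` are both large, the trees have diverged structurally and matching by name or hash is needed before any sync tool will be useful.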
Most but not all files are photos.
I think filename is the way to go to be honest
Are corresponding files bit-identical?
If so, you can likely efficiently get a directory of files/paths/checksums from OneDrive. You could then compute the corresponding checksums for each file on your NAS (a Unix one-liner:
find /nas-root -type f -print0 | xargs -0 sha256sum
, for example; run this locally on the NAS if you possibly can, because checksumming over wifi, or even a decently fast wired network connection, will take forever) and rename on the basis of matching checksums. You'll also get an easy list of files present on only one or the other, and duplicates if that's of interest to you. How about this?
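The matching step could look roughly like this. It is a sketch under assumptions: both listings are in the `sha256sum` output format (hash, separator, path) and use the same hash algorithm, which means the OneDrive listing would have to be produced or converted accordingly. The function names and the example roots are hypothetical, and file names that `sha256sum` escapes (those containing newlines or backslashes) are not handled.

```python
from collections import defaultdict


def strip_root(path, root):
    """Turn an absolute manifest path into one relative to root."""
    prefix = root.rstrip("/") + "/"
    return path[len(prefix):] if path.startswith(prefix) else path


def parse_manifest(lines, root):
    """Parse sha256sum-style lines into {digest: [relative paths]}."""
    by_hash = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        digest, _, rest = line.partition(" ")
        # rest starts with the mode flag (" " or "*"); the path follows it.
        by_hash[digest].append(strip_root(rest[1:], root))
    return dict(by_hash)


def match_by_checksum(nas, ref):
    """Return (moves, duplicates, only_nas, only_ref) from two digest maps."""
    moves, duplicates = [], {}
    for digest, nas_paths in nas.items():
        ref_paths = ref.get(digest, [])
        if len(nas_paths) > 1:
            # Several NAS files share this content; left for manual review.
            duplicates[digest] = nas_paths
        elif len(ref_paths) == 1 and nas_paths[0] != ref_paths[0]:
            moves.append((nas_paths[0], ref_paths[0]))
    only_nas = sorted(p for d, ps in nas.items() if d not in ref for p in ps)
    only_ref = sorted(p for d, ps in ref.items() if d not in nas for p in ps)
    return sorted(moves), duplicates, only_nas, only_ref
```

Feeding it the saved output of the `find ... | sha256sum` command for each side gives the list of moves plus the "only on one side" and duplicate lists mentioned above, without moving anything yet.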
Hi @valar, I'm going to assume that you are NOT in any hurry to complete this. With that assumption out of the way, can I suggest the following?
Beyond the zen aspect, you'll stumble upon duplicates and can decide how you wish to handle them: are they true duplicates, or different versions of the same file? Without any rush, things become easier to deal with. Once you have a more established direction for your hierarchy, you can, if you wish, start thinking about whether it's more efficient to script things. But if there's no rush, you can also just take your time. Again, use this as an exercise both for tidying things up and as a zen thing. Finally, assuming you keep backups along the way, nothing is theoretically lost. I hope this helps!
EDIT: Made minor spelling corrections.