Having been so meticulous about taking backups, I’ve perhaps not been as careful about where I stored them, so I now have loads of duplicate files in various places. I’ve tried various tools (fdupes, czkawka, etc.), but none seems to do what I want… I need a tool that I can tell which folder (and its subfolders) is the source of truth, and that will look for duplicates of those files anywhere else and give me the option to move or delete them. Seems simple enough, but I have found nothing that allows me to do that… Does anyone know of anything?

  • speculatrix@alien.topB
    1 year ago

    Write a simple script which iterates over the files and generates a hash list, with the hash in the first column.

    find . -type f -exec md5sum {} ; >> /tmp/foo

    Repeat for the backup files.

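    Then make a third file by concatenating the two, sort that file, and run “uniq -d” on the hash column only (with GNU uniq, “uniq -w 32 -D” compares just the 32-character MD5 hash and prints every matching line). The output will tell you the duplicated files.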

    You can take the output of uniq and de-duplicate.
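
For anyone who wants the steps above in one place, here is a rough sketch rather than a finished tool. It hashes a "source of truth" folder, then lists files in another folder whose content matches something in it. The function name find_dupes and its two arguments are made up for illustration. It assumes GNU md5sum, that the two folders do not overlap, and that file names contain no newlines or backslashes (md5sum escapes those). It only prints paths and moves or deletes nothing.

```shell
# find_dupes TRUTH_DIR OTHER_DIR  (hypothetical helper, not an existing tool)
# Prints every file under OTHER_DIR whose MD5 hash also occurs under TRUTH_DIR.
# The body runs in a subshell, so its variables do not leak into the caller.
find_dupes() (
    truth=$1
    other=$2
    truth_hashes=$(mktemp)

    # Step 1: hash the source of truth and keep one copy of each hash.
    find "$truth" -type f -exec md5sum {} + | awk '{ print $1 }' | sort -u > "$truth_hashes"

    # Step 2: hash the other tree and print the paths whose hash is already known.
    # md5sum lines are "<32 hex chars><2 separator chars><path>", so the path
    # starts at character 35.
    find "$other" -type f -exec md5sum {} + |
        awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
             ($1 in seen)        { print substr($0, 35) }' "$truth_hashes" -

    rm -f "$truth_hashes"
)
```

Something like find_dupes multimedia/photos /mnt/old-backup would then print the candidate duplicates (the second path is only an example). Review that list before feeding it to mv or rm, since matching MD5 hashes are a strong hint of identical content, not a byte-for-byte comparison.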

    • parkercp@alien.topOPB
      1 year ago

      Thanks @speculatrix - I wish I had your confidence in scripting, which is why I’m hoping to find something that does all that clever stuff for me… The key thing for me is to be able to say something like: multimedia/photos/ is the source of truth, and anything found elsewhere is a duplicate…

    • jerwong@alien.topB
      1 year ago

      I think you need a \ in front of the ;

      i.e.: find . -type f -exec md5sum {} \; >> /tmp/foo
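
The reason for the backslash: an unescaped ; is treated by the shell as a command separator, so find never receives it and complains that -exec is missing its argument. As a small sketch of an alternative, -exec can also be ended with +, which needs no escaping and hands many files to each md5sum call. The function name hash_tree is made up for illustration, and md5sum is assumed to be the GNU version.

```shell
# hash_tree DIR  (hypothetical helper)
# Prints a sorted "hash  path" line for every regular file under DIR.
hash_tree() {
    # '+' terminates -exec without escaping and batches the file arguments.
    find "$1" -type f -exec md5sum {} + | sort
}
```

hash_tree . >> /tmp/foo should append the same lines as the corrected command above, just sorted.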