De-Duplication Software Recommendation Please

Over years of file hoarding, saving multiple versions when doing backups, and poor discipline with folder management, I have come to a point where I have 300K photos (many of which I think are multiple copies) and over 12K MP3s (again, most likely multiple copies), all hidden deep in folders within folders.

I recently (1-2 years back) uploaded all the MP3s to Google Music (free service) and the photos to Amazon Photos (paid service). Yesterday I attempted to download the entire contents from both sites using Telstra's free mobile broadband day, but it barely made a dent (this was when I realised I had that many duplicates and files).

I tried to solve the photo duplicates issue by putting all the files in one massive folder, but Windows didn't really help: the three options, (1) overwrite, (2) do not overwrite, and (3) create different filenames, each apply to anywhere from a few to hundreds of files at a time. Also, because of slight timing differences when the copies were made, at times the files are indeed the same photo but differ by a few KB or in creation time/date. In hindsight, I might have created a bigger monster: I was worried that overwriting might delete files, and reviewing thousands and thousands of files just isn't possible, so I took the easy way out of creating different filenames =(

Is there photo de-dup software that does a good job, both in the review-and-delete process and in recognising files by more than just filename and size, i.e. by analysing the actual photo (something like Picasa face recognition) and making better guesses at identical photos even when the file creation date is newer (because of duplication efforts)?

Also, is there MP3 de-dup software that works like Shazam, recognising the tunes and auto-downloading the song name and singer, and that also identifies duplicates and removes them without my having to review each file one at a time?
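
For the copies that really are byte-for-byte identical and differ only in filename or creation date, comparing file contents is enough and no image or audio analysis is needed. A minimal sketch of that idea, assuming Python's standard library only; the function names `file_digest` and `find_exact_duplicates` are made up for illustration and are not taken from any of the tools mentioned in this thread. It will not match copies that differ by even a few KB.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group files under root whose bytes are identical.

    Filenames and timestamps are ignored; only size and content matter.
    """
    # First pass: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_exact_duplicates` on the top-level photo or MP3 folder returns lists of paths with identical content. The sketch only reports groups and deletes nothing, so each group can still be reviewed before anything is removed.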


TL;DR
Looking for good software that can remove duplicates =)

Comments

  • -1

    Just buy more hard drives, it's cheaper and easier.

    • That's exactly what I am doing … bought this (https://www.ozbargain.com.au/node/238931) and realised I was actually nearing the deep end of 4TB ..

      Going to set up a NAS with RAID 5 soon. I assume 3TB x 5 will yield 15TB; halve that for the mirror, which leaves me approximately 7.5TB (I based this on an old cs406e Synology NAS). If the principle or tech of mirror RAID hasn't changed, I think my investment in the 5 WD Reds will hit its limit before I know it.

      • RAID 5 is not mirroring but distributed parity. You lose the capacity of one member, so available storage = (5 - 1) x 3TB = 12TB.

    • I'm impressed just by reading the Similarity landing page! Acoustic fingerprinting sounds like space age for noobs like me =)

      Have you used any of them before?

      • I used TuneUp ages and ages and ages ago, fairly sure it was (maybe still is) an iTunes plugin

        • I like Similarity for the Asian language support, thanks =)

  • Windows Server 2012 R2 has built-in dedup. It's not bad, but you're not going to save a lot with most dedup solutions, as images are normally compressed unless you're storing them in raw format.

    Also, as for RAID, if you're looking at software-only options, check out Storage Spaces, also part of Windows Server 2012 R2.

  • Been avoiding this same problem for too long. It's too easy to avoid the problem, and just keep making copies. "At least make copies" I kept saying to myself.

    I've got 3 copies of everything, plus a couple of versions Google Drive stuffed-up syncing… but it's all in varying stages of each other. So many times, the file name/size/appearance are similar but different… and I don't know what's what. Some have been emailed, tweaked, corrected etc. and there is simply WAYYY too much to go through, considering how many get taken each day (got a young kid). The problem isn't getting worse, but the past is a MESS.

    Looking forward to all ideas, but I think this really isn't the site for a specialist question, like this.

  • +1

    There are at least two Windows apps that do what you want: they analyse images, pick the best version automatically and delete everything else. Unfortunately I cannot recall the names since I stopped using internet pornography.

    There's probably something like that for audio, I'm assuming.

  • For the photos you might want to try VisiPics. It finds duplicates of photos based on the picture itself, lists them all out and suggests which one to keep based on which is the highest quality.

    It has variable settings for how close the photos need to be.

    I ran it over my collection and, for example, on the "Basic" setting it said that two photos of my partner, one with their eyes closed and the other with them open, were the same photo.

    As for music, I have mine fairly well catalogued and have only had a couple of dupes slip in when they weren't catalogued consistently.

    • Will try VisiPics, thanks.

      I tried Similarity for MP3s; out of 12K MP3s, it found 3.5K dups. Have yet to go through them but reckon it will be close.

  • I used to use Odin Professional, which was brilliant up until I hit a wall at 3.5 million images. The developer seems to have vanished now, with no updates for years.

    I now use a product called DoublePics, which is almost as good in finding dupes of images, although not as nice to use.

    The free version offers everything the basic paid version does, except in-program dupe deletion. This can be a bit of a hassle if you're just starting and have hundreds of dupes, as you have to find each one (it will open the correct folder in Explorer for you) and delete it.

    • 3.5m!!! Photographer?

      • LOL … no.

        That was a while ago now. I'm probably approaching 4.5 million now.

        I collect all sorts of stuff from many sources. I like to have something available if someone should ask. Then there's the obvious …

        • Any good place to get free stock photos? =p

        • @LurvinOZB:
          Most search sites have an image search function. Google allows you to search with various copyright restrictions.

          Usenet is also a great source for images.

          It takes a long time to build such a large collection though. I started back in 1994, and still have my original image CD which cost me $200 to burn in 1998. Back then, 1GB drives were $1000, so it was a bit of a bargain.
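
On the "how close" setting mentioned for VisiPics in the comments above: one common way to compare pictures rather than file bytes is a perceptual hash plus a distance threshold. The sketch below is only an illustration of that general idea (an "average hash"), not how VisiPics, DoublePics or any other tool in this thread actually works. The names `average_hash`, `hamming_distance` and `looks_like_duplicate` are hypothetical, and the input is assumed to be a tiny grayscale thumbnail already decoded into rows of 0-255 values.

```python
def average_hash(pixels):
    """Return a list of bits: 1 where a pixel is brighter than the image mean.

    `pixels` is a small grayscale thumbnail given as rows of 0-255 values
    (an assumption for this sketch; a real tool would first decode and
    shrink the photo).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def looks_like_duplicate(pixels_a, pixels_b, max_distance):
    """Treat two thumbnails as the same photo if few hash bits differ.

    A larger max_distance is a looser setting: it catches more re-saved
    copies but also produces more false matches.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Toy 4x4 thumbnails: a photo, and the same scene with one small region changed.
original = [[10, 10, 200, 200] for _ in range(4)]
changed = [row[:] for row in original]
changed[1][1] = 220

print(looks_like_duplicate(original, changed, max_distance=0))  # strict setting
print(looks_like_duplicate(original, changed, max_distance=2))  # loose setting
```

With the strict setting the two toy thumbnails are kept apart, while the loose setting merges them. That is the same kind of trade-off as the eyes-open/eyes-closed false match described above, which is why a review step before deletion is worth keeping.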
