Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

dedupe-files

Package Overview
Dependencies
Maintainers
1
Versions
13
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

dedupe-files

Finds all duplicate files across the set of paths and then will **print** them out, **move** them to a directory, or **delete** them. Duplicates are identified by their actual content not their name or other attributes.

  • 1.4.1
  • latest
  • Source
  • npm
  • Socket score

Version published
Weekly downloads
1
Maintainers
1
Weekly downloads
 
Created
Source

dedupe-files

Finds all duplicate files across the set of paths and then will print them out, move them to a directory, or delete them. Duplicates are identified by their actual content not their name or other attributes.

NPM version Node Version Build Status License

Install / Requirements

Requires Node.js >=16 Just run it with npx (as shown below) or install it globally with npm install -g dedupe-files

Usage

Usage: dedupe-files <command> [options]

Finds all duplicate files across the set of paths and then will **print** them out, **move** them to a directory, or **delete** them. Duplicates are identified by their actual content not their name or other attributes.

Options:
  -h, --help                        display help for command

Commands:
  print [options] <input_paths...>  print out duplicates
  move [options] <input_paths...>   move duplicates to a directory
  delete <input_paths...>           delete duplicate files
  help [command]                    display help for command

Examples:

The following prints out a line to duplicates.txt for each duplicate file found in /Volumes/photos and /Volumes/backups/photos:

  $ dedupe-files print --out "duplicates.txt" "/Volumes/photos" "/Volumes/backups/photos"

The following moves each duplicate file found in /Volumes/photos and /Volumes/backups/photos to ~/Downloads/duplicates.
The files in ~/Downloads/one are considered more "original" than those in ~/Downloads/two since it appears earlier on the command line:

  $ dedupe-files move --out "~/Downloads/duplicates" "~/Downloads/one" "~/Downloads/two"

print command

Usage: dedupe-files print [options] <input_paths...>

Prints duplicate files to terminal or to a file.

Options:
  -n, --names       include files with duplicate names, but different content
  -o, --out <file>  A file path to output the duplicate file paths to. If not specified, file paths are written to stdout.
  -h, --help        display help for command

move command

Usage: dedupe-files move [options] <input_paths...>

Moves duplicate files to a designated directory.

Options:
  -o, --out <path>  Directory to output the duplicate files to.
  -h, --help        display help for command

Remarks:

Files in *input_paths* that appear earlier on the command line are considered more "original".
That is, the duplicates that are moved are the ones that are rooted in the last-most *input_paths* argument.

delete command

Usage: dedupe-files delete [options] <input_paths...>

Deletes duplicate files.

Options:
  -h, --help  display help for command

Remarks:

Files in *input_paths* that appear earlier on the command line are considered more "original".
That is, the duplicates that are deleted are the ones that are rooted in the last-most *input_paths* argument.
If duplicates are in the same directory tree, then which one is deleted is not deterministic (but it will leave one behind).

Examples

print

$ npx dedupe-files print \
  --out "duplicates.txt" \
  '/Volumes/scott-photos/photos/2014' \
  '/Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014'

Searching /Volumes/scott-photos/photos/2014 (priority 0)
Searching /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014 (priority 1)
Searching /Volumes/scott-photos/photos/2014/08 (priority 0)
Searching /Volumes/scott-photos/photos/2014/02 (priority 0)
Searching /Volumes/scott-photos/photos/2014/12 (priority 0)
Searching /Volumes/scott-photos/photos/2014/11 (priority 0)
Searching /Volumes/scott-photos/photos/2014/10 (priority 0)
Searching /Volumes/scott-photos/photos/2014/09 (priority 0)
Searching /Volumes/scott-photos/photos/2014/07 (priority 0)
Searching /Volumes/scott-photos/photos/2014/06 (priority 0)
Searching /Volumes/scott-photos/photos/2014/05 (priority 0)
Searching /Volumes/scott-photos/photos/2014/04 (priority 0)
Searching /Volumes/scott-photos/photos/2014/03 (priority 0)
Searching /Volumes/scott-photos/photos/2014/01 (priority 0)
Found 2,829 files...
Writing to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-small.sh.out.
Comparing files with identical sizes...
Hashing 1,749 files in batches of 64: 4%...7%...11%...15%...18%...22%...26%...29%...33%...37%...40%...44%...48%...51%...55%...59%...62%...66%...70%...73%...77%...81%...84%...88%...91%...95%...
Hashing files complete.
Duplicate files consume 5.4 GB.
872 duplicate files written to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-small.sh.out.
print took 16.659 seconds.

# duplicates.txt:

content identical: /Volumes/scott-photos/photos/2014/10/IMG_0533.JPG and /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014/IMG_0070.jpg
content identical: /Volumes/scott-photos/photos/2014/03/IMG_0881.JPG and /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014/IMG_0881.JPG
...
content identical: /Volumes/scott-photos/photos/2014/08/IMG_1285.MOV and /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014/IMG_1285.MOV

move

$ npx dedupe-files@beta move \
  --out "~/Downloads/duplicates" \
  "~/Downloads/one" \
  "~/Downloads/two"

Searching /Users/scott/Downloads/dedupe-files-temp/one (priority 0)
Searching /Users/scott/Downloads/dedupe-files-temp/two (priority 1)
Searching /Users/scott/Downloads/dedupe-files-temp/two/toodeep (priority 1)
Found 8 files...
Hashing 6 files in batches of 64...
Hashing files complete.
Moving /Users/scott/Downloads/dedupe-files-temp/two/not-the-eye-test-pic.jpg to /Users/scott/Downloads/dedupe-files-temp/duplicates/not-the-eye-test-pic.jpg
Moving /Users/scott/Downloads/dedupe-files-temp/two/tv-test-pattern.png to /Users/scott/Downloads/dedupe-files-temp/duplicates/tv-test-pattern.png

Large Example with 8,978 duplicates across 61,347 files

npx dedupe-files print \
  --out "dupe-photos.out" \
  /Volumes/scott-photos/photos \
  /Volumes/home/\!Backups/PHOTOS/

Searching /Volumes/scott-photos/photos (priority 0)
...
Searching /Volumes/home/!Backups/PHOTOS/Scotts Photos Library2020-12-21.photoslibrary/resources/cloudsharing/resources/derivatives/masters/5 (priority 1)
Found 61,347 files...
Writing to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-all.sh.out.
Comparing files with identical sizes...
Hashing 29,244 files in batches of 64: 0%...0%...1%...1%...1%...1%...2%...2%...2%...2%...2%...3%...3%...3%...3%...4%...4%...4%...4%...4%...5%...5%...5%...5%...5%...6%...6%...6%...6%...7%...7%...7%...7%...7%...8%...8%...8%...8%...9%...9%...9%...9%...9%...10%...10%...10%...10%...11%...11%...11%...11%...11%...12%...12%...12%...12%...12%...13%...13%...13%...13%...14%...14%...14%...14%...14%...15%...15%...15%...15%...16%...16%...16%...16%...16%...17%...17%...17%...17%...18%...18%...18%...18%...18%...19%...19%...19%...19%...19%...20%...20%...20%...20%...21%...21%...21%...21%...21%...22%...22%...22%...22%...23%...23%...23%...23%...23%...24%...24%...24%...24%...25%...25%...25%...25%...25%...26%...26%...26%...26%...26%...27%...27%...27%...27%...28%...28%...28%...28%...28%...29%...29%...29%...29%...30%...30%...30%...30%...30%...31%...31%...31%...31%...32%...32%...32%...32%...32%...33%...33%...33%...33%...33%...34%...34%...34%...34%...35%...35%...35%...35%...35%...36%...36%...36%...36%...37%...37%...37%...37%...37%...38%...38%...38%...38%...39%...39%...39%...39%...39%...40%...40%...40%...40%...40%...41%...41%...41%...41%...42%...42%...42%...42%...42%...43%...43%...43%...43%...44%...44%...44%...44%...44%...45%...45%...45%...45%...46%...46%...46%...46%...46%...47%...47%...47%...47%...47%...48%...48%...48%...48%...49%...49%...49%...49%...49%...50%...50%...50%...50%...51%...51%...51%...51%...51%...52%...52%...52%...52%...53%...53%...53%...53%...53%...54%...54%...54%...54%...54%...55%...55%...55%...55%...56%...56%...56%...56%...56%...57%...57%...57%...57%...58%...58%...58%...58%...58%...59%...59%...59%...59%...60%...60%...60%...60%...60%...61%...61%...61%...61%...61%...62%...62%...62%...62%...63%...63%...63%...63%...63%...64%...64%...64%...64%...65%...65%...65%...65%...65%...66%...66%...66%...66%...67%...67%...67%...67%...67%...68%...68%...68%...68%...68%...69%...69%...69%...69%...70%...70%...70%...70%...70%...71%...71%...71%...71%...72%...72%...72%...72%...72%...73%...73%...73%...73%...74%...74%...74%...74%...74%...75%...75%...75%...75%...76%...76%...76%...76%...76%...77%...77%...77%...77%...77%...78%...78%...78%...78%...79%...79%...79%...79%...79%...80%...80%...80%...80%...81%...81%...81%...81%...81%...82%...82%...82%...82%...83%...83%...83%...83%...83%...84%...84%...84%...84%...84%...85%...85%...85%...85%...86%...86%...86%...86%...86%...87%...87%...87%...87%...88%...88%...88%...88%...88%...89%...89%...89%...89%...90%...90%...90%...90%...90%...91%...91%...91%...91%...91%...92%...92%...92%...92%...93%...93%...93%...93%...93%...94%...94%...94%...94%...95%...95%...95%...95%...95%...96%...96%...96%...96%...97%...
Hashing files complete.
Duplicate files consume 117.7 GB.
8,978 duplicates written to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-all.sh.out.
print took 2264 seconds.

Features

  • Actions to be taken on duplicates:
    • print
    • move
    • delete
  • Priorities: Files have priorities. Essentially the ones that are first on the specified set of input paths will be considered the "original".
  • Algorithm: Compares sizes first, and then hashes files. Could maybe be faster by doing partial hashes of same-size files

Roadmap

  • feat: searches multiple input paths

  • feat: compares files by content, not merely name or size

    • name same, content different
    • name same, content different, size same
    • name different, content same
  • chore: typescript. because types help (proven to myself yet again)

  • chore: fix git paths (git dir needs moved up to parent monorepo dir)

  • feat: optimize perf by tracking all files w/ size instead of hash, and only hash files where size is the same

  • feat: use deterministic algorithm to classify the "original" and the "duplicate" (i.e. first argument passed in is a higher priority for "original")

    • test: test-dir/one test-dir/two
    • test: test-dir/two test-dir/one
  • actions to be taken for each duplicate:

  • feat: print/dry-run action (no action) for found duplicate: prints to output file or stdout

  • feat: move file action for found duplicate to specified path

    • feat: ensure move command never loses a file by using "original" file's name with a postfixed a counter for uniqueness
  • feat: delete action for found duplicate

Alternatives

  • rdfind does this quite well but it isn't very flexible on what to do with the duplicates and it's slow (despite being in c++, which is not as easy to maintain or contribute to by others.

Below are more but I found these after I was well down the road of writing this one...

Keywords

FAQs

Package last updated on 11 Jun 2022

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc