file-dedupe
Fast duplicate file detection library
npm install --save file-dedupe
The algorithm is as follows:
findup is quite fast - it is within 2x of the fastest duplicate finders written in C/C++. Based on the V8 profiler output, about 40% of the time is spent on I/O, 13% on crypto and 11% on file traversal, so any further gains in performance will need to come from I/O optimizations rather than code optimizations.
rdfind: 2.22user 1.96system 0:04.24elapsed 98%CPU (0avgtext+0avgdata 57984maxresident)k
duff: 2.66user 1.66system 0:04.34elapsed 99%CPU (0avgtext+0avgdata 80432maxresident)k
fslint: 9.49user 5.78system 0:11.01elapsed 138%CPU (0avgtext+0avgdata 29632maxresident)k
findup: 5.36user 3.29system 0:08.20elapsed 105%CPU (0avgtext+0avgdata 717056maxresident)k
BTW, you may notice that file-dedupe defaults to sync I/O. This is because async I/O seems to have significant overhead for typical FS tasks. You can test this on your own system by passing the --async flag.
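To get a feel for the two I/O styles, here is a minimal sketch that hashes one file both ways using only Node's built-in modules. The function names hashSync and hashAsync, the 64 KB buffer size and the choice of SHA-256 are assumptions made for illustration; they are not taken from file-dedupe's source.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hash a file with blocking reads (sync I/O).
// The buffer size and digest algorithm are illustrative assumptions.
function hashSync(file) {
  const hash = crypto.createHash('sha256');
  const fd = fs.openSync(file, 'r');
  const buf = Buffer.alloc(64 * 1024);
  try {
    let n;
    // readSync returns 0 at end of file
    while ((n = fs.readSync(fd, buf, 0, buf.length, null)) > 0) {
      hash.update(buf.subarray(0, n));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest('hex');
}

// Hash a file through a read stream (async I/O).
// Calls onDone(err) on failure or onDone(null, hexDigest) on success.
function hashAsync(file, onDone) {
  const hash = crypto.createHash('sha256');
  fs.createReadStream(file)
    .on('data', (chunk) => hash.update(chunk))
    .on('error', onDone)
    .on('end', () => onDone(null, hash.digest('hex')));
}
```

Both functions produce the same digest for the same file; timing each over a large directory tree is one way to check whether the sync default is also the faster choice on your machine.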
new Dedupe({ async: false }): creates a new Dedupe instance, which holds all the cached metadata. Options:
async: whether to use async or sync I/O for hashing files. Defaults to sync, which is usually faster.
dedupe.find(file, [stat], onDone): calls onDone with (err, result), where result is either false or the full path to a previously seen file with the same content. You can optionally pass in a fs.Stats object to avoid having to do another fs.stat call in dedupe.
For a usage example, see bin/findup.
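The callback contract of find can be illustrated with a small self-contained stand-in. SketchDedupe below is hypothetical and is not file-dedupe's implementation: it assumes that hashing whole files with SHA-256 is good enough, and it only mirrors the documented find(file, [stat], onDone) shape so that the false-or-path result is easy to see.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical stand-in, NOT file-dedupe's implementation: it only mirrors
// the documented find(file, [stat], onDone) contract.
class SketchDedupe {
  constructor() {
    // "size:digest" -> full path of the first file seen with that content
    this.seen = new Map();
  }

  find(file, stat, onDone) {
    // stat is optional, so find(file, onDone) is accepted too
    if (typeof stat === 'function') {
      onDone = stat;
      stat = undefined;
    }
    let key;
    try {
      const size = (stat || fs.statSync(file)).size;
      const digest = crypto
        .createHash('sha256')
        .update(fs.readFileSync(file))
        .digest('hex');
      key = size + ':' + digest;
    } catch (err) {
      return onDone(err);
    }
    const earlier = this.seen.get(key);
    if (earlier !== undefined) {
      // same content was seen before: report the earlier file's path
      return onDone(null, earlier);
    }
    this.seen.set(key, path.resolve(file));
    onDone(null, false);
  }
}
```

With this stand-in, the first call for a given content yields false and any later call for a file with identical content yields the path of the first one. Code written against this shape should also work with the real Dedupe class, as long as it follows the contract described above.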
file-dedupe ships with findup, a basic CLI tool for finding duplicates. To get it, install the module globally: npm install -g file-dedupe.
Usage: findup --include <path>
Options:
--include <path> Include path
--stdin Read list from stdin
--list Return full paths (plain output, suitable for xargs)
--json Return JSON output
--omit-first Omit the first file in each set of matches
--async Use async I/O instead of sync I/O (async is often slower)
--delete Delete duplicate files (all files will be deleted unless
--omit-first is set)
--help Display help
-v, --version Display version
For example, to find all duplicates in current directory and below:
findup --include . > report.txt
Note that progress is reported on stderr and the results are written to stdout, so you can redirect or pipe stdout and the status information will not end up in the output.
Advanced selection:
If you want to select files by name, by size or by user, you can use the Unix find command to pick the files. For example:
find . -name "*.csv" -print | findup --stdin > report.txt
To only look at files with size > 100k:
find . -size +100k -print | findup --stdin > report.txt