jscpd
Copy/paste detector for programming source code, supports 150+ formats.
Copy/paste is a common technical debt on a lot of projects. The jscpd gives the ability to find duplicated blocks implemented on more than 150 programming languages and digital formats of documents.
The jscpd tool implements Rabin-Karp algorithm for searching duplications.
Table of content
Features
- Detect duplications in programming source code, use semantic of programing languages, can skip comments, empty lines etc.
- Detect duplications in embedded blocks of code, like
<script>
or <style>
sections in html - Blame authors of duplications
- Generate XML report in pmd-cpd format, JSON report, HTML report
- Integrate with CI systems, use thresholds for level of duplications
Getting started
Installation
$ npm install -g jscpd
Usage
$ npx jscpd /path/to/source
or
$ jscpd /path/to/code
or
$ jscpd --pattern "src/**/*.js"
Options
Pattern
Glob pattern for find files to detect
- Cli options:
--pattern
, -p
- Type: string
- Default: "**/*"
Example:
$ jscpd --pattern "**/*.js"
Min Tokens
Minimal block size of code in tokens. The block of code less than min-tokens
will be skipped.
- Cli options:
--min-tokens
, -k
- Type: number
- Default: 50
This option is called minTokens
in the config file.
Min Lines
Minimal block size of code in lines. The block of code less than min-lines
will be skipped.
- Cli options:
--min-lines
, -l
- Type: number
- Default: 5
Max Lines
Maximum file size in lines. The file bigger than max-lines
will be skipped.
- Cli options:
--max-lines
, -x
- Type: number
- Default: 1000
Max Size
Maximum file size in bytes. The file bigger than max-size
will be skipped.
- Cli options:
--max-size
, -z
- Type: string
- Default: 100kb
Threshold
The threshold for duplication level, check if current level of duplications bigger than threshold jscpd exit with error.
- Cli options:
--threshold
, -t
- Type: number
- Default: null
Config
The path to configuration file. The config should be in json
format. Supported options in config file can be the same with cli options.
- Cli options:
--config
, -c
- Type: path
- Default: null
Ignore
The option with glob patterns to ignore from analyze. For multiple globs you can use comma as separator.
Example:
$ jscpd --ignore "**/*.min.js,**/*.map" /path/to/files
- Cli options:
--ignore
, -i
- Type: string
- Default: null
Reporters
The list of reporters. Reporters use for output information of clones and duplication process.
Available reporters:
- console - report about clones to console;
- consoleFull - report about clones to console with blocks of code;
- json - output
jscpd-report.json
file with clones report in json format; - xml - output
jscpd-report.xml
file with clones report in xml format; - csv - output
jscpd-report.csv
file with clones report in csv format; - markdown - output
jscpd-report.md
file with clones report in markdown format; - html - generate html report to
html/
folder; - sarif - generate a report in SARIF format (https://github.com/oasis-tcs/sarif-spec), save it to
jscpd-sarif.json
file; - verbose - output a lot of debug information to console;
Note: A reporter can be developed manually, see @jscpd/finder package.
- Cli options:
--reporters
, -r
- Type: string
- Default: console
Output
The path to directory for reports. JSON and XML reports will be saved there.
- Cli options:
--output
, -o
- Type: path
- Default: ./report/
Mode
The mode of detection quality.
strict
- use all types of symbols as token, skip only blocks marked as ignored.mild
- skip blocks marked as ignored and new lines and empty symbols.weak
- skip blocks marked as ignored and new lines and empty symbols and comments.
Note: A mode can be developed manually, see API section.
- Cli options:
--mode
, -m
- Type: string
- Default: mild
Format
The list of formats to detect for duplications. Available over 150 formats.
Example:
$ jscpd --format "php,javascript,markup,css" /path/to/files
- Cli options:
--format
, -f
- Type: string
- Default: {all formats}
Blame
Get information about authors and dates of duplicated blocks from git.
- Cli options:
--blame
, -b
- Type: boolean
- Default: false
Silent
Don't write a lot of information to a console.
Example:
$ jscpd /path/to/source --silent
Duplications detection: Found 60 exact clones with 3414(46.81%) duplicated lines in 100 (31 formats) files.
Execution Time: 1381.759ms
- Cli options:
--silent
, -s
- Type: boolean
- Default: false
Absolute
Use the absolute path in reports.
- Cli options:
--absolute
, -a
- Type: boolean
- Default: false
Ignore Case
Ignore case of symbols in code (experimental).
- Cli options:
--ignoreCase
- Type: boolean
- Default: false
No Symlinks
Do not follow symlinks.
- Cli options:
--noSymlinks
, -n
- Type: boolean
- Default: false
Skip Local
Use for detect duplications in different folders only. For correct usage of --skipLocal
option you should provide list of path's with more than one item.
Example:
jscpd --skipLocal /path/to/folder1/ /path/to/folder2/
will detect clones in separate folders only, clones from same folder will be skipped.
- Cli options:
--skipLocal
- Type: boolean
- Default: false
Formats Extensions
Define the list of formats with file extensions. Available over 150 formats.
In following example jscpd will analyze files *.es
and *.es6
as javascript and *.dt
files as dart:
$ jscpd --formats-exts javascript:es,es6;dart:dt /path/to/code
Note: formats defined in the option redefine default configuration, you should define all need formats manually or create two configuration for run jscpd
- Cli options:
--formats-exts
- Type: string
- Default: null
Store
Stores used for collect information about code, by default all information collect in memory.
Available stores:
- leveldb - leveldb store all data to files. The store recommended as store for big repositories. Should install @jscpd/leveldb-store before;
Note: A store can be developed manually, see @jscpd/finder package and @jscpd/leveldb-store as example.
- Cli options:
--store
- Type: string
- Default: null
Ignore Pattern
Ignore code blocks matching the regexp patterns.
- Cli options:
--ignore-pattern
- Type: string
- Default: null
Example:
$ jscpd /path/to/source --ignore-pattern "import.*from\s*'.*'"
Excludes import statements from the calculation.
Config File
Put .jscpd.json
file in the root of the projects:
{
"threshold": 0,
"reporters": ["html", "console", "badge"],
"ignore": ["**/__snapshots__/**"],
"absolute": true
}
Also you can use section in package.json
:
{
...
"jscpd": {
"threshold": 0.1,
"reporters": ["html", "console", "badge"],
"ignore": ["**/__snapshots__/**"],
"absolute": true,
"gitignore": true
}
...
}
Exit code
By default, the tool exits with code 0 even code duplications were
detected. This behaviour can be changed by specifying a custom exit
code for error states.
Example:
jscpd --exitCode 1 .
- Cli options:
--exitCode
- Type: number
- Default: 0
Ignored Blocks
Mark blocks in code as ignored:
import lodash from 'lodash';
import React from 'react';
import {User} from './models';
import {UserService} from './services';
<meta data-react-helmet="true" name="theme-color" content="#cb3837"/>
<link data-react-helmet="true" rel="stylesheet" href="https://static.npmjs.com/103af5b8a2b3c971cba419755f3a67bc.css"/>
<link data-react-helmet="true" rel="stylesheet" href="https://static.npmjs.com/cms/flatpages.css"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="120x120" href="https://static.npmjs.com/58a19602036db1daee0d7863c94673a4.png"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="144x144" href="https://static.npmjs.com/7a7ffabbd910fc60161bc04f2cee4160.png"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="152x152" href="https://static.npmjs.com/34110fd7686e2c90a487ca98e7336e99.png"/>
<link data-react-helmet="true" rel="apple-touch-icon" sizes="180x180" href="https://static.npmjs.com/3dc95981de4241b35cd55fe126ab6b2c.png"/>
<link data-react-helmet="true" rel="icon" type="image/png" href="https://static.npmjs.com/b0f1a8318363185cc2ea6a40ac23eeb2.png" sizes="32x32"/>
Reporters
HTML
Demo report
Badge
More info jscpd-badge-reporter
PMD CPD XML
<?xml version="1.0" encoding="utf-8"?>
<pmd-cpd>
<duplication lines="10">
<file path="/path/to/file" line="1">
<codefragment><![CDATA[ ...first code fragment... ]]></codefragment>
</file>
<file path="/path/to/file" line="5">
<codefragment><![CDATA[ ...second code fragment...}]]></codefragment>
</file>
<codefragment><![CDATA[ ...duplicated fragment... ]]></codefragment>
</duplication>
</pmd-cpd>
JSON reporters
{
"duplicates": [{
"format": "javascript",
"lines": 27,
"fragment": "...code fragment... ",
"tokens": 0,
"firstFile": {
"name": "tests/fixtures/javascript/file2.js",
"start": 1,
"end": 27,
"startLoc": {
"line": 1,
"column": 1
},
"endLoc": {
"line": 27,
"column": 2
}
},
"secondFile": {
"name": "tests/fixtures/javascript/file1.js",
"start": 1,
"end": 24,
"startLoc": {
"line": 1,
"column": 1
},
"endLoc": {
"line": 24,
"column": 2
}
}
}],
"statistic": {
"detectionDate": "2018-11-09T15:32:02.397Z",
"formats": {
"javascript": {
"sources": {
"/path/to/file": {
"lines": 24,
"sources": 1,
"clones": 1,
"duplicatedLines": 26,
"percentage": 45.33,
"newDuplicatedLines": 0,
"newClones": 0
}
},
"total": {
"lines": 297,
"sources": 1,
"clones": 1,
"duplicatedLines": 26,
"percentage": 45.33,
"newDuplicatedLines": 0,
"newClones": 0
}
}
},
"total": {
"lines": 297,
"sources": 6,
"clones": 5,
"duplicatedLines": 26,
"percentage": 45.33,
"newDuplicatedLines": 0,
"newClones": 0
}
}
}
API
For integration copy/paste detection to your application you can use programming API:
jscpd
Promise API
import {IClone} from '@jscpd/core';
import {jscpd} from 'jscpd';
const clones: Promise<IClone[]> = jscpd(process.argv);
jscpd
async/await API
import {IClone} from '@jscpd/core';
import {jscpd} from 'jscpd';
(async () => {
const clones: IClone[] = await jscpd(['', '', __dirname + '/../fixtures', '-m', 'weak', '--silent']);
console.log(clones);
})();
detectClones
API
import {detectClones} from "jscpd";
(async () => {
const clones = await detectClones({
path: [
__dirname + '/../fixtures'
],
silent: true
});
console.log(clones);
})()
detectClones
with persist store
import {detectClones} from "jscpd";
import {IMapFrame, MemoryStore} from "@jscpd/core";
(async () => {
const store = new MemoryStore<IMapFrame>();
await detectClones({
path: [
__dirname + '/../fixtures'
],
}, store);
await detectClones({
path: [
__dirname + '/../fixtures'
],
silent: true
}, store);
})()
In case of deep customisation of detection process you can build your own tool:
If you are going to detect clones in file system you can use @jscpd/finder for make a powerful detector.
In case of detect clones in browser or not node.js environment you can build your own solution base on @jscpd/code
Changelog
Changelog
Who uses jscpd
- Code-Inspector is a code analysis and technical debt management service.
- Mega-Linter is a 100% open-source linters aggregator for CI (GitHub Action & other CI tools) or to run locally
- vscode-jscpd VSCode Copy/Paste detector plugin.
Contributors
This project exists thanks to all the people who contribute.
Backers
Thank you to all our backers! 🙏 [Become a backer]
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]
License
MIT © Andrey Kucherenko