Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

csv-diff

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

csv-diff

Python CLI tool and library for diffing CSV and JSON files

  • 1.2
  • PyPI
  • Socket score

Maintainers
1

csv-diff

PyPI Changelog Tests License

Tool for viewing the difference between two CSV, TSV or JSON files. See Generating a commit log for San Francisco’s official list of trees (and the sf-tree-history repo commit log) for background information on this project.

Installation

pip install csv-diff

Usage

Consider two CSV files:

one.csv

id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

id,name,age
1,Cleo,5
3,Bailey,1

csv-diff can show a human-readable summary of differences between the files:

$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2

The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.

The tool will automatically detect if your files are comma- or tab-separated. You can over-ride this automatic detection and force the tool to use a specific format using --format=tsv or --format=csv.

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use --format=json if your input files are JSON.

Use --show-unchanged to include full details of the unchanged values for rows with at least one change in the diff output:

% csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed

  id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"

JSON output

You can use the --json option to get a machine-readable difference:

$ csv-diff one.csv two.csv --key=id --json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}

Adding templated extras

You can specify additional keys to be displayed in the human-readable format using the --extra option:

--extra name "Python format string with {id} for variables"

For example, to output a link to https://news.ycombinator.com/latest?id={id} for each item with an ID, you could use this:

csv-diff one.csv two.csv --key=id \
  --extra latest "https://news.ycombinator.com/latest?id={id}"

These extras display something like this:

1 row changed

  id: 41459472
    points: "24" => "25"
    numComments: "5" => "6"
  extras:
    latest: https://news.ycombinator.com/latest?id=41459472

As a Python library

You can also import the Python library into your own code like so:

from csv_diff import load_csv, compare
diff = compare(
    load_csv(open("one.csv"), key="id"),
    load_csv(open("two.csv"), key="id")
)

diff will now contain the same data structure as the output in the --json example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.

As a Docker container

Build the image

$ docker build -t csvdiff .

Run the container

$ docker run --rm -v $(pwd):/files csvdiff

Suppose current directory contains two csv files : one.csv two.csv

$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv

Alternatives

  • csvdiff is a "fast diff tool for comparing CSV files" - you may get better results from this than from csv-diff against larger files.

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc