simonw/csv-diff

csv-diff


Tool for viewing the difference between two CSV, TSV or JSON files. See Generating a commit log for San Francisco’s official list of trees (and the sf-tree-history repo commit log) for background information on this project.

Installation

pip install csv-diff

Usage

Consider two CSV files:

one.csv

id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

id,name,age
1,Cleo,5
3,Bailey,1

csv-diff can show a human-readable summary of differences between the files:

$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed
1 row changed
 Row 1
 age: "4" => "5"
1 row added
 id: 3
 name: Bailey
 age: 1
1 row removed
 id: 2
 name: Pancakes
 age: 2

The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.
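To illustrate what treating a column as the unique key means, here is a minimal standard-library sketch. It is not csv-diff's own implementation; the helper names `index_by_key` and `keyed_diff` are hypothetical and exist only for this example.

```python
import csv
import io


def index_by_key(text, key):
    """Read CSV text and return a dict mapping each key value to its row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    """Classify rows as added, removed or changed by comparing their keys."""
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = {}
    for k in previous:
        if k in current and previous[k] != current[k]:
            # Record (old, new) for every column whose value differs.
            changed[k] = {
                col: (previous[k][col], current[k].get(col))
                for col in previous[k]
                if previous[k][col] != current[k].get(col)
            }
    return added, removed, changed


one = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
two = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"
added, removed, changed = keyed_diff(
    index_by_key(one, "id"), index_by_key(two, "id")
)
```

Rows whose key appears in only one file count as added or removed. Rows whose key appears in both files are compared column by column.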

The tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using --format=tsv or --format=csv.
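One way delimiter detection like this can be done is with `csv.Sniffer` from the Python standard library. This README does not describe csv-diff's actual detection logic, so treat the following as a stand-in sketch; `detect_delimiter` is a hypothetical helper.

```python
import csv


def detect_delimiter(sample):
    """Guess whether the sample text is comma- or tab-separated."""
    # Restrict the sniffer to the two delimiters we care about.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return dialect.delimiter


csv_sample = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
tsv_sample = "id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"
```

Sniffing is a heuristic and can guess wrong on unusual files, which is when an explicit --format option is useful.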

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use --format=json if your input files are JSON.
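The expected JSON shape can be checked before running a comparison. The sketch below assumes nothing about csv-diff's internals; `load_json_rows` is a hypothetical helper that rejects input that is not an array of objects sharing the same keys.

```python
import json


def load_json_rows(text, key):
    """Parse a JSON array of objects and index the objects by `key`."""
    rows = json.loads(text)
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON array of objects")
    if rows:
        expected = set(rows[0])
        for r in rows:
            if set(r) != expected:
                raise ValueError("every object must have the same keys")
    return {r[key]: r for r in rows}


good = '[{"id": 1, "name": "Cleo"}, {"id": 2, "name": "Pancakes"}]'
bad = '[{"id": 1, "name": "Cleo"}, {"id": 2}]'
rows = load_json_rows(good, "id")
```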

Use --show-unchanged to include full details of the unchanged values for rows with at least one change in the diff output:

$ csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed
 id: 1
 age: "4" => "5"
 Unchanged:
 name: "Cleo"

JSON output

You can use the --json option to get a machine-readable difference:

$ csv-diff one.csv two.csv --key=id --json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
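Because the output is plain JSON, any script can consume it. As a rough sketch, the hypothetical `summarize` function below rebuilds a headline like the one in the human-readable output from the structure shown above. It relies only on the "added", "removed" and "changed" keys from that example.

```python
import json


def summarize(diff):
    """Build a one-line summary from a csv-diff style JSON structure."""
    parts = []
    for field in ("changed", "added", "removed"):
        count = len(diff[field])
        if count:
            noun = "row" if count == 1 else "rows"
            parts.append(f"{count} {noun} {field}")
    return ", ".join(parts)


diff = json.loads("""
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
""")
headline = summarize(diff)
```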

Adding templated extras

You can specify additional keys to be displayed in the human-readable format using the --extra option:

--extra name "Python format string with {id} for variables"

For example, to output a link to https://news.ycombinator.com/latest?id={id} for each item with an ID, you could use this:

csv-diff one.csv two.csv --key=id \
 --extra latest "https://news.ycombinator.com/latest?id={id}"

These extras display something like this:

1 row changed
 id: 41459472
 points: "24" => "25"
 numComments: "5" => "6"
 extras:
 latest: https://news.ycombinator.com/latest?id=41459472
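The templates use Python format string syntax, so an extra can be pictured as `str.format` applied to the row's values. This is an illustrative sketch only; `render_extras` is a hypothetical helper, and the row values are taken from the sample output above.

```python
def render_extras(row, extras):
    """Fill each named template with values from the row."""
    # str.format ignores row keys that the template does not mention.
    return {name: template.format(**row) for name, template in extras}


row = {"id": "41459472", "points": "25", "numComments": "6"}
extras = [("latest", "https://news.ycombinator.com/latest?id={id}")]
rendered = render_extras(row, extras)
```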

As a Python library

You can also import the Python library into your own code like so:

from csv_diff import load_csv, compare
diff = compare(
 load_csv(open("one.csv"), key="id"),
 load_csv(open("two.csv"), key="id")
)

diff will now contain the same data structure as the output in the --json example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
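The effect of ignoring added or removed columns can be sketched as comparing only the columns that both versions of a row share. This is an assumption about the behavior described above, not csv-diff's own code, and `row_changes` is a hypothetical helper.

```python
def row_changes(previous_row, current_row):
    """Return {column: [old, new]} for columns present in both rows."""
    shared = [col for col in previous_row if col in current_row]
    return {
        col: [previous_row[col], current_row[col]]
        for col in shared
        if previous_row[col] != current_row[col]
    }


old = {"id": "1", "name": "Cleo", "age": "4"}
new = {"id": "1", "name": "Cleo", "age": "5", "weight": "12"}  # "weight" is a new column
changes = row_changes(old, new)
```

Here the new "weight" column does not show up as a row-level change. Only the "age" difference is reported.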

As a Docker container

Build the image

$ docker build -t csvdiff .

Run the container

$ docker run --rm -v $(pwd):/files csvdiff

Suppose the current directory contains two CSV files, one.csv and two.csv:

$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv

Alternatives

csvdiff is a "fast diff tool for comparing CSV files". It may give better results than csv-diff on larger files.

About

Python CLI tool and library for diffing CSV and JSON files

Releases 10

1.2 Latest

Sep 6, 2024
