31

I am looking to get only the diff of a changed file from a git repo. Right now, I am using GitPython to get the commit objects and the files changed in git, but I want to do a dependency analysis on only the parts of each file that changed. Is there any way to get the git diff from GitPython? Or am I going to have to compare each of the files by reading them line by line?

Ryan M
asked Nov 19, 2013 at 2:19

9 Answers

27

If you want to access the contents of the diff, try this:

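import git
from pathlib import Path

repo_root = Path("/path/to/repo")  # path to your local clone
repo = git.Repo(repo_root.as_posix())
commit_dev = repo.commit("dev")
commit_origin_dev = repo.commit("origin/dev")
# Diff from origin/dev (the "a" side) to dev (the "b" side)
diff_index = commit_origin_dev.diff(commit_dev)
# 'M' restricts the loop to modified files
for diff_item in diff_index.iter_change_type('M'):
    print("A blob:\n{}".format(diff_item.a_blob.data_stream.read().decode('utf-8')))
    print("B blob:\n{}".format(diff_item.b_blob.data_stream.read().decode('utf-8')))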

This prints the full contents of each modified file as it is on origin/dev (A blob) and on dev (B blob), not only the changed lines.
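Because the blobs give you whole files, you still have to work out which lines changed. A minimal sketch using the standard library's difflib, assuming you have already read both blob contents as bytes (for example with diff_item.a_blob.data_stream.read() as above) and that they are UTF-8 text; changed_lines is a hypothetical helper name. It reports lines that were added or replaced in the new version; lines that were only deleted have no line number there and are not reported.

```python
import difflib
from typing import List


def changed_lines(a_bytes: bytes, b_bytes: bytes) -> List[int]:
    """Return 1-based line numbers in the new (b) version that were added or replaced.

    Hypothetical helper: a_bytes/b_bytes are assumed to be the raw UTF-8 contents
    of the old and new blob, e.g. diff_item.a_blob.data_stream.read().
    """
    a_lines = a_bytes.decode("utf-8").splitlines()
    b_lines = b_bytes.decode("utf-8").splitlines()
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    result = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' opcodes cover lines that exist only in the new version
        if tag in ("replace", "insert"):
            result.extend(range(j1 + 1, j2 + 1))
    return result


old = b"import os\nimport sys\n\ndef main():\n    pass\n"
new = b"import os\nimport json\n\ndef main():\n    print('hi')\n"
print(changed_lines(old, new))  # [2, 5]
```

The resulting line numbers can then be used to restrict the dependency analysis to the touched parts of each file.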

Aaron Brock
answered Sep 1, 2017 at 21:14

3 Comments

Excellent. This is the way to do it with the GitPython API, versus delegating directly to the Git CLI like Cairo's answer does.
What about using this code with different branches? I don't want to be tied to one branch (dev in your case) when getting the diff.
You are awesome! I was searching for this solution for a month. To make it more readable, I shortened it to "diff = repo.commit(head1).diff(head2)".
20

You can use GitPython with the git command "diff"; you just need to pass the "tree" object of the commit, or the branch, that you want to see the diffs against. For example:

from git import Repo

repo = Repo('/git/repository')
t = repo.head.commit.tree
# repo.git.diff() returns the output of `git diff` as a string
print(repo.git.diff(t))

This returns a single string with the diffs for "all" files at once (git compares your working tree against the tree of the HEAD commit), so if you want each file separately you must split the output and iterate over it.
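Since repo.git.diff(...) gives you one string for every file, a minimal sketch for splitting it per file is shown below. It assumes plain git diff output whose paths contain no spaces or quoting (git quotes unusual file names, which this regex does not handle); split_diff_by_file is a hypothetical helper name.

```python
import re
from typing import Dict

# Each file section of `git diff` output starts with a header like
# "diff --git a/path b/path"
_FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$", re.MULTILINE)


def split_diff_by_file(diff_text: str) -> Dict[str, str]:
    """Map each file's new path to its own section of a combined `git diff` output."""
    sections = {}
    matches = list(_FILE_HEADER.finditer(diff_text))
    for index, match in enumerate(matches):
        # A section runs until the next file header, or to the end of the text
        end = matches[index + 1].start() if index + 1 < len(matches) else len(diff_text)
        sections[match.group(2)] = diff_text[match.start():end]
    return sections


sample = (
    "diff --git a/foo.py b/foo.py\n"
    "index 1111111..2222222 100644\n"
    "--- a/foo.py\n"
    "+++ b/foo.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
    "diff --git a/bar.txt b/bar.txt\n"
    "index 3333333..4444444 100644\n"
    "--- a/bar.txt\n"
    "+++ b/bar.txt\n"
    "@@ -1 +1,2 @@\n"
    " hello\n"
    "+world\n"
)
per_file = split_diff_by_file(sample)
print(list(per_file))  # ['foo.py', 'bar.txt']
```

With a real repository you would pass in the string from the answer above, e.g. split_diff_by_file(repo.git.diff(t)).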

To compare against the previous commit on the current branch:

print(repo.git.diff('HEAD~1'))

Hope this helps, regards.

answered Apr 27, 2014 at 6:15

1 Comment

How can I find out the difference between the tree and HEAD~1? They are similar, but the latter has more diff entries. It seems that the latter also includes the changes from the last commit.
6

If you're looking to recreate something close to what a standard git diff would show, try:

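import git

# cloned_repo = git.Repo.clone_from(
#     url=ssh_url,
#     to_path=repo_dir,
#     env={"GIT_SSH_COMMAND": "ssh -i " + SSH_KEY},
# )
cloned_repo = git.Repo("/path/to/repo")  # or use the clone created above
repo_diff = ""
# index.diff(None) compares the index with the working tree, like plain `git diff`
for diff_item in cloned_repo.index.diff(None, create_patch=True):
    repo_diff += (
        f"--- a/{diff_item.a_path}\n+++ b/{diff_item.b_path}\n"
        f"{diff_item.diff.decode('utf-8')}\n\n"
    )
print(repo_diff)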
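For a dependency analysis you usually need the line numbers each hunk touches rather than the patch text itself. A minimal sketch that reads them from the hunk headers (@@ -a,b +c,d @@), assuming the patch text contains git's hunk headers, as diff_item.diff should when create_patch=True; new_side_ranges is a hypothetical helper name. With git's default of 3 context lines the ranges include those context lines; running the diff with -U0 (for example repo.git.diff('-U0', 'HEAD~1')) should make them cover only the changed lines.

```python
import re
from typing import List, Tuple

# Hunk header format: "@@ -old_start[,old_count] +new_start[,new_count] @@"
_HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def new_side_ranges(patch_text: str) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first_line, last_line) ranges each hunk covers in the new file."""
    ranges = []
    for match in _HUNK_HEADER.finditer(patch_text):
        start = int(match.group(3))
        # An omitted count means 1; a count of 0 means the hunk only deletes lines
        count = int(match.group(4)) if match.group(4) is not None else 1
        if count > 0:
            ranges.append((start, start + count - 1))
    return ranges


sample_patch = (
    "@@ -1,3 +1,4 @@\n"
    " import os\n"
    "+import json\n"
    " \n"
    " def main():\n"
    "@@ -10,2 +10,0 @@\n"
    "-old_line_1\n"
    "-old_line_2\n"
)
print(new_side_ranges(sample_patch))  # [(1, 4)]
```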
Adriaan
answered Oct 1, 2021 at 21:50


6

Git does not store the diffs, as you have noticed. Given two blobs (before and after a change), you can use Python's difflib module to compare the data.
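A minimal sketch of that approach, assuming you already have the before and after contents as bytes (for instance from blob.data_stream.read() on each blob) and that they are UTF-8 text; blob_unified_diff is a hypothetical helper name. The output follows the unified format of git diff, minus git's "diff --git" and "index" lines.

```python
import difflib


def blob_unified_diff(a_bytes: bytes, b_bytes: bytes, path: str) -> str:
    """Unified diff of two blob contents, with a/ and b/ prefixes like `git diff`.

    a_bytes/b_bytes are assumed to be UTF-8 text, e.g. read from
    blob.data_stream.read() for the before and after versions of the file.
    """
    a_lines = a_bytes.decode("utf-8").splitlines(keepends=True)
    b_lines = b_bytes.decode("utf-8").splitlines(keepends=True)
    diff = difflib.unified_diff(a_lines, b_lines, fromfile="a/" + path, tofile="b/" + path)
    return "".join(diff)


print(blob_unified_diff(b"x = 1\ny = 2\n", b"x = 1\ny = 3\n", "config.py"))
```

Identical inputs produce an empty string, so the result can also be used as a quick "did this file really change" check.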

answered Nov 19, 2013 at 4:36

4 Comments

The repo I am working on only has one master branch. If I am trying to get two blobs, how do I get the second one, the version before the change, to compare against?
Sorry, just another question. So beyond trying to get both the a blob and b blob and understanding what those are, will those blobs actually give me the content of the file changed?
@user1816561 if you're referring to two blobs as in the diff of your working tree vs. the most recent commit, you should use repo.head.commit.diff(None)
This is the python3 version of the link
1
repo.git.diff("main", "HEAD~5")

result:

@@ -97,6 +97,25 @@
+ </configuration>
+
+ </plugin>
+ <plugin>
- <groupId>org.codehaus.mojo</groupId>
- <artifactId>findbugs-maven-plugin</artifactId>
- <version>3.0.5</version>
- <configuration>
- <effort>Low</effort>
- <threshold>Medium</threshold>
Rimian
answered May 24, 2022 at 6:40

3 Comments

Welcome to Stack Overflow! Please read How to Ask and edit your question to contain an explanation as to why this code would actually solve the problem at hand. Always remember that you're not only solving the problem, but are also educating the OP and any future readers of this post.
If you want to update, edit, don't comment.
1

If you want to run git diff on a file between two commits, this is the way to do it:

import git

repo = git.Repo()  # run from the repository root
path_to_a_file = "diff_this_file_across_commits.txt"

# The two most recent commits that touched the file, newest first
commits_touching_path = list(repo.iter_commits(paths=path_to_a_file, max_count=2))

# Older commit first so the diff reads old -> new; "--" separates revisions from paths
print(repo.git.diff(commits_touching_path[1], commits_touching_path[0], "--", path_to_a_file))

This will show you the differences between the two most recent commits that touched the file you specify.

Rimian
answered Sep 13, 2016 at 5:42


0

My task was to compare two files in different locations. Here is a working option.

It depends on unidiff, which needs to be installed as an additional module:

pip install unidiff

import difflib

from unidiff import PatchSet


def create_diff(file1, file2):
    with open(file1, 'r', encoding='utf-8') as f1:
        lines1 = f1.readlines()
    with open(file2, 'r', encoding='utf-8') as f2:
        lines2 = f2.readlines()
    # Build a unified diff with the standard library
    diff = difflib.unified_diff(lines1, lines2, fromfile=file1, tofile=file2)
    diff_string = ''.join(diff)
    # Parse it with unidiff, which exposes files, hunks and lines as objects
    patch = PatchSet.from_string(diff_string)
    return str(patch)


if __name__ == "__main__":
    file1 = r"path/to/file1.txt"
    file2 = r"path/to/file2.txt"
    print(create_diff(file1, file2))

result

--- c:\REPO\path\to\file1.txt
+++ c:\REPO\path\to\file2.tx
@@ -6021,28 +6021,6 @@
 k3.setucs(k3.k_delete, cursnm)
 if exp_p.result_hh:
 k3.visible(exp_p.parent)
-
- #-------------------------------------------------------------------------------
- # cursnm = get_random_name(pref="u")
answered Jun 20, 2025 at 13:19


-2

Here is how you do it:

import git

repo = git.Repo("path/of/repo/")
# iter_commits() yields commits newest first; take the two most recent
commits = list(repo.iter_commits(max_count=2))
a_commit = commits[1]
b_commit = commits[0]
# now get the diff from the older commit to the newer one
print(repo.git.diff(a_commit, b_commit))
Adriaan
answered Mar 7, 2014 at 9:12

1 Comment

This code does not work. AttributeError: 'Repo' object has no attribute 'diff' and AttributeError: 'Repo' object has no attribute 'commits'
-2

PyDriller +1

pip install pydriller

Note that the API has had breaking changes; with the new API it looks like this:

from pydriller import Repository

for commit in Repository('https://github.com/ishepard/pydriller').traverse_commits():
    print(commit.hash)
    print(commit.msg)
    print(commit.author.name)
    for file in commit.modified_files:
        print(file.filename, ' has changed')
answered Jun 15, 2022 at 23:15

1 Comment

The author also wants a diff of the contents of the changes, not just which files changed.
