I am creating a tool for analysing a Git repository, but have stumbled on something that should (seemingly) be quite simple.

I want to create a mapping of commits to diffs (i.e. actual blob changes line by line for the given commit); I have tried using GitPython but haven't had any success. I would like to achieve something like this:

from git import Repo

def get_all_commits(REPO_URL):
    chromium_repo = Repo(REPO_URL)
    commits = list(chromium_repo.iter_commits())
    commit_diffs = {}
    for commit in commits:
        diff = ...  # placeholder: get all blob changes for commit
        commit_diffs[commit.hexsha] = diff
    return commit_diffs

but am not sure how to get all blob changes for a given commit. commit_diffs would be in the form:

{ 232d8f39bedc0fb64d15eed4f46d6202c75066b6 : '<String detailing all blob changes for given commit>' }

Any help would be great.

asked Aug 29, 2017 at 17:59
  • It would be helpful if you provided some sample code with output and described how the output should be different. It looks like there is a way to get diff information, but perhaps it doesn't provide what you need? Commented Aug 29, 2017 at 18:12
  • @LexScarisbrick This is the first time I've used this library so I'm really not sure where to start; I've added a code sample to give some sort of idea Commented Aug 29, 2017 at 18:50
  • Be aware that this code will probably consume all your RAM as the chromium repo is quite big. That being said, have you tried commit.diff()? Commented Aug 29, 2017 at 18:52
  • @NilsWerner Yes, I'm aware of the memory issues; it is a one-time computation whose results will be stored in a db. Yes, although is a diff object not just a hash of the entire tree? How would I find the differences between both trees efficiently? As you mentioned, the chromium repo is huge; having to compare the entire tree for every commit seems pretty ludicrous, unless there is something I'm missing. Commented Aug 29, 2017 at 18:57

1 Answer

I was unaware of the git diff <commit_a> <commit_b> command. The following (I think!) solves the issue:

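from git import Repo

def get_all_commits(REPO_URL):
    repo = Repo(REPO_URL)
    commits = list(repo.iter_commits())  # newest commit first
    commit_diffs = {}
    # Diff each commit against the next (older) commit in the list.
    # The last commit in the list has nothing to compare with and gets no entry.
    for index, commit in enumerate(commits):
        next_index = index + 1
        if next_index < len(commits):
            commit_diffs[commit.hexsha] = repo.git.diff(commits[next_index], commit)
    return commit_diffs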
answered Aug 29, 2017 at 19:19
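
As a possible refinement (not part of the answer above), the sketch below diffs each commit against its first parent rather than against the next entry in the list, since on a history with branches and merges the next entry is not necessarily the parent. It also yields results one at a time, which may help with the memory concern raised in the comments. The names `diff_base`, `iter_commit_diffs`, and `EMPTY_TREE_SHA` are hypothetical helpers introduced here for illustration; the constant is Git's well-known empty-tree object ID, which applies only to SHA-1 repositories, and using it for root commits is an assumption about what output is wanted there.

```python
from typing import Iterator, Sequence, Tuple

# Git's well-known empty-tree object ID (SHA-1 repositories only).
# Assumption: a root commit should be diffed against an empty tree.
EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def diff_base(parent_shas: Sequence[str]) -> str:
    """Return the revision to diff against: the first parent, or the empty tree."""
    return parent_shas[0] if parent_shas else EMPTY_TREE_SHA


def iter_commit_diffs(repo_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (commit sha, diff text) pairs one at a time instead of building a dict."""
    # GitPython is imported lazily so that diff_base can be used without it.
    from git import Repo

    repo = Repo(repo_path)
    for commit in repo.iter_commits():
        parents = [parent.hexsha for parent in commit.parents]
        # Runs `git diff <base> <commit>`; merge commits are compared
        # with their first parent only.
        yield commit.hexsha, repo.git.diff(diff_base(parents), commit.hexsha)
```

Each yielded pair could be written straight to the database inside a `for sha, diff in iter_commit_diffs(path):` loop, so that only one diff is held in memory at a time.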