Launches git cat-file processes which never die #1209

Answered by Byron
ericfrederich asked this question in Q&A

I'm using this package to clone many repos. It's part of a utility which clones all repos under a GitLab group.
While running it I noticed over 6k git processes (and growing). See output below.
They're all git cat-file, some with --batch-check and some with --batch.
Doing a search of this repo for batch_check I found these lines:

 cmd = self._get_persistent_cmd("cat_file_header", "cat_file", batch_check=True)
 # and ...
 cmd = self._get_persistent_cmd("cat_file_all", "cat_file", batch=True)

I guess emphasis on persistent right?
Are these processes meant to never die? What purpose do they serve?

Gather, count, and look at the procs

[ec2-user@ip-10-10-10-10 ~]$ ps -eaf | grep git > ~/git_procs
[ec2-user@ip-10-10-10-10 ~]$ wc -l ~/git_procs 
6086 /home/ec2-user/git_procs
[ec2-user@ip-10-10-10-10 ~]$ head ~/git_procs && tail ~/git_procs 
ec2-user 306 21895 0 23:22 pts/1 00:00:00 git cat-file --batch-check
ec2-user 309 21895 0 21:50 pts/1 00:00:00 git cat-file --batch-check
ec2-user 310 21895 0 22:46 pts/1 00:00:00 git cat-file --batch-check
ec2-user 312 21895 0 21:50 pts/1 00:00:00 git cat-file --batch
ec2-user 321 21895 0 23:22 pts/1 00:00:00 git cat-file --batch-check
ec2-user 323 21895 0 21:50 pts/1 00:00:00 git cat-file --batch-check
ec2-user 325 21895 0 22:46 pts/1 00:00:00 git cat-file --batch-check
ec2-user 326 21895 0 23:22 pts/1 00:00:00 git cat-file --batch
ec2-user 333 21895 0 21:50 pts/1 00:00:00 git cat-file --batch-check
ec2-user 335 21895 0 23:22 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32731 21895 0 22:46 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32732 21895 0 21:50 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32735 21895 0 21:50 pts/1 00:00:00 git cat-file --batch
ec2-user 32741 21895 0 23:22 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32746 21895 0 22:46 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32748 21895 0 21:50 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32750 21895 0 21:50 pts/1 00:00:00 git cat-file --batch
ec2-user 32759 21895 0 23:22 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32762 21895 0 21:50 pts/1 00:00:00 git cat-file --batch-check
ec2-user 32764 21895 0 22:46 pts/1 00:00:00 git cat-file --batch-check
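
The counting step above can also be done in Python when the raw `ps` output needs to be broken down by flag. This is a minimal sketch; `count_cat_file_modes` is a hypothetical helper name, and it assumes the command line appears in the output exactly as in the listing above.

```python
from collections import Counter


def count_cat_file_modes(ps_output):
    """Count `git cat-file` lines in `ps -eaf` text, grouped by batch flag."""
    counts = Counter()
    for line in ps_output.splitlines():
        if "git cat-file" not in line:
            continue
        # Everything after the command name is treated as its arguments.
        args = line.split("git cat-file", 1)[1].split()
        if "--batch-check" in args:
            counts["--batch-check"] += 1
        elif "--batch" in args:
            counts["--batch"] += 1
        else:
            counts["other"] += 1
    return counts
```

Feeding it the contents of `~/git_procs` would show how many of the processes are header readers (`--batch-check`) versus data readers (`--batch`).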

I think it's good to start off this discussion with a reference to known resource leakage and ways to fix it.

Something worth investigating here is whether the same repository is somehow spawning multiple git commands, leaving 'zombies' from previous invocations behind as child processes or, worse, detached from their parent process (python).

If you think this is probably not the case, then it's worth trying to explicitly call the destructor on a Repo instance once you are done with it.
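
One way to make that cleanup explicit is sketched below. It assumes the repo object exposes a `close()` method (the thread later notes that `Repo.__exit__` calls `self.close()`); `process_repos`, `open_repo` and `operation` are hypothetical names introduced only for this illustration.

```python
from contextlib import closing


def process_repos(paths, open_repo, operation):
    """Open each repo, run `operation` on it, and always close it afterwards."""
    results = []
    for path in paths:
        # closing() calls repo.close() on exit, even if operation() raises.
        with closing(open_repo(path)) as repo:
            results.append(operation(repo))
    return results
```

With GitPython installed, a call might look like `process_repos(paths, git.Repo, lambda repo: repo.head.commit.hexsha)`, so that no repo stays open longer than the single operation that needs it.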

Thanks for the support. I confirmed that calling del() on each of the repos does in fact clean everything up.

It is an interesting choice to have persistent subprocesses. I'm guessing this was done for performance reasons... to re-use an existing process. I may dig into the code more if I find myself curious.

Fortunately for my use case, while this process does run for a long time, it's not a daemon. It'll be run periodically.

Thanks again for the quick reply. Next time I'll RTFReadme

It is an interesting choice to have persistent subprocesses. I'm guessing this was done for performance reasons... to re-use an existing process. I may dig into the code more if I find myself curious.

That's true, these are used to read object headers and object data respectively. This also causes surprising behaviour if objects are cached, though, so it's better not to do that and instead read one object at a time in GitPython.
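
The general pattern, one long-lived child process that answers many requests until it is explicitly closed, can be sketched without GitPython. This is only an illustration and not GitPython's implementation: `PersistentProcess`, `make_echo_process` and the upper-casing child are hypothetical stand-ins for a `git cat-file --batch` style process.

```python
import subprocess
import sys

# Stand-in child: reads one request per line and answers with one line.
_CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.strip().upper() + '\\n')\n"
    "    sys.stdout.flush()\n"
)


class PersistentProcess:
    """Hypothetical helper: starts a child lazily and keeps it until close()."""

    def __init__(self, argv):
        self._argv = argv
        self._proc = None

    def query(self, text):
        if self._proc is None:  # spawn on first use, then reuse
            self._proc = subprocess.Popen(
                self._argv,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
        self._proc.stdin.write(text + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().strip()

    def is_running(self):
        return self._proc is not None and self._proc.poll() is None

    def close(self):
        if self._proc is not None:
            self._proc.stdin.close()  # EOF lets the child leave its loop
            self._proc.wait()
            self._proc.stdout.close()
            self._proc = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def make_echo_process():
    return PersistentProcess([sys.executable, "-c", _CHILD_CODE])
```

The child stays alive between calls to `query()`, which is what saves the startup cost, and it only goes away when `close()` runs. Forgetting that call on thousands of objects would leave thousands of idle children, much like the listing in the question.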

I noticed that a repo acts as a context manager via __enter__ and __exit__. In fact, __exit__ calls self.close(), the same as the __del__() you recommended calling.

Question... is it safe to re-use existing repo objects more than once or should I create a new one each time?

Create and re-use just one Repo object

import git

for path in paths:
    # One Repo object per path, entered twice as a context manager
    repo = git.Repo(path)

    with repo:
        repo.some_operation()

    # ... other code

    with repo:
        repo.another_operation()

Create Repo objects as needed

import git

for path in paths:
    # A fresh Repo object for each operation
    with git.Repo(path) as repo:
        repo.some_operation()

    # ... other code

    with git.Repo(path) as repo:
        repo.another_operation()
Answer selected by Byron
Create and reuse should be the way to do it. Would you mind adding a note about the context manager support for repos to the readme resource leakage section? It seems like it would be helpful if it was more prominent.

This discussion was converted from issue #1208 on March 31, 2021 02:06.
