Git clone hangs with large repo #1142

Answered by Byron
Abbyyan asked this question in Q&A

I've used GitPython to clone some repos, but it hangs on one particular repo that is 17 GB in size. I created a multiprocessing Pool to run the clones with GitPython, and each process clones one repo. The large repo needs more time to clone than the others. I use the Pool as follows:

    multi_res = [p.apply_async(runfunc, args=(
        incl_info, project_root, skip_dirs,)) for incl_info in incl_infos]
    LogInfo('Waiting for all subprocesses done...')
    for i in range(len(incl_infos)):
        while not multi_res[i].ready():
            LogInfo("Downloading now")
            time.sleep(5)
    p.close()
    p.join()

It works perfectly in most cases, but it often hangs on the largest repo. It's weird that when I clone that repo on its own, it works fine. So at first I wondered whether something in the Python multiprocessing Pool was blocking.

I ran strace on the hung git clone process. The output for the git process is as follows:

Process 27649 attached
read(6, 0x7ffc36dae050, 4) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn() = 0
read(6, 0x7ffc36dae050, 4) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn() = 0
read(6, 0x7ffc36dae050, 4) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn() = 0
read(6, 0x7ffc36dae050, 4) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn() 

The output for the git-lfs process is as follows:

Process 28006 attached
[ Process PID=28006 runs in 32 bit mode. ]
futex(0x88b982c, FUTEX_WAIT_PRIVATE, 0, NULL

But when I replace git.Repo.clone_from with a shell git clone run in a new subprocess, it works fine. So maybe something blocks inside git.Repo.clone_from, and I wonder whether it has been solved. Thanks a lot.
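For readers who want to try the same workaround, here is a minimal sketch of running a plain git clone in a child process from Python. The helper names (build_clone_command, run_command, clone_repo) are hypothetical and not part of GitPython or of the original script. The sketch assumes git is on PATH and lets the child inherit the parent's stdout and stderr, so the parent has no pipes to read.

```python
import subprocess
import sys


def build_clone_command(url, dest, extra_args=()):
    """Return the argv list for a plain `git clone` of url into dest."""
    # "--" separates options from the repository and directory arguments.
    return ["git", "clone", *extra_args, "--", url, dest]


def run_command(argv, timeout=None):
    """Run argv to completion and return its exit code.

    The child inherits stdout/stderr, so the parent never has to drain a pipe.
    If timeout (in seconds) elapses, subprocess.run kills the child and
    raises subprocess.TimeoutExpired.
    """
    completed = subprocess.run(argv, timeout=timeout, check=False)
    return completed.returncode


def clone_repo(url, dest, timeout=None):
    """Clone url into dest with the git CLI; report a non-zero exit status."""
    code = run_command(build_clone_command(url, dest), timeout=timeout)
    if code != 0:
        print(f"git clone of {url} exited with status {code}", file=sys.stderr)
    return code
```

A worker in the Pool could call clone_repo(url, dest, timeout=...) instead of the GitPython call. Note that the timeout only kills the direct git child. Helper processes that git starts, such as git-lfs, may need separate cleanup.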

Replies: 3 comments

Thanks for the detailed investigation! If memory serves, the way GitPython handles progress reporting on long-running clones can be prone to hanging. Even though it was thought to be fixed, apparently there is still a chance of it failing.

The workaround proposed here is certainly preferable to using GitPython at all, since it does what's needed much more directly. In the end, GitPython just spawns a git process itself and fails to properly handle its output for long-running processes.

I am closing this issue as I don't think the underlying cause can clearly be determined or fixed, and due to the presence of a viable workaround. Please feel free to keep commenting here in case you would propose a different way of handling this - your opinion would be greatly appreciated.
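To illustrate the general class of problem mentioned above, here is a small sketch of how a parent process can read a child's output without stalling. It is a generic example and not a diagnosis of GitPython's code. If a parent reads only one pipe while the child fills the other, both sides can block. Popen.communicate() avoids this by draining stdout and stderr together. The names NOISY_CHILD and run_and_drain are made up for this example.

```python
import subprocess
import sys

# A child that writes more than a typical OS pipe buffer to both streams.
NOISY_CHILD = (
    "import sys\n"
    "sys.stderr.write('p' * 300000)\n"
    "sys.stdout.write('o' * 300000)\n"
)


def run_and_drain(argv, timeout=None):
    """Run argv, collecting stdout and stderr without letting either pipe fill up."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() reads both pipes concurrently until EOF, then waits.
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed child and release the pipes
        raise
    return proc.returncode, out, err
```

Calling run_and_drain([sys.executable, "-c", NOISY_CHILD], timeout=30) returns once the child has exited, with the full output of both streams as bytes.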

Answer selected by Byron

Thanks a lot. I ran into the same problem today using the checkout function, and I wonder whether there is a timeout when GitPython starts a new subprocess?


Unfortunately no, GitPython is inherently synchronous and blocking. Everyone is invited to have a look at the code; communicating with a subprocess should never hang :/.


This discussion was converted from issue #969 on February 26, 2021 11:18.
