Codeberg/Community

Git hooks for large commits #418

Open
opened 2021-03-18 15:04:32 +01:00 by fnetX · 8 comments
Owner

Follow-up of #414 and an ongoing discussion on how to keep Codeberg clean and repo sizes low. We considered using git hooks to check for users (accidentally) uploading large files in their commits that are not intended for the public. Removing such a file from the repo later still keeps it in the git history, so it continues to count towards the repo size (on Codeberg and on every contributor's machine).

We would need a way to filter out

  • accidental commits of large binary files, especially compiled software
  • inclusion of build artefacts and caches (which is much harder to spot since these are small files; maybe some filtering by name?)

There was a consideration to use git hooks for this job to check content while a user is pushing stuff to the repo. It's possible to echo messages back. Still, there must be a way to tell the server that uploading this stuff is indeed intended.

We are looking for help with the technical implementation: checking the commits on receive, echoing an informative message to the user, and providing a way to override the block in case the upload is intended. Any volunteers?
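As a starting point for discussion, here is a minimal sketch of what such a check could look like as a `pre-receive` hook written in Python. It is not an existing Codeberg or Gitea hook. The 10 MiB threshold and the function names are assumptions made for illustration; the git commands used (`git rev-list --objects ... --not --all` and `git cat-file --batch-check` with a custom format) are standard.

```python
import subprocess
import sys

# Assumed threshold for illustration only, not a Codeberg policy.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def parse_batch_check(output):
    """Yield (type, size, path) from `git cat-file --batch-check` output
    produced with the format '%(objecttype) %(objectsize) %(rest)'."""
    for line in output.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # e.g. "<sha> missing"
        path = parts[2] if len(parts) == 3 else ""
        yield parts[0], int(parts[1]), path


def find_large_blobs(output, limit=SIZE_LIMIT_BYTES):
    """Return [(path, size)] for every blob larger than `limit` bytes."""
    return [(path, size)
            for kind, size, path in parse_batch_check(output)
            if kind == "blob" and size > limit]


def new_objects_report(new_sha):
    """List objects reachable from new_sha but from no existing ref,
    then ask git for the type and size of each of them."""
    rev_list = subprocess.run(
        ["git", "rev-list", "--objects", new_sha, "--not", "--all"],
        check=True, capture_output=True, text=True).stdout
    return subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        input=rev_list, check=True, capture_output=True, text=True).stdout


def main(stdin):
    """pre-receive gets one '<old> <new> <ref>' line per updated ref."""
    found = []
    for line in stdin:
        if not line.strip():
            continue
        _old_sha, new_sha, _ref = line.split()
        if set(new_sha) == {"0"}:
            continue  # the ref is being deleted, nothing to inspect
        found += find_large_blobs(new_objects_report(new_sha))
    for path, size in found:
        # Hook output is relayed to the pushing client as "remote: ...".
        print(f"*** WARNING *** {path} is {size / 2**20:.1f} MiB",
              file=sys.stderr)
    return 1 if found else 0  # a non-zero exit status rejects the push

# As an installed hook, the script would end with: sys.exit(main(sys.stdin))
```

Returning 0 unconditionally would turn this into a warning-only hook. An override mechanism would have to be checked before the final `return`.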

Note in comment
Something like "*** WARNING *** You are about to commit a very large file (X MB). In all common workflows this is likely unintentional and will trigger a review by the Codeberg team and, unless justified for good reason, possibly lead to removal or takedown. If this is intentional and required, please describe the reason in the commit message and add the string INTENTIONAL_BIG_COMMIT to the first line of your commit message. Codeberg.org is funded by volunteers and contributors world-wide, please consider donating and supporting our cause!" (hw in Matrix)
Member

> There was a consideration to use git hooks ...

Another option might be cron jobs that file tickets via the API in the project's issue tracker?

Author
Owner

Well, that is what we discussed in #414 ... but it won't prevent people from accidental uploads. Once it's in the git history, it cannot be removed without annoying everyone else.

Member

Good point

Author
Owner

Transfer from Matrix-Chat:
https://github.com/git-lfs/git-lfs/issues/282#issuecomment-296438497 links https://github.com/Ninjaccount/git-big-lfs-hook, a commit hook that might be a good starting point.

Ideally it would be rewritten to check and inform about the acceptable repo size and that Git LFS is a nice thing ... but still allow the commit on a second pass (or interactively or whatever ...)


I would say: let them pass if the commit message contains a sentence like `I like to avoid lfs because ...` ;)
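A commit-message override could be checked roughly like this. This is only a sketch: the marker string `INTENTIONAL_BIG_COMMIT` and the LFS sentence come from this thread, while the function name and the exact matching rules (marker in the first line, sentence anywhere and followed by some reason) are assumptions.

```python
import re

# Marker proposed in the issue text; must appear in the first line.
OVERRIDE_MARKER = "INTENTIONAL_BIG_COMMIT"

# Sentence proposed in this comment; it must be followed by a reason.
LFS_SENTENCE = re.compile(r"i like to avoid lfs because\s+\S", re.IGNORECASE)


def is_override(message):
    """Return True if the commit message opts out of the size check.

    In a hook, `message` could come from `git log -1 --format=%B <sha>`.
    """
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    if OVERRIDE_MARKER in first_line:
        return True
    return LFS_SENTENCE.search(message) is not None
```

Requiring text after "because" is a cheap way to make people state a reason at all; whether the reason is a good one would still need a human.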
Author
Owner

My main motivation to not strictly enforce this: What if you are collaborating on a project across instances and including the commits of others? You want to push some commits from upstream or your work from years ago to Codeberg - and ooops ... YOU SHALL NOT PASS 😂

Rewriting the history for big projects is of course an option, but not using Codeberg is probably the easier one, then.

That's why I'd go with proper information instead of enforcing 😉

Author
Owner

PS: It might be nice to use Git push options (https://git-scm.com/docs/git-push#Documentation/git-push.txt--oltoptiongt) for this job, e.g. allow users to specify `-o disableSizeCheck`.
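For reference, a hook can read push options from its environment: git sets `GIT_PUSH_OPTION_COUNT` and `GIT_PUSH_OPTION_0`, `GIT_PUSH_OPTION_1`, ... for `pre-receive` when the server has `receive.advertisePushOptions` enabled. A minimal sketch, with the helper names being assumptions:

```python
import os


def read_push_options(environ=os.environ):
    """Return the list of push options git passed to the hook."""
    count = int(environ.get("GIT_PUSH_OPTION_COUNT", "0") or 0)
    return [environ.get(f"GIT_PUSH_OPTION_{i}", "") for i in range(count)]


def check_disabled(option, environ=os.environ):
    """True if the user pushed with `-o <option>`, e.g. disableSizeCheck."""
    return option in read_push_options(environ)
```

On the client side this would be used as `git push -o disableSizeCheck origin main`. If the server does not advertise push options, git refuses the `-o` flag, so the setting would need to be enabled for every repo.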

It would also help to have some common gitignore values to compare against. I noticed quite a few repos with `.idea` folders, and heard of many Firefox profiles committed to GitHub. It might also be sensible to react like this:

> Hey, your commit includes a folder called ".idea" which is commonly used for IntelliJ projects and usually not necessary with your code. If you really want to include this file, please add `-o disableContentCheck` to your push command.

> Hey, your commit includes binaries with the ending ".so", ".exe", ... which are commonly distributed as compiled results of your work and usually not necessary with your code. If you really want to include this file, please add `-o disableContentCheck` to your push command.
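A name-based check along these lines could be sketched as follows. The two tables only contain the examples mentioned above; a real list would presumably be derived from common gitignore templates. All names in the code are hypothetical.

```python
from pathlib import PurePosixPath

# Example entries taken from this comment; not a complete list.
SUSPICIOUS_DIRS = {".idea": "IntelliJ project settings"}
SUSPICIOUS_SUFFIXES = {
    ".so": "compiled shared library",
    ".exe": "compiled executable",
}


def describe_suspicious(path):
    """Return a short reason if `path` looks unintended, else None."""
    p = PurePosixPath(path)
    for part in p.parts:
        if part in SUSPICIOUS_DIRS:
            return f'folder "{part}" ({SUSPICIOUS_DIRS[part]})'
    suffix = p.suffix.lower()
    if suffix in SUSPICIOUS_SUFFIXES:
        return f'file ending "{suffix}" ({SUSPICIOUS_SUFFIXES[suffix]})'
    return None


def build_warnings(paths):
    """Build one message per suspicious path, to be echoed to the client."""
    messages = []
    for path in paths:
        reason = describe_suspicious(path)
        if reason:
            messages.append(
                f"Hey, your commit includes {path}: {reason}, which is "
                "usually not necessary with your code. If you really want "
                "to include it, please add -o disableContentCheck to your "
                "push command.")
    return messages
```

The paths could come from the same object listing a size check would use, so both checks can share one pass over the pushed objects.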

Author
Owner

I ripped out the hooks on codeberg-test.org and replaced them.

find /data/git/gitea-repositories/ -type d -name hooks | xargs rm -r

Everything is now in `/data/git/hooks` as a Proof of Concept. I'll have a look if this is useful for Gitea, too.
Currently, Gitea keeps one hooks folder per repo. That uses disk space unnecessarily (a lot of duplication if not compressed at the FS level, plus the Git template sample files), and every copy must be updated when, for example, the Gitea installation changes.
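The comment does not say how the shared folder was wired up. One mechanism git itself offers is the `core.hooksPath` setting, which makes git look for hooks in the given directory instead of each repo's own `hooks` folder. The sketch below assumes that approach; whether it matches the Proof of Concept on codeberg-test.org is not known, and the function names are made up.

```python
import subprocess


def hooks_path_command(hooks_dir, scope="--system"):
    """Build the git command that points all repos at one hooks folder."""
    return ["git", "config", scope, "core.hooksPath", hooks_dir]


def set_shared_hooks(hooks_dir, scope="--system"):
    """Apply the setting; --system usually needs root, --global does not."""
    subprocess.run(hooks_path_command(hooks_dir, scope), check=True)
```

For example, `set_shared_hooks("/data/git/hooks", scope="--global")` run as the git service user would apply to every repo that user serves.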

Having them in one place has some benefits. I'm considering whether to change this only for Codeberg (easiest) or globally (we can still check for per-repo hooks, but not use them by default). Also, this needs some testing, but I think it works fine now. If you want to confirm everything is working, feel free to heavily test Codeberg.org pushes :)
