I hope this is okay, to open an issue as a community discussion. I don't know where else to go. I tried on Mastodon, but it's not working there. Please let me know!
# Why git-lfs is an "illusion"
I have been building game-dev tools for a while; this usually means big Blender files alongside small code files. I found git-lfs (like many do) and thought, "cool, it's got this sorted!" I did not look into the details, I just believed the pitch: it handles large files cleverly so that you don't have to worry.
But... I am fairly sure that git-lfs is not saving space; neither for me nor Codeberg.
I first noticed this when pushing a small repo that had a 7 MB blend file. I was making changes to it: add, commit, push, repeat. By chance I was also watching the Codeberg repo page and noticed my repo's size going up and up: 7 MB -> 14 MB -> 21 MB -> eek!
So, after some asking around on Mastodon and a bunch of surfing and reading, I learned that git-lfs *is* saving versions of my blend files on the server side, where I thought it was just saving one file. I also found that *locally* in `.git/lfs` there are just as many versions! So it's chewing local space too!
I have found that `git lfs prune`, run locally, cleans out the local copies quite well. Is there an equivalent that Codeberg can run?
(Git-lfs' actual use case seems to be for those who `clone` your repo to get the smallest possible download of binary blobs.)
Please let me know if I'm wrong about lfs; I'd love to be wrong!
# Let's Rewind: The Situation
Doing game-dev, it's impossible to avoid big blobby files: 3D assets, sounds, textures, videos, etc.
These files raise basic questions:
1. How can I back up these huge files online without blowing up storage space?
2. How can I work with them in a natural and easy way using git?
The most basic approach I can think of is a structure like:
```
📁 project
    📁 bigfiles
        char.blend
        texture.png
    📁 dev
        .git
        .gitattributes
        .gitignore
        somecode.gd
        char.blend (symlink into ../bigfiles/char.blend)
        texture.png (symlink into ../bigfiles/texture.png)
```
You work in `dev` as normal with git. When you add new large files, do so in `bigfiles` and make relative links in `dev` as needed.
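Setting that layout up takes only a few shell commands. This is a sketch (the file names are just the examples from the tree above, and the git repo lives only in `dev`):

```shell
# Sketch of the split layout: big binaries live outside the git repo,
# with relative symlinks inside it. Run from an empty project dir.
mkdir -p bigfiles dev
touch bigfiles/char.blend bigfiles/texture.png   # stand-ins for real assets

cd dev
git init -q
ln -s ../bigfiles/char.blend char.blend
ln -s ../bigfiles/texture.png texture.png

# git versions the symlink itself (a tiny text entry), not the blob behind it
git add char.blend texture.png
```

Because git stores only the link target string, the blobs never enter the repo's history, no matter how often they change.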
How to back up the files in `bigfiles` is the main question. The use case is to get those big files saved on the Codeberg server using minimum space (no duplicates because of git versioning).
# Pushing
## rsync (+ ssh)
If Codeberg could offer rsync space, then this would be a solved problem! I honestly think it would use less drive space than LFS currently does.
We could write some scripts, perhaps to extend git, perhaps just a script one runs from the `bigfiles` dir when you want to send changed files up.
rsync is pretty magical: its delta-transfer algorithm sends only the changed parts of a file, so it effectively does binary diffs over the wire.
# Pulling/Cloning
## rsync (+ ssh)
Since the project is now two directories, the git part comes down with a normal clone. This exposes all the broken symlinks. I don't know yet how to best do it, but the README can explain how to pull down the `bigfiles` dir. Perhaps a simple script, perhaps a git extension again.
# Your Ideas?
What do you think about the general problem and these ideas? What else could/would you do?
File locking is probably going to be needed, but I am talking about me (a small, honestly solo, dev); if others joined, it would be up to us to communicate so we don't stomp on each other's files.
# Other Ideas
- **Git-submodules** I tried to understand this, but it's pretty obscure. I also think the same problem happens: if all the binary blobs are in a submodule repo, that repo is still going to grow by the size of each blob times every change.
Unless… is there a way to keep squashing history, or whatever git voodoo, so that the sub-repo gets garbage-collected often and thus keeps the storage space down?
- **Git-subtrees** I could not make head or tail of these. Are they an option?
- **Git-LFS** Is there a way to use it that results in actual minimal file versioning? I.e. I only want the latest version of each of the files in `bigfiles` to be saved on Codeberg *and* in my local repo.
- **Git-Annex** Seems it needs a server-side component, so I could not test it out.
- A recent idea from Mastodon: Codeberg "Releases". I still have to look at it. https://docs.codeberg.org/git/using-tags/
What else?