Does git lfs reduce the size of files pushed to Github?


Question


GitHub does not allow pushing files larger than 100 MB. Using git lfs, it is possible to push large files to GitHub. I am just curious about the idea behind the process: to me it seems that git lfs is just an additional switch which enables pushing large files (via HTTPS only) to GitHub. But I can't imagine that's all?

The documentation from Atlassian states:

Git LFS (Large File Storage) is a Git extension developed by Atlassian, GitHub, and a few other open source contributors, that reduces the impact of large files in your repository by downloading the relevant versions of them lazily. Specifically, large files are downloaded during the checkout process rather than during cloning or fetching. Git LFS does this by replacing large files in your repository with tiny pointer files. During normal usage, you'll never see these pointer files as they are handled automatically by Git LFS.
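
For illustration, the tiny pointer file that Git LFS commits in place of the real file is just a few lines of plain text (the values in angle brackets are placeholders, not taken from the question):

version https://git-lfs.github.com/spec/v1
oid sha256:<64-character hex SHA-256 of the file's contents>
size <file size in bytes>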


Some details: I have a small project which I cannot push to GitHub because of, say, one large file. I can then migrate and push as follows:

git lfs migrate import --everything --include="*.pdf"
git reflog expire --expire-unreachable=now --all
git gc --prune=now
git push origin master
git lfs checkout (? If you only have 1 kB pointer files locally? Happened some days later...)

and everything is pushed to GitHub, even the large files. So why does GitHub reject large files at all, if pushing them is allowed with git lfs (which can be installed quickly and works easily)?
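
For completeness, here is a minimal sketch of how new files would be routed through LFS in the same repository; the *.pdf pattern comes from the example above, and handbook.pdf is just a hypothetical file name:

git lfs install
git lfs track "*.pdf"
git add .gitattributes
# git lfs track writes a line like this into .gitattributes:
# *.pdf filter=lfs diff=lfs merge=lfs -text
git add handbook.pdf          # hypothetical large file; committed as an LFS pointer
git commit -m "Add handbook via Git LFS"
git push origin master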


Answer 1:


The problem isn't with large files per se, but the way that Git stores them. Git stores files and sends files over the network using deltification and compression. Deltification stores a file with less data by making reference to another file and storing only the differences.

When the server side repacks the stored data, Git also verifies that the data is still intact by running git fsck. This means that every file must be decompressed, de-deltified, and at least partially loaded into memory. For large files, this consumes a huge amount of CPU and memory, which impacts other repositories stored on the server. Files may also be re-deltified, which means that they and other candidate files must be read entirely into memory, compared at some cost, and then rewritten and re-compressed. The alternative is to simply store those files without deltification and only compress them, but this leads to out-of-control disk usage, especially for files which don't compress well.
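
A rough way to see this packing and deltification at work is to inspect any local repository with stock Git commands (the pack path below is the default location):

git gc                       # repack loose objects into a pack file
git count-objects -vH        # total size of the packed object store
git verify-pack -v .git/objects/pack/pack-*.idx | head
# each line lists the object's SHA-1, type, size, size in the pack and offset;
# deltified objects additionally show their delta depth and base object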

On the client side, a user must download the entire repository when cloning. This means spending a large amount of bandwidth on large files, which often compress poorly, and storing all of that content locally, even if the user is only interested in a few revisions.

Git LFS avoids storing these files in the Git repository at all: it uses a separate HTTP-based protocol and uploads the objects to a separate location that isn't part of the main Git repository. The costs Git imposes for compression and deltification are therefore avoided, and users can download only the files they need for their current checkout. Server load and bandwidth are both greatly reduced, as are client storage needs.
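
For example, a clone can defer the LFS downloads entirely and then fetch only the objects that are actually needed; the repository URL and path pattern here are placeholders:

GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/example/project.git   # check out pointer files only
cd project
git lfs pull --include="docs/*.pdf"   # download just the LFS objects matching this pattern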



Source: https://stackoverflow.com/questions/57922231/does-git-lfs-reduce-the-size-of-files-pushed-to-github
