How to remove old versions of media files from a git repository

Posted by £可爱£侵袭症+ on 2019-11-28 23:05:08

I have a script (github gist here) to remove a selection of unwanted folders from the entire history of a git repo, or to delete all but the latest version of a folder.

It's hard-coded to assume that all git repositories are in ~/repos, but that's easy to change. It should also be easy to adapt to work with individual files.

lac.alan

Old thread but in case someone else stumbles along here…

GitHub & Bitbucket both recommend using BFG Repo-Cleaner.

See:
GitHub: Remove Sensitive Data
Bitbucket: Reduce Repository Size & Bitbucket: Maintaining a Git Repository

Example to remove files over 1 Megabyte, as well as jpgs, pngs and mp3s that are not in HEAD:

# First get the latest bfg.jar, then:
$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg.jar --strip-blobs-bigger-than 1M --delete-files '*.{jpg,png,mp3}' some-big-repo.git
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

Note: once you've pushed the rewritten refs, the remote repository must also run its own git gc, otherwise you won't see the size reduction there (see e.g. https://stackoverflow.com/a/28782154/3419541).

Finally, re-clone the repository to be sure that you don't accidentally re-commit the old media file blobs.
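
If you want to preview which blobs in the history are over the size limit before running BFG, plain git plumbing can list them. This is only a rough sketch: list_big_blobs is a made-up helper name (not part of git or BFG), and 1048576 bytes is my approximation of the 1M threshold used above.

```shell
# list_big_blobs: hypothetical helper, not part of git or BFG.
# Prints "<size in bytes><TAB><path>" for every blob in the history
# that is at least the given size, largest first.
list_big_blobs() {
    min_bytes=${1:-1048576}   # default threshold is an assumption (1 MiB)
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" '
            $1 == "blob" && $3 >= min {
                size = $3
                # Drop the first three fields so paths with spaces survive.
                sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
                print size "\t" $0
            }' |
        sort -rn
}
```

Run it from inside the mirror clone, e.g. cd some-big-repo.git and then list_big_blobs 1048576. It lists every matching blob in the history, including ones still present in HEAD.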

sateesh

Check the 'Removing Objects' section of the Maintenance and Data Recovery chapter in the Pro Git book. It walks through the steps for removing objects from a git repo. Be warned, though, that the process is destructive.
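
That section is built around git filter-branch with an index filter. Below is a minimal sketch of the same idea on a throwaway repository, so nothing real gets damaged. The file names (main.txt, big.bin) and the temporary directory are made up for the demo. Note that the git documentation now recommends git filter-repo over filter-branch, so treat this as an illustration rather than the preferred tool.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; all names here are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "source" > main.txt
head -c 2000000 /dev/zero > big.bin      # stand-in for a large media file
git add main.txt big.bin
git commit -q -m "Add source and a large binary"
echo "more" >> main.txt
git commit -q -am "Edit source"

# Rewrite every commit, dropping big.bin from the index each time.
# The environment variable only silences the warning in newer git versions.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm --cached --ignore-unmatch -q big.bin' \
    --prune-empty -- --all

# filter-branch keeps backups under refs/original; remove them so the
# old commits become unreachable, then expire reflogs and repack.
git for-each-ref --format='%(refname)' refs/original |
    xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

# Count reachable objects whose path still mentions big.bin.
remaining=$(git rev-list --all --objects | grep -c 'big\.bin' || true)
echo "objects still named big.bin: $remaining"
```

Every rewritten commit gets a new ID, which is why this is destructive for anyone who has already cloned the repository.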

sml

As mentioned already, you will be re-writing history here, so you will have to get collaborators (if any) to do git rebase.

As for stripping a particular file from history, GitHub has a nice walkthrough.

For a solution going forward, you should look at putting the binary files in a sub-module.

Git's submodule support allows a repository to contain, as a subdirectory, a checkout of an external project. Submodules maintain their own identity; the submodule support just stores the submodule repository location and commit ID, so other developers who clone the containing project ("superproject") can easily clone all the submodules at the same revision. Partial checkouts of the superproject are possible: you can tell Git to clone none, some or all of the submodules.

https://git-scm.com/docs/git-submodule

https://git-scm.com/book/en/v2/Git-Tools-Submodules
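
To make the "location and commit ID" point concrete, here is a small sketch using two throwaway local repositories. The names media, project and logo.png are invented for the demo. The -c protocol.file.allow=always override is only there because the demo uses a local path as the submodule URL, which recent git versions refuse by default; with a normal remote URL it should not be needed.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in repository holding the binary files (hypothetical name).
git init -q "$work/media"
cd "$work/media"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "pretend this is a large image" > logo.png
git add logo.png
git commit -q -m "Add media"

# The superproject that holds the source code.
git init -q "$work/project"
cd "$work/project"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Register the media repository as a submodule at ./media.
git -c protocol.file.allow=always submodule --quiet add "$work/media" media
git commit -q -m "Track media as a submodule"

# The superproject stores only the URL (.gitmodules) and one commit ID.
cat .gitmodules
git ls-tree HEAD media
```

The git ls-tree line shows a single entry of type commit for media, so the superproject's history stays small no matter how large the media repository grows.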

As far as I know, this can't be done without rewriting history, because in git every commit depends on the contents of the entire history up to that point. So the only way to get rid of the old, big files would be to "replay" the entire commit history (preferably with the same commit timestamps and authors), omitting the big files. Note that this will produce an entirely separate commit history.

This is obviously not a very viable approach, so the lesson is probably "don't use git to version huge binary files". Instead, you could perhaps have a separate (ignored) folder for the files and use a separate system to version control them.
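
A minimal sketch of the ignored-folder idea, again in a throwaway repository. The folder name media and the file names are made up; how you version the folder's contents outside git is left open.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .

mkdir media
echo "pretend this is audio" > media/clip.mp3
echo "source" > main.txt

# Ignore the whole media folder so its contents never enter git history.
printf 'media/\n' > .gitignore

git add -A
# Only .gitignore and main.txt are staged; media/clip.mp3 is not listed.
git status --short
```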
