Version control for large binary files and >1TB repositories?

逝去的感伤 2021-02-01 03:58

Sorry to bring up this topic again, as there are so many related questions already, but none of them covers my problem directly.

What I'm searching is a good v

10 answers
  • 2021-02-01 04:59

    Given the scale of data you are describing, you might be much better off simply relying on a NAS device that provides filesystem-accessible snapshots together with single-instance store / block-level deduplication.

    (The question also mentions .cab & .msi files: usually the CI software of your choice has some method of archiving builds. Is that what you are ultimately after?)
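To make "block-level deduplication" concrete, here is a minimal Python sketch of the idea. It is only an illustration, not how any particular NAS implements it: the `BlockStore` class, its method names, and the fixed block size are all assumptions made up for this example. Each snapshot is stored as a list of block hashes, and identical blocks are kept only once.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store (hypothetical): identical blocks are kept once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}     # sha256 hex digest -> block bytes
        self.snapshots = {}  # snapshot name -> ordered list of digests

    def put(self, name, data):
        """Split data into fixed-size blocks and record them under a snapshot name."""
        digests = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault stores the block only if this digest is new
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.snapshots[name] = digests

    def get(self, name):
        """Reassemble a snapshot from its block list."""
        return b"".join(self.blocks[d] for d in self.snapshots[name])

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(block) for block in self.blocks.values())


store = BlockStore(block_size=4)
store.put("build-1", b"AAAABBBBCCCC")
store.put("build-2", b"AAAABBBBDDDD")  # shares its first two blocks with build-1
print(store.stored_bytes(), "bytes stored for 24 bytes of snapshots")
```

Two builds that differ only in their last block share the rest, which is why this approach tends to suit successive versions of large binaries that change only partially.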

  • 2021-02-01 05:04

    Take a look at Boar, "Simple version control and backup for photos, videos and other binary files". It can easily handle huge files and huge repositories.

  • 2021-02-01 05:04

    Update May 2017:

    Git, with the addition of GVFS (Git Virtual File System), can support virtually any number of files of any size, starting with the Windows repository itself: "The largest Git repo on the planet" (3.5M files, 320GB).
    That is not yet >1TB, but it can scale there.

    The work done on GVFS is slowly being proposed upstream (that is, to Git itself), but that is still a work in progress.
    GVFS is implemented on Windows, but will soon be available for Mac (because the team at Microsoft developing Office for Mac demands it) and Linux.


    April 2015

    Git can actually be considered a viable VCS for large data, with Git Large File Storage (LFS) (by GitHub, April 2015).

    git-lfs (see git-lfs.github.com) can be tested with a server that supports it, such as lfs-test-server (or directly with github.com itself):
    You store only metadata in the Git repo, and the large files elsewhere.

    https://cloud.githubusercontent.com/assets/1319791/7051226/c4570828-ddf4-11e4-87eb-8fc165e5ece4.gif
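As a rough illustration of "metadata only in the Git repo", the sketch below builds the kind of small text pointer that Git LFS commits in place of a large file. The three-line layout (`version`, `oid sha256:`, `size`) follows the Git LFS pointer format as I understand it; the helper name `lfs_pointer` is hypothetical, and in practice git-lfs generates these pointers itself through its Git filters rather than through user code.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return an LFS-style pointer for content (hypothetical helper, for illustration)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# A 10 MiB stand-in for a large binary; only the short pointer would be committed.
big_file = b"\0" * (10 * 1024 * 1024)
pointer = lfs_pointer(big_file)
print(pointer)
print(f"{len(big_file)} bytes of content -> {len(pointer)} bytes in the repo")
```

The pointer stays at a little over a hundred bytes regardless of file size, so the Git history remains small while the actual content lives on the LFS server.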

  • 2021-02-01 05:04

    If you really have to use a VCS, I would use svn, since svn does not require copying the entire repository to the working copy. It still needs about twice the disk space, though, since it keeps a clean copy of each file.

    With this amount of data, I would look for a document management system or, at a lower level, use a read-only network share with a defined input process.
