We\'re pretty happy with SVN right now, but Joel\'s tutorial intrigued me. So I was wondering - would it be feasible in our situation too?
The thing is - our SVN reposit
From my experience, Mercurial is pretty good at handling a large number of files and a huge history. The drawback is that you shouldn't check-in files bigger than 10 Mb. We used Mercurial to keep an history of our compiled DLL. It's not recommend to put binaries in a source countrol but we tried it anyway (it was a repository dedicated to the binaries). The repository was about 2 Gig and we are not too sure that we will be able to keep doing that in the future. Anyway, for source code I don't think you need to worry.
No, does not work. You dont want anything that requires signiciant storage on client side then. If you get that large (by toring fo rexample images etc.), the storage requires more than a normal workstation has anyway to be efficient.
You better go with something centralized then. Simple math - it simlpy is not feasible to have tond of gb on every workstation AND be efficient there. It simply makes no sense.
Distributed version control for HUGE projects - is it feasible?
Absolutely! As you know, Linux is massive and uses Git. Mercurial is used for some major projects too, such as Python, Mozilla, OpenSolaris and Java.
We're pretty happy with SVN right now, but Joel's tutorial intrigued me. So I was wondering - would it be feasible in our situation too?
Yes. And if you're happy with Subversion now, you're probably not doing much branching and merging!
The thing is - our SVN repository is HUGE. [...] There are over 68,000 revisions (changesets), the source itself takes up over 100MB
As others have pointed out, that's actually not so big compared to many existing projects.
The problem then is simple - a clone of the whole repository would probably take ages to make, and would consume far more space on the drive that is remotely sane.
Both Git and Mercurial are very efficient at managing the storage, and their repositories take up far less space than the equivalent Subversion repo (having converted a few). And once you have an initial checkout, you're only pushing deltas around, which is very fast. They are both significantly faster in most operations. The initial clone is a one-time cost, so it doesn't really matter how long it takes (and I bet you'd be surprised!).
And since the very point of distributed version control is to have a as many repositories as needed, I'm starting to get doubts.
Disk space is cheap. Developer productivity matters far more. So what if the repo takes up 1GB? If you can work smarter, it's worth it.
How does Mercurial (or any other distributed version control) deal with this? Or are they unusable for such huge projects?
It is probably worth reading up on how projects using Mercurial such as Mozilla managed the conversion process. Most of these have multiple repos, which each contain major components. Mercurial and Git both have support for nested repositories too. And there are tools to manage the conversion process - Mercurial has built-in support for importing from most other systems.
Added: To clarify - the whole thing is one monolithic beast of a project which compiles to a single .EXE and cannot be split up.
That makes it easier, as you only need the one repository.
Added 2: Second thought - The Linux kernel repository uses git and is probably an order of magnitude or two bigger than mine. So how do they make it work?
Git is designed for raw speed. The on-disk format, the wire protocol, the in-memory algorithms are all heavily optimized. And they have developed sophisticated workflows, where patches flow from individual developers, up to subsystem maintainers, up to lieutenants, and eventually up to Linus. One of the best things about DVCS is that they are so flexible they enable all sorts of workflows.
I suggest you read the excellent book on Mercurial by Bryan O'Sullivan, which will get you up to speed fast. Download Mercurial and work through the examples, and play with it in some scratch repos to get a feel for it.
Then fire up the convert
command to import your existing source repository. Then try making some local changes, commits, branches, view logs, use the built-in web server, and so on. Then clone it to another box and push around some changes. Time the most common operations, and see how it compares. You can do a complete evaluation at no cost but some of your time.
You say you're happy with SVN... so why change?
As far as distributed version control systems go, Linux uses git and Sun use Mercurial. Both are impressively large source code repositories, and they work just fine. Yes, you end up with all revisions on all workstations, but that's the price you pay for decentralisation. Remember storage is cheap - my development laptop currently has 1TB (2x500GB) of hard disk storage on board. Have you tested pulling your SVN repo into something like Git or Mercurial to actually see how much space it would take?
My question would be - are you ready as an organisation to go decentralised? For a software shop it usually makes much more sense to keep a central repository (regular backups, hook ups to CruiseControl or FishEye, easier to control and administer).
And if you just want something faster or more scalable than SVN, then just buy a commercial product - I've used both Perforce and Rational ClearCase and they scale up to huge projects without any problems.