I know there are thousands of similar topics floating around. I have read at least 5 threads here on SO. But why am I still not convinced about DVCS?
I have only following questions (not
I'm a Mercurial developer and have worked as a Mercurial consultant. So I find your questions very interesting and hope I answer them:
- What is the advantage or value of committing locally? [...]
You are correct that IDEs can track local changes beyond simple undo/redo these days. However, there is still a gap in functionality between these file snapshots and a full version control system.
The local commits give you the option of preparing your "story" locally before you submit it for review. I often work on some changes involving 2-5 commits. After I make commit 4, I might go back and amend commit 2 slightly (maybe I saw an error in commit 2 after I made commit 4). That way I'll be working not just on the latest code, but on the last couple of commits. That's trivially possible when everything is local, but it becomes more tricky if you need to sync with a central server.
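The "go back and amend commit 2" workflow can be sketched with Git's fixup commits. This is only one way to do it, shown in a throwaway repository; the file names, commit messages and identity settings are made up for illustration, and Mercurial offers comparable history editing through its histedit extension.

```shell
#!/bin/sh
set -e
# Throwaway repository so the sketch touches nothing real.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"

# Build a small local "story" of three commits.
for n in 1 2 3; do
    echo "step $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "commit $n"
done

# After commit 3, notice a mistake that belongs in commit 2.
echo "step 2, corrected" > file2.txt
git add file2.txt
target=$(git rev-parse HEAD~1)            # hash of "commit 2"
git commit -q --fixup "$target"

# Fold the fixup into commit 2. GIT_SEQUENCE_EDITOR=true accepts the
# generated todo list unchanged, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --oneline
```

Nothing here talks to a server, which is the point: the history is rewritten before anyone else has seen it.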
- what if I crash my hard drive? [...] so how is it cool compared to checking in to a central repo?
Not cool at all! :-)
However, even with a central repo, you still have to worry about the uncommitted data in the working copy. I would therefore claim that you ought to have a backup solution in place anyway.
It is my experience that people often have larger chunks of uncommitted data lying around in their working copies with a centralized system. Clients told me how they were trying to convince developers to commit at least once a week.
The changes are often left uncommitted because:
1. They are not really finished. There might be debug print statements in the code, there might be incomplete functions, etc.
2. Committing would go into trunk, and that is dangerous with a centralized system since it impacts everybody else.
3. Committing would require you to first merge with the central repository. That merge might be intimidating if you know that there have been other conflicting changes made to the code. The merge might simply be annoying because you might not be all done with the changes and you prefer to work from a known-good state.
4. Committing can be slow when you have to talk to an overloaded central server. If you're in an offshore location, commits are even slower.
You are absolutely correct if you think that the above isn't really a question of centralized versus distributed version control. With a CVCS, people can work in separate branches and thus trivially avoid 2 and 3 above. With a separate throw-away branch, I can also commit as much as I want since I can create another branch where I commit more polished changes (solving 1). Commits can still be slow, though, so 4 can still apply.
People who use DVCS will often push their "local" commits to a remote server anyway as poor man's backup solution. They don't push to the main server where the rest of the team is working, but to another (possibly private) server. That way they can work in isolation and still keep off-site backups.
- Working offline or in an air plane. [...]
Yeah, I never liked that argument either. I have good Internet connectivity 99% of the time and don't fly enough for this to be an issue :-)
However, the real argument is not that you are offline, but that you can pretend to be offline. More precisely, that you can work in isolation without having to send your changes to a central repository immediately.
DVCS tools are designed around the idea that people might be working offline. This has a number of important consequences:
Merging branches becomes a natural thing. When people can work in parallel, forks will naturally occur in the commit graph. These tools must therefore be really good at merging branches. A tool such as SVN is not very good at merging!
Git, Mercurial, and other DVCS tools merge better because they have had more testing in this area, not directly because they are distributed.
More flexibility. With a DVCS, you have the freedom to push/pull changes between arbitrary repositories. I'll often push/pull between my home and work computers, without using any real central server. When things are ready for publication, I push them to a place like Bitbucket.
Multi-site sync is no longer an "enterprise feature", it's a built-in feature. So if you have an off-shore location, they can set up a local hub repository and use this among themselves. You can then sync the local hubs hourly, daily, or when it suits you. This requires nothing more than a cronjob that runs hg pull or git fetch at regular intervals.
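A hub sync job of that kind could look roughly like the following, using git fetch on bare mirrors. The sync_hub function, the directory layout and the crontab line are assumptions made for this sketch; the demo builds a fake "main site" and hub in temp directories so it can run anywhere.

```shell
#!/bin/sh
set -e
# Fetch every bare mirror found directly under the given hub directory.
sync_hub() {
    for repo in "$1"/*.git; do
        [ -d "$repo" ] || continue
        git -C "$repo" fetch -q --prune origin
    done
}

# Demo setup (all paths hypothetical): a main-site repo and a hub mirror of it.
upstream=$(mktemp -d)
hub=$(mktemp -d)
git -C "$upstream" init -q
git -C "$upstream" config user.email "dev@example.com"
git -C "$upstream" config user.name "Dev"
echo v1 > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" commit -q -m "first"
git clone -q --mirror "$upstream" "$hub/project.git"

# New work lands at the main site...
echo v2 > "$upstream/README"
git -C "$upstream" commit -q -am "second"

# ...and the periodic job brings the hub up to date.
sync_hub "$hub"

# A crontab entry running such a script hourly might look like:
#   0 * * * * /usr/local/bin/sync-hub.sh /srv/hub
```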
Better scalability since more logic is on the client-side. This means less maintenance on the central server, and more powerful client-side tools.
With a DVCS, I expect to be able to do a keyword search through revisions of the code (not just the commit messages). With a centralized tool, you normally need to set up an extra indexing tool.
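For instance, Git can search the content of every revision locally. The sketch below uses a throwaway repository with a made-up identifier, legacy_token, that exists only in an old revision; Mercurial has hg grep for a similar purpose.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"

# Revision 1 contains the keyword, revision 2 removes it.
echo 'legacy_token = "abc"' > config.txt
git add config.txt
git commit -q -m "add setting"
echo 'token = "abc"' > config.txt
git commit -q -am "rename setting"

# Pickaxe search: commits whose changes alter how often the string occurs.
hits=$(git log --oneline -S legacy_token)
echo "$hits"

# Search the file contents of every revision, not just the checkout.
git grep -l legacy_token $(git rev-list --all)
```

Both commands read only the local object store, so no extra index or server round trip is involved.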
Your central argument about the IDE doing the tracking for you is false. Most IDEs don't in fact have any such functionality besides unlimited undo levels. Think of branches, merges, reverts, commit messages (log) and such, and I bet that even the IDE you referred to falls short. I especially doubt that it tracks your commits, quite possibly on several different branches that you work on, and properly pushes them to the repository once you get online.
If your IDE actually does all that, I would in fact call it a distributed version control system in itself.
Finally, if the central repository dies for whatever reason (your service provider went bankrupt, there was a fire, a hacker corrupted it, ...), you have a full backup on every machine that has pulled the repository recently.
EDIT: You can use a DVCS just like a centralized repository, and I would even recommend doing so for small-to-medium sized projects at least. Having one central "authoritative" repository that is always online simplifies a lot of things. And when that machine crashes, you can temporarily switch to one of the other machines until the server gets fixed.
If your harddisk silently starts corrupting data, you damn well want to know about it. Git takes SHA1 hashes of everything you commit. You have 1 central repo with SVN and if its bits get silently modified by a faulty HDD controller you won't know about it till it's too late.
And since you have 1 central repo, you just blew your only lifeline.
With git, everyone has an identical repo, complete with change history, and its content can be fully trusted because the SHA1 hash covers its complete image. So if you back up the 20-byte SHA1 of your HEAD, you can be certain that when you clone from some untrusted mirror, you have the exact same repo you lost!
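A minimal sketch of that check follows, with a temp directory playing the role of the untrusted mirror. The variable names are invented for this example; the only thing carried over from the lost repository is the recorded HEAD hash.

```shell
#!/bin/sh
set -e
# The repository you will later "lose".
origin=$(mktemp -d)
cd "$origin"
git init -q
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"
echo "important work" > data.txt
git add data.txt
git commit -q -m "important work"

# The value you keep somewhere safe (20 bytes, printed as 40 hex digits).
trusted=$(git rev-parse HEAD)

# Later: clone from a mirror you do not trust (here, the same directory).
mirror_clone=$(mktemp -d)/clone
git clone -q "$origin" "$mirror_clone"

# Check that every object matches its hash, then compare HEAD
# against the recorded value.
git -C "$mirror_clone" fsck --full
cloned=$(git -C "$mirror_clone" rev-parse HEAD)
if [ "$cloned" = "$trusted" ]; then
    echo "history matches the recorded hash"
else
    echo "MISMATCH: do not trust this clone" >&2
    exit 1
fi
```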
When you use a centralised repo, all the branches are there for the world to see. You can't make private branches. You have to make some branch that doesn't already collide with some other global name.
"test123 -- damn, there's already a test123. Let's try test124."
And everyone has to see all these branches with stupid names. You have to succumb to company policy that might go along the lines of "don't make branches unless you really need to", which prevents a lot of freedoms you get with git.
Same with committing. When you commit, you better be really sure your code works. Otherwise you break the build. No intermediate commits. 'Cause they all go to the central repo.
With git you have none of this nonsense. Branch and commit locally all you want. When you're ready to expose your changes to the rest of the world, you ask them to pull from you, or you push it to some "main" git repo.
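To illustrate, the sketch below keeps an experimental branch purely local while only the mainline is published. The bare repository in a temp directory stands in for the "main" git repo, and all names and paths are hypothetical.

```shell
#!/bin/sh
set -e
# Stand-in for the team's "main" repository.
shared=$(mktemp -d)/main.git
git init -q --bare "$shared"

work=$(mktemp -d)
cd "$work"
git init -q
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"
echo base > app.txt
git add app.txt
git commit -q -m "base"

# Publish the mainline, whatever the default branch happens to be called.
mainline=$(git symbolic-ref --short HEAD)
git remote add origin "$shared"
git push -q origin "$mainline"

# Private experiment: the name only has to be unique on this machine.
git checkout -q -b test123
echo "half-finished idea" >> app.txt
git commit -q -am "experiment, may break the build"

# The shared repository still lists only the mainline.
git ls-remote --heads origin
```

When the experiment is ready, pushing test123 (or merging it into the mainline first) is a separate, deliberate step.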
Since your repo is local, all the VCS operations are fast and don't require round trips and transfer from the central server! git log doesn't have to go over the network to find a change history. SVN does. Same with all other commands, since all the important stuff is stored in one location!
Watch Linus' talk for these and other benefits over SVN.
If you don't see the value of local history or local builds, then I'm not sure that any amount of question-answering is going to change your mind.
The history features of IDEs are limited and clumsy. They are nothing like the full functionality of a version control system.
One good example of how this stuff gets used is on various Apache projects. I can sync up a git repo to the Apache svn repo. Then I can work for a week in a private branch all my very own. I can downmerge changes from the repo. I can report on my changes, retail or wholesale. And when I'm done, I can package them up as one commit.