I am using git to manage a C++ project. When I am working on the project, I find it hard to organize the changes into commits when one change touches many places.
What I am asking about is the PHILOSOPHY part.
I think I can answer this because I have been doing some personal research on the topic recently.
One should focus on creating atomic commits, which means taking some extra care with a few things for each commit:
A commit should be focused on one change, and one change only. Anything more than that can have bad side effects.
Some people might argue that this is too much, that it is not practical. But the best argument in its favor, even for small companies, is that building atomic commits forces your design to be more decoupled and consistent, because one prerequisite for fully atomic commits is a healthy codebase that is not a mess.
If you enforce good commit practices consistently, you can drive the engineering culture, and the code itself, to a better state.
I tend to commit as you propose: a commit is a logically connected change set. My commits can be anything from a one-liner to a change in all files (for example add/change a copyright notice in the source files). The reason for change need not be a full task that I am implementing, but it is usually a milestone in the task.
If I have modified something that is not related to my current commit, I tend to do an interactive add to separate out the unrelated changes, too - even when it is a whitespace tidy up.
I have found that commits that simply dump the working state into the repository are a lot less useful: I cannot easily backport a bugfix to an earlier version, or bring a utility function into another branch, if the commits are all over the place.
One alternative to this approach is to use a lot of tiny commits on a feature branch and, once the whole feature is done, do heavy history rewriting to tidy the commits into a logical structure. But I find that approach a waste of time.
This is exactly the use case for which the index, the staging area, was introduced in git.
Feel free to make as many unrelated changes as you like. Then you choose which ones belong together and make several atomic commits in one go.
I do it all the time. If you use git-gui or any of the other GUI clients, you can choose not only the files that you want to commit, but also hunks within those files, so your commits are as atomic as possible.
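On the command line the same idea looks like this; the file names and messages are made up for illustration:

```shell
# Two unrelated edits sit together in the working tree; the index lets
# them become two separate atomic commits.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email you@example.com
git config user.name "You"

printf 'int parse();\n' > parser.h
printf 'draft notes\n'  > README
git add -A && git commit -qm "initial"

printf 'int parse();\nint lex();\n' > parser.h   # change 1: new declaration
printf 'real notes\n'               > README     # change 2: unrelated doc tweak

# Stage and commit each change on its own.
git add parser.h && git commit -qm "parser: declare lex()"
git add README   && git commit -qm "docs: update README"

git log --oneline
```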
Disclaimer: I too am in the process of trying to work out what commits should be, and how the final history should end up looking. However, I wanted to share some of the resources that I've come across during my own research.
First off, the Linux Kernel project has a great page on Merge Strategies for getting your code merged upstream. They talk about making bite-sized commits; doing one or more refactoring commits before the actual additions you want (the refactorings are supposed to make your feature cleaner of course ;) and other things.
My other favorite page is Git Best Practices by Seth Robertson. It is not only a page of best practices for using git, but also a tremendous resource, with enough information on a broad variety of git topics to make googling for more in-depth information trivial.
Something that very much helped me in working out what I was committing, and why, was moving our repository organisation over to the 'feature branch' model, as popularised by the Git Flow extension.
By having a branch describing each feature (or update, bugfix, etc.) being worked on, commits become less about the feature and more about how you are going about implementing it. For example, I was recently fixing a timezone bug within its own bugfix branch (bugfixes/gh-87, say), and the commits were split up by what was done on the server side, on the front end, and within the tests. Because all of this was happening on a branch dedicated to that bug (with a GitHub issue number too, for clarity and auto-closing), my commits read as the incremental steps in solving that problem, and so required less explanation as to why I was making them.
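A sketch of that branch workflow, with hypothetical file contents and commit messages modelled on the gh-87 example above:

```shell
# One bugfix branch per issue; each commit is one incremental step.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email you@example.com
git config user.name "You"
main=$(git symbolic-ref --short HEAD)   # default branch name varies by git version

printf 'app\n' > app.txt
git add app.txt && git commit -qm "initial"

git checkout -qb bugfixes/gh-87

printf 'server tz fix\n' >> app.txt
git commit -aqm "gh-87: normalise timezones on the server side"
printf 'frontend tz fix\n' >> app.txt
git commit -aqm "gh-87: render times in the user's local zone"
printf 'tz tests\n' >> app.txt
git commit -aqm "gh-87: add regression tests for the DST boundary"

git checkout -q "$main"
git merge -q --no-ff -m "Merge bugfixes/gh-87 (closes #87)" bugfixes/gh-87
git log --oneline --graph
```

The `--no-ff` merge keeps the three steps grouped under one merge commit, so the branch-level story survives in the history.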
Sometimes, when you do a big refactoring, it is inevitable that you change many files in one commit. When you change the interface of a class, you have to change the header, the implementation, and all places that use the interface in one commit, because no intermediate state would compile.
However, the recommended practice is to first change the interface without actually introducing any new functionality, test that you didn't break existing functionality, and commit that. Then implement the actual feature that needed the updated interface and commit it separately. You will probably end up making some adjustments to the refactoring along the way, which you can squash into the first commit using an interactive rebase.
That way there is one big commit, but it does not do anything hard, just shuffles code around, so it should be mostly easy to understand even though it is big. Then there is a second commit (or more, if the feature is big) that is not too big.
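A sketch of that two-step flow, including the interactive-rebase squash; file and function names are hypothetical, and `git commit --fixup` plus `git rebase --autosquash` are used to automate the squashing:

```shell
# Commit 1: big-but-boring interface refactor. Commit 2: the feature.
# A later adjustment to the refactor is squashed back into commit 1.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email you@example.com
git config user.name "You"

printf 'void draw(int x);\n' > shape.h
git add shape.h && git commit -qm "initial"

# 1) Interface change only, no new behaviour.
printf 'void draw(int x, int y);\n' > shape.h
git commit -aqm "refactor: draw() takes an explicit y coordinate"
refactor=$(git rev-parse HEAD)

# 2) The feature that needed the new interface.
printf 'void drawAll();\n' > batch.h
git add batch.h && git commit -qm "feature: batch drawing via drawAll()"

# 3) An adjustment to the refactor, discovered while doing the feature:
#    record it as a fixup of commit 1 and let autosquash fold it in.
printf 'void draw(long x, long y);\n' > shape.h
git commit -aq --fixup "$refactor"
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash "$refactor~1"

git log --oneline   # back to two tidy commits on top of "initial"
```

(`GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list unchanged, which makes the normally interactive rebase scriptable here.)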