I have a small scripting project that consists of five different source files in one directory called "Droid XX-XX-XX". Each time I created a new backup copy of the source directory, I put the date in the X's. So there are about 15 different versions from different dates. I want to add each of these to my brand-new Git repository, starting from the earliest.
However I have run into several problems.
1. One problem is that some of the files use tabs for indentation, while others use spaces -- but Git treats a whole line as different even when the only difference is the tab vs. space issue. How can I make Git ignore indentation formatting?
2. Another problem is that some filenames would have no spaces while others had spaces between the words -- but Git treats them as different files. Worse, sometimes the filename was changed to something different (like "PatrolPlan" changed to just "Patrol") for no real reason. When I'm adding a new set of files, how can I tell Git that even though the filename is different, it's really just a new version of a certain older file? Or better yet, can I set it to auto-detect when this happens?
3. The last problem is that at certain points during development, we merged two source files into one, or split one into two -- but Git doesn't automatically detect the similarities and deduce what happened. How can I tell Git what happened? Or better yet, how can I set it to auto-detect when two source files were combined or when one was split up?
I realize questions (2) and (3) are highly related. Thanks for any assistance!
It's sounding like you need more control and standardization of the development process. The one who commits changes should be the same person who modifies the files. Or at least the committer should know exactly what changed.
Examine the output of git diff carefully, and use the -w flag to ignore whitespace. There are also options to show differences within a line. See "Diffs within a line" below.
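To see what -w does in practice, here is a minimal sketch that builds a throwaway repository and compares the two diffs. The file name patrol.txt, its contents, and the committer identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: a tab-vs-space-only change shows up in `git diff` but not in `git diff -w`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo repo
git config user.email "example@example.com"

printf '\tmove forward\n' > patrol.txt       # tab-indented version
git add patrol.txt
git commit -q -m "Add patrol.txt (tabs)"

printf '    move forward\n' > patrol.txt     # same line, now space-indented

# Count added/removed content lines, skipping the ---/+++ header lines.
changed_plain=$(git diff | grep -c '^[-+][^-+]' || true)
changed_w=$(git diff -w | grep -c '^[-+][^-+]' || true)

echo "changed lines without -w: $changed_plain"
echo "changed lines with -w:    $changed_w"
```

The -w flag only affects what the diff displays; the whitespace change is still part of the working tree and will be committed if staged.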
Note that you won't be able to tell git to skip the space changes when committing. I suggest using GitX (I prefer the "brotherbard" fork), which allows you to interactively discard hunks before committing.
Use descriptive messages when committing. For example, if a file was split, say so. Make your commits small. If you find yourself writing long commit messages, break up the commit into smaller parts. That way when you examine the logs a long time later, it will make more sense what changed.
Diffs within a line
Git has some ability to show "word" differences in a single line. The simplest way is to just use git diff --color-words.
However, I like customizing the meaning of a "word" using the diff.wordRegex config. I also like the plain word-diff format because it more clearly shows where the differences are (it inserts brackets around the changes in addition to using color).
Command:
git diff --word-diff=plain
along with this in my config:
[diff]
wordRegex = [[:alnum:]_]+|[^[:alnum:]_[:space:]]+
This regex treats these as "words":
- consecutive strings of alphanumerics and underscores
- consecutive strings of non-alphanumerics, non-underscores, and non-spaces (good for detecting operators)
You must have a recent version of git to use wordRegex. See your git-config man page to see if the option is listed.
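As a rough illustration of the word-diff setup above, the following sketch applies the same wordRegex to a throwaway repository and changes a single operator. The file name, its contents, and the committer identity are hypothetical.

```shell
#!/bin/sh
# Sketch: --word-diff=plain with a custom diff.wordRegex isolates a one-character change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo repo
git config user.email "example@example.com"
# Same regex as in the config snippet, set for this repository only.
git config diff.wordRegex '[[:alnum:]_]+|[^[:alnum:]_[:space:]]+'

printf 'speed = base+offset\n' > patrol.txt
git add patrol.txt
git commit -q -m "Add patrol.txt"

printf 'speed = base-offset\n' > patrol.txt  # only the operator changes

word_diff=$(git diff --word-diff=plain)
printf '%s\n' "$word_diff"
```

In the plain format, removed words are wrapped in [-...-] and added words in {+...+}, so the changed operator should be marked on its own rather than the whole line.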
UPDATE
If you use git mv to rename a file (which is preferable to using another tool or the OS to rename), you can see git detecting the rename. I highly recommend committing a rename independently of any edits to the contents of the file. That's because git doesn't actually store the fact that you renamed - it uses a heuristic based on how much the file has changed to guess whether it was the same file. The less you change it during the rename-commit, the better.
If you did change the file contents slightly, you can pass the -C option to git diff and git log to try harder to detect copies and renames. Add a percentage (e.g. -C75%) to set the similarity threshold: the percentage is how similar the contents have to be to count as a match. The default is 50%, so a lower value makes git more lenient about differences and a higher value makes it stricter.
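A minimal sketch of the rename-only commit, assuming a throwaway repository and the "PatrolPlan" to "Patrol" rename from the question (the .txt extension, file contents, and committer identity are made up):

```shell
#!/bin/sh
# Sketch: commit a pure rename, then ask git whether it recognizes it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for the demo repo
git config user.email "example@example.com"

printf 'step one\nstep two\nstep three\n' > PatrolPlan.txt
git add PatrolPlan.txt
git commit -q -m "Add PatrolPlan.txt"

# Rename only; no content edits in this commit.
git mv PatrolPlan.txt Patrol.txt
git commit -q -m "Rename PatrolPlan.txt to Patrol.txt"

# -M asks for rename detection; an R status with a similarity score means it matched.
status=$(git diff -M --name-status HEAD~1 HEAD)
echo "$status"

# --follow keeps walking the history across the rename.
followed=$(git log --follow --format=%s -- Patrol.txt | grep -c .)
echo "commits found with --follow: $followed"
```

Because the contents are identical on both sides of the rename, the similarity score is 100%. Edits made in the same commit would lower it, which is why keeping renames separate helps.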
Now that I know a lot more about Git, I can answer my own questions.
1. It would be better to do a global search-replace using regex to standardize the whitespace in all the files across the different versions of the project, so that when they are committed in sequence, the whitespace changes never show up as differences. That being said, Atlassian SourceTree's diff tool allows you to hide whitespace changes, so at least you won't see those.
2. The key to dealing with filename changes is to make a commit where only the file's name changes (don't stage any other changes), then make a commit where its contents change. That way, normal diff tools that don't do a ton of heuristics and deep digging can make sense of what has happened. The problem is that if too much changes about a file, like the name AND a lot of the contents, then most diff tools will treat it as a deletion plus a new file (as mentioned in the correct answer).
3. This is a tougher one, and there's no really good way around it. If you split a file into two, or merge two into one, it will just be ugly in the diff. Try not to make lots of changes at the same time as the split, so that the split will be one thing and subsequent changes will be another.
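For the whitespace standardization mentioned in the first point, one possible sketch is below. It swaps the regex search-replace for the standard expand utility, and it assumes 4-column tab stops and .txt files; both are illustrative choices, not details from the original project.

```shell
#!/bin/sh
# Sketch: convert tabs to spaces in every matching file before committing a snapshot.
set -e
work=$(mktemp -d)
cd "$work"
printf 'if ready:\n\tmove()\n' > patrol.txt   # stand-in source file with a tab indent

for f in *.txt; do
    # expand rewrites tabs as spaces; -t 4 is an assumed tab width.
    expand -t 4 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

tabs_left=$(grep -c "$(printf '\t')" patrol.txt || true)
echo "lines still containing tabs: $tabs_left"
```

Running the same conversion over every snapshot directory before importing it means consecutive commits differ only in real content.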
You won't be able to make git ignore tabs/spaces, because git creates a hash of each file, and if the hash is different the file is considered different.
Git treats trees (directories) the same as files; if their content changes then they are different trees.
I don't think these changes are anything to worry about however; they happen during any development. I think the best approach for you is to replay your development using git. In other words start with your initial version and then make the necessary changes (as you did originally) and git will remember what you are doing.
Optional: If you want to record the date/time of the changes to be roughly those originally made, then you can use the --date command line option to git commit to tell git when these changes were made.
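Replaying the backups with --date could look roughly like the sketch below. It creates two stand-in snapshot directories; the YY-MM-DD reading of "XX-XX-XX", the file names, the noon timestamp, and the committer identity are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch: import dated snapshot directories as commits, oldest first.
set -e
base=$(mktemp -d)
cd "$base"

# Stand-in snapshots (hypothetical names; assumes YY-MM-DD so they sort by date).
mkdir "Droid 12-01-05" "Droid 12-02-10"
printf 'v1\n' > "Droid 12-01-05/patrol.txt"
printf 'v2\n' > "Droid 12-02-10/patrol.txt"

git init -q repo
git -C repo config user.name "Example"        # placeholder identity for the demo repo
git -C repo config user.email "example@example.com"

for snap in "Droid "*/; do
    snap=${snap%/}
    stamp=${snap#"Droid "}                    # e.g. 12-01-05
    # Clear the working tree (keeping .git), then copy this snapshot in.
    find repo -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
    cp -R "$snap"/. repo/
    git -C repo add -A
    git -C repo commit -q -m "Import snapshot $stamp" --date="20${stamp}T12:00:00"
done

dates=$(git -C repo log --reverse --format=%ad --date=short)
printf '%s\n' "$dates"
```

Note that --date sets the author date only; the committer date is still the time the import ran.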
Source: https://stackoverflow.com/questions/12427779/how-do-you-make-git-ignore-spaces-and-tabs