I would like to find all file paths that are not filtered by a .gitignore (or any nested .gitignore files within sub-directories) using C#. This is similar to the q
It's difficult to make suggestions without knowing exactly what you want to do with the list (use it in a build script, process the files in some way, just view them in a UI, etc.).
I couldn't find one in C#, but this JavaScript gitignore parser is small enough to port, and it exposes both an accepts and a denies method to get a list of included or ignored files. It is fairly well documented, has tests, and the regular expressions it uses should work just as well in C# as they do in JavaScript.
This answer would work from C#, provided you have Git installed on the machine where your C# code is running.
Also note that the Git Source Control Provider plugin for Visual Studio provides the list right in the IDE, along with the ability to check boxes and commit certain files together and a lot of other functionality that is difficult to do on the command line.
NOTE: The Git Source Control Provider is open source (written in C#) and you can view the source here, but it may be much more involved to reverse engineer than the JavaScript project.
Well, the best way to parse .gitignore files (and the other files Git uses, such as $GIT_DIR/info/exclude) is to get Git to do it for you. :-) (In your case, as in most cases, this does involve executing a git subprocess.)
git check-ignore
The git check-ignore command can be used to detect which files are ignored and why. The --non-matching option makes it report files that are not ignored as well. Because it still reports the ignored files too, and in a special format, you'll need to do a little further work to get a simple list of non-ignored files. This Bourne shell function does the trick:
find_nonignored() {
    find . -path ./.git -prune -o -print \
    | git check-ignore --verbose --non-matching --stdin \
    | sed -n -e 's,\t\./,\t,' -e 's,^::\t,,p'
}
The find command finds all files in and below the current working directory, which should be somewhere in the tree you're trying to filter. We exclude the top-level .git subdirectory and everything under it from the output, if present; /.git/ is not in a typical .gitignore file because Git ignores it automatically, and thus it is normally considered "not ignored" by git check-ignore.
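The find stage can be tried on its own, without Git. The sketch below uses a hypothetical list_tree helper and a made-up directory layout to show that -prune keeps ./.git and everything under it out of the listing: because -prune evaluates to true, the -o -print branch never runs for ./.git itself.

```shell
#!/bin/sh
# list_tree is a hypothetical helper for this demonstration only.
# It prints every path under directory $1 (relative to it), skipping
# ./.git and its contents, using the same find expression as find_nonignored.
list_tree() {
    (cd "$1" && find . -path ./.git -prune -o -print) | LC_ALL=C sort
}

# Build a throwaway tree that contains a fake .git directory.
demo=$(mktemp -d)
mkdir -p "$demo/.git/objects" "$demo/src"
touch "$demo/.git/HEAD" "$demo/src/main.c" "$demo/README.md"

list_tree "$demo"    # lists ., ./README.md, ./src and ./src/main.c
rm -rf "$demo"
```

Sorting with LC_ALL=C only makes the listing stable for display; find_nonignored itself does not need it.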
git check-ignore prints --non-matching paths only in --verbose mode (it refuses the option otherwise), because only that mode prints the extra information that tells you whether a path is ignored. (It always prints ignored paths.) The paths come out one per line in the format
source:linenum:pattern<TAB>path
The colon-separated fields describe what caused the path to be ignored (such as a line in the .gitignore file) and are empty if no ignore pattern matched the path.
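One detail worth knowing if you process these lines yourself: per the git check-ignore documentation, in --verbose mode a path whose last matching pattern is a negated one (beginning with !) is also printed with its source information, even though matching such a pattern means the path is not excluded. The sed filter in find_nonignored keeps only lines beginning with ::, so it would leave out such re-included paths. The sketch below, using a hypothetical helper name and made-up input lines, splits a line at the first tab and treats both the empty :: case and a !pattern match as "not ignored". It assumes the source field itself contains no colon.

```shell
#!/bin/sh
# describe_check_ignore_line is a hypothetical helper, not part of Git.
# It splits one line of `git check-ignore --verbose --non-matching` output
# (source:linenum:pattern<TAB>path) and reports whether the path is ignored.
# Assumption: the source field contains no colon.
tab=$(printf '\t')

describe_check_ignore_line() {
    line=$1
    info=${line%%"$tab"*}    # source:linenum:pattern (before the first tab)
    path=${line#*"$tab"}     # the path (after the first tab)
    pattern=${info#*:}       # drop "source:"
    pattern=${pattern#*:}    # drop "linenum:"
    if [ "$info" = "::" ]; then
        echo "not ignored: $path"
    else
        case $pattern in
            '!'*) echo "not ignored: $path (re-included by $info)" ;;
            *)    echo "ignored: $path (by $info)" ;;
        esac
    fi
}

# Made-up sample lines:
describe_check_ignore_line "::${tab}./src/main.c"
describe_check_ignore_line ".gitignore:3:*.o${tab}./src/main.o"
describe_check_ignore_line ".gitignore:4:!keep.o${tab}./src/keep.o"
```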
The sed command then filters the output to show only the paths of the non-ignored files. The -n option tells it not to print input lines by default. The first substitution replaces <TAB>./ with just <TAB>, removing the leading ./, for purely aesthetic reasons. The second substitution does the real work: it removes the ::<TAB> (indicating no "ignore" information) from the start of a line and, if that substitution happened, prints what's left of the line, which is a non-ignored path.
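To see the two substitutions at work without a repository, you can feed some made-up check-ignore output through them. This sketch assumes GNU sed (for the \t escapes) and uses a hypothetical sample function to produce the input. The first pipeline applies only the ./-stripping substitution and prints every line; the second runs the full filter.

```shell
#!/bin/sh
# sample is a hypothetical helper that emits made-up output in the
# `git check-ignore --verbose --non-matching` format; printf turns \t into a tab.
sample() {
    printf '::\t./src/main.c\n'
    printf '.gitignore:1:*.o\t./src/main.o\n'
    printf '::\t./README.md\n'
}

# Stage 1 only: strip the ./ that follows the tab (all lines still printed).
sample | sed -e 's,\t\./,\t,'

# Both stages with -n: print only lines that began with ::<TAB>, minus that prefix.
sample | sed -n -e 's,\t\./,\t,' -e 's,^::\t,,p'
```

The second pipeline prints only src/main.c and README.md; the ignored src/main.o line is dropped because the ^::\t substitution never fires on it.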
You can filter this further to do additional processing; I built this for a script that does markdown checking along these lines:
markdownlint $(find_nonignored | grep '\.md$')
This code includes untracked files (i.e., files that have never been added to the Git repo or staged) in the output, which is usually what you want. (Test systems, for example, should still check new files even before they've had git add run on them.) Beware that other solutions involving git ls-files and the like usually don't do this: plain git ls-files lists only tracked files, and untracked ones appear only with options such as --others --exclude-standard.
The above code relies on GNU sed, which interprets \t as a tab. If you're using BSD sed (such as on macOS), you probably need to tweak this slightly. Check the comments to see if someone has a hint for this.
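One possible tweak, sketched here as an assumption rather than a tested recipe for every sed, is to avoid the \t escape altogether and put a literal tab character into the sed script via printf. POSIX sed treats a literal tab in a regular expression as an ordinary character, so this should behave the same under GNU and BSD sed. The function name filter_nonignored is hypothetical.

```shell
#!/bin/sh
# filter_nonignored is a hypothetical, more portable replacement for the
# sed line in find_nonignored: it embeds a literal tab instead of using \t.
tab=$(printf '\t')

filter_nonignored() {
    # Inside double quotes, \. stays as backslash-dot, i.e. a literal dot for sed.
    sed -n -e "s,${tab}\./,${tab}," -e "s,^::${tab},,p"
}

# Made-up check-ignore output to exercise the filter; prints docs/guide.md.
printf '::\t./docs/guide.md\n.gitignore:2:build/\t./build\n' | filter_nonignored
```

To use it, replace the sed line in find_nonignored with | filter_nonignored and define tab and filter_nonignored before calling the function.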
All the code here breaks on paths with spaces or other "unusual" characters; it needs to be modified in several places (such as using -print0 with find) to fix this. I do not address issues like this here in order to keep the explanation simple. I also leave for others the generalization of the function to work on arbitrary paths rather than just the current working directory.