I want to do this:
findstr /s /c:some-symbol *
or the grep equivalent:
grep -R some-symbol *
but I need it to find matches in UTF-16 (Unicode) files as well.
Thanks for the suggestions. I was referring to Windows Vista and XP.
I also discovered this workaround, using the free Sysinternals strings.exe:
C:\> strings -s -b dir_tree_to_search | grep regexp
Strings.exe extracts all of the strings it finds (it is meant for binaries, but it works fine with text files too) and prefixes each result with a file name and a colon, so take that into account in the regexp (or use cut or another step in the pipeline). The -s switch makes it recurse through subdirectories, and -b just suppresses the banner message.
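To give an idea of how a strings-style tool picks up Unicode text, here is a minimal Python sketch. It is not how strings.exe is implemented: it assumes little-endian UTF-16 containing only printable ASCII-range characters and a minimum run length of 3 (both hypothetical choices). Each such character is stored as a printable byte followed by a zero byte, so runs of those byte pairs are decoded and printed with a "filename:" prefix, mirroring the output format described above.

```python
import re
import sys

MIN_LEN = 3  # hypothetical minimum run length, in characters

# In little-endian UTF-16, a printable ASCII-range character is stored as
# that byte followed by 0x00. Match runs of at least MIN_LEN such pairs.
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def utf16le_strings(data):
    """Return the printable UTF-16LE runs found in a byte buffer."""
    return [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]


def main(paths):
    for path in paths:
        with open(path, "rb") as fh:
            for s in utf16le_strings(fh.read()):
                # Prefix each result with "path:" so a grep stage can filter it.
                print(f"{path}:{s}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as utf16_strings.py, it would be used like the strings.exe pipeline: python utf16_strings.py file1 file2 | grep regexp.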
Ultimately, I'm still somewhat surprised that the flagship search utilities, GNU grep and findstr, don't handle Unicode character encodings natively.
You didn't say which platform you want to do this on.
On Windows, you could use PowerGREP, which automatically detects Unicode files that start with a byte order mark. (There's also an option to auto-detect files without a BOM. The auto-detection is very reliable for UTF-8, but limited for UTF-16.)
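BOM-based detection is simple enough to do yourself. Here is a minimal Python sketch (not PowerGREP's actual code) that compares the first bytes of a file against the standard byte order marks and decodes accordingly. Falling back to Latin-1 when there is no BOM is an arbitrary assumption, chosen only because it never fails to decode.

```python
import codecs
from typing import Optional, Tuple

# Known byte order marks. UTF-32LE (FF FE 00 00) is listed before
# UTF-16LE (FF FE) so the longer mark wins when both prefixes match.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length), or (None, 0) if there is no BOM."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_text(data: bytes, fallback: str = "latin-1") -> str:
    """Decode using the BOM if present, otherwise the (assumed) fallback."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)
```

A search tool would then run its ordinary text matching on the decoded string. Detecting files without a BOM needs heuristics, which is why that case is less reliable.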
findstr /s /c:some-symbol *
can be replaced with the following encoding-aware command (drop /i if you want the search to stay case-sensitive like the findstr version):
for /r %f in (*) do @find /i /n "some-symbol" "%f"
According to this blog article by Damon Cortesi, grep doesn't work with UTF-16 files, as you found out. However, the article presents this workaround:
find . -type f -print0 | while IFS= read -r -d '' f; do
  if file -b "$f" | grep -q UTF-16; then
    iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" "${GREP_FOR}"
  fi
done
This is obviously for Unix; I'm not sure what the equivalent on Windows would be. The author of that article also provides a shell script that does the above, which you can find on GitHub here.
This only greps files that are UTF-16. You'd still need to grep your ASCII files the normal way.
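A rough cross-platform equivalent of that loop, which should also run on Windows, is sketched below in Python. It is an assumption-laden sketch, not the article's script: the function name grep_utf16 is hypothetical, a file counts as UTF-16 only if it starts with a UTF-16 byte order mark (rather than asking file(1)), matching is case-insensitive like grep -i, and each hit is printed as path:line, the way grep -H --label does.

```python
import os
import re
import sys


def grep_utf16(root, pattern):
    """Case-insensitively search UTF-16 files under root.

    Returns matches as "path:line" strings, similar to `grep -iH --label=path`.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            # Only handle files starting with a UTF-16 BOM (FF FE or FE FF);
            # as in the shell loop, other files are left to ordinary grep.
            # Note: a UTF-32LE BOM also starts with FF FE; not handled here.
            if not data.startswith((b"\xff\xfe", b"\xfe\xff")):
                continue
            # The "utf-16" codec uses the BOM to pick the byte order and strips it.
            text = data.decode("utf-16", errors="replace")
            for line in text.splitlines():
                if regex.search(line):
                    results.append(f"{path}:{line}")
    return results


def main(argv):
    if len(argv) != 2:
        print("usage: grep_utf16.py ROOT PATTERN", file=sys.stderr)
        return
    for hit in grep_utf16(argv[0], argv[1]):
        print(hit)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Note that the pattern is a Python regular expression, not a grep basic regex, so characters such as + and ? behave differently.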
In newer versions of Windows, UTF-16 is supported out of the box. If it isn't, try changing the active code page with the chcp command.
In my case, findstr alone failed on UTF-16 files, but it worked when I piped the files through type:
type *.* | findstr /s /c:some-symbol
On Windows, you can also use find.exe.
find /i /n "YourSearchString" *.*
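The only problem is that it prints each file name as a header line before that file's matches. You can filter the headers out by piping to findstr, at the cost of no longer seeing which file each match came from: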
find /i /n "YourSearchString" *.* | findstr /i "YourSearchString"
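If you'd rather keep the file names, one option is to post-process find's output instead of filtering it. Here is a minimal Python sketch. It assumes the layout find.exe normally produces (a header of ten dashes and the file name for each file, then one "[N]text" line per match when /n is given), and the script name find_to_grep.py is hypothetical.

```python
import re
import sys

# Assumed find.exe layout: a "---------- NAME" header for each file,
# then "[N]text" for each matching line when /n is used.
HEADER = re.compile(r"^-{10} (.*)$")
MATCH = re.compile(r"^\[(\d+)\](.*)$")


def find_to_grep(lines):
    """Turn `find /n` output lines into grep-style "file:line:text" strings."""
    current = None
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        header = HEADER.match(line)
        if header:
            current = header.group(1)  # remember which file we're in
            continue
        match = MATCH.match(line)
        if match and current is not None:
            out.append(f"{current}:{match.group(1)}:{match.group(2)}")
    return out


def main(argv):
    if len(argv) != 1:
        print("usage: find_to_grep.py FIND_OUTPUT_FILE", file=sys.stderr)
        return
    with open(argv[0], errors="replace") as fh:
        for entry in find_to_grep(fh):
            print(entry)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, run find /i /n "YourSearchString" *.* > %TEMP%\hits.txt and then python find_to_grep.py %TEMP%\hits.txt. Writing the output to %TEMP% keeps find from also searching the file it is writing to.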