Question
I want to do this:
findstr /s /c:some-symbol *
or the grep equivalent
grep -R some-symbol *
but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte order mark (FF FE) in them, so I'm not even looking for heroic autodetection.
Any suggestions?
I'm referring to Windows Vista and XP.
Answer 1:
Thanks for the suggestions. I was referring to Windows Vista and XP.
I also discovered this workaround, using free Sysinternals strings.exe:
C:\> strings -s -b dir_tree_to_search | grep regexp
Strings.exe extracts all of the strings it finds (from binaries, but it works fine with text files too) and prefixes each result with a filename and colon, so take that into account in the regexp (or use cut or another step in the pipeline). The -s makes it do a recursive extraction and -b just suppresses the banner message.
Ultimately I'm still kind of surprised that the flagship searching utilities GNU grep and findstr don't handle Unicode character encodings natively.
Answer 2:
On Windows, you can also use find.exe.
find /i /n "YourSearchString" *.*
The only problem is that this prints file names followed by matches. You can filter them by piping to findstr:
find /i /n "YourSearchString" *.* | findstr /i "YourSearchString"
Answer 3:
A workaround is to convert your UTF-16 file to ASCII or ANSI:
TYPE UTF-16.txt > ASCII.txt
Then you can use FINDSTR.
FINDSTR object ASCII.txt
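Where relying on TYPE is not an option, the same conversion step can be scripted. This is only a minimal Python sketch, assuming the input file starts with a byte order mark; the function name utf16_to_utf8 and the output file name are hypothetical and not part of any tool mentioned here. It writes UTF-8 rather than ANSI, which byte-oriented tools can search for plain ASCII terms.

```python
def utf16_to_utf8(src, dst):
    """Rewrite a UTF-16 file as UTF-8 so byte-oriented tools can search it.

    Assumption: src begins with a BOM, which the "utf-16" codec uses to
    pick the byte order and then strips from the decoded text.
    """
    # newline="" on both sides keeps the original line endings untouched.
    with open(src, encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

For example, utf16_to_utf8("UTF-16.txt", "UTF-8.txt") followed by FINDSTR object UTF-8.txt mirrors the two commands above.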
Answer 4:
findstr /s /c:some-symbol *
can be replaced with the following encoding-aware command (inside a batch file, write %%f instead of %f):
for /r %f in (*) do @find /i /n "some-symbol" "%f"
Answer 5:
According to this blog article by Damon Cortesi, grep doesn't work with UTF-16 files, as you found out. However, it presents this workaround:
for f in `find . -type f | xargs -I {} file {} | grep UTF-16 | cut -f1 -d\:`
do iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" "${GREP_FOR}"
done
This is obviously for Unix; I'm not sure what the equivalent on Windows would be. The author of that article also provides a shell script that does the above, which you can find on GitHub here.
This only greps files that are UTF-16. You'd also grep your ASCII files the normal way.
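A platform-neutral variant of the same idea is to read the BOM yourself and decode each file accordingly. The following is a minimal Python sketch, not the blog author's script; the names sniff_encoding and grep_tree are hypothetical, and treating BOM-less files as latin-1 is an assumption chosen so that decoding never fails.

```python
import codecs
import os
import re

# BOMs checked longest-first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path, fallback="latin-1"):
    """Return a codec name based on the file's BOM, or `fallback` if none."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return fallback  # assumption: BOM-less files are treated as single-byte text


def grep_tree(root, pattern, flags=0):
    """Yield (path, line_number, line) for every matching line under root."""
    regex = re.compile(pattern, flags)
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding=sniff_encoding(path),
                          errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\r\n")
            except OSError:
                continue  # unreadable file: skip it


if __name__ == "__main__":
    for path, lineno, line in grep_tree(".", "some-symbol", re.IGNORECASE):
        print(f"{path}:{lineno}:{line}")
```

Because the fallback handles files without a BOM, this covers both the UTF-16 files and the ASCII files in one pass, at the cost of also scanning binaries as if they were text.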
Answer 6:
In newer versions of Windows, UTF-16 is supported out of the box. If not, try changing the active code page with the chcp command.
In my case, using findstr alone failed for UTF-16 files, but it worked when combined with type:
type *.* | findstr /s /c:some-symbol
Answer 7:
You didn't say which platform you want to do this on.
On Windows, you could use PowerGREP, which automatically detects Unicode files that start with a byte order mark. (There's also an option to auto-detect files without a BOM. The auto-detection is very reliable for UTF-8, but limited for UTF-16.)
Source: https://stackoverflow.com/questions/408079/findstr-or-grep-that-autodetects-chararacter-encoding-utf-16