The Windows FINDSTR command is horribly documented. There is very basic command line help available through FINDSTR /?
, or HELP FINDSTR
, but it is
Answer continued from part 1 above - I've run into the 30,000 character answer limit :-(
Limited Regular Expressions (regex) Support
FINDSTR support for regular expressions is extremely limited. If it is not in the HELP documentation, it is not supported.
Beyond that, the regex expressions that are supported are implemented in a completely non-standard manner, such that results can be different then would be expected coming from something like grep or perl.
Regex Line Position anchors ^ and $
^
matches beginning of input stream as well as any position immediately following a <LF>. Since FINDSTR also breaks lines after <LF>, a simple regex of "^" will always match all lines within a file, even a binary file.
$
matches any position immediately preceding a <CR>. This means that a regex search string containing $
will never match any lines within a Unix style text file, nor will it match the last line of a Windows text file if it is missing the EOL marker of <CR><LF>.
Note - As previously discussed, piped and redirected input to FINDSTR may have <CR><LF>
appended that is not in the source. Obviously this can impact a regex search that uses $
.
Any search string with characters before ^
or after $
will always fail to find a match.
Positional Options /B /E /X
The positional options work the same as ^
and $
, except they also work for literal search strings.
/B functions the same as ^
at the start of a regex search string.
/E functions the same as $
at the end of a regex search string.
/X functions the same as having both ^
at the beginning and $
at the end of a regex search string.
Regex word boundary
\<
must be the very first term in the regex. The regex will not match anything if any other characters precede it. \<
corresponds to either the very beginning of the input, the beginning of a line (the position immediately following a <LF>), or the position immediately following any "non-word" character. The next character need not be a "word" character.
\>
must be the very last term in the regex. The regex will not match anything if any other characters follow it. \>
corresponds to either the end of input, the position immediately prior to a <CR>, or the position immediately preceding any "non-word" character. The preceding character need not be a "word" character.
Here is a complete list of "non-word" characters, represented as the decimal byte code. Note - this list was compiled on a U.S machine. I do not know what impact other languages may have on this list.
001 028 063 179 204 230
002 029 064 180 205 231
003 030 091 181 206 232
004 031 092 182 207 233
005 032 093 183 208 234
006 033 094 184 209 235
007 034 096 185 210 236
008 035 123 186 211 237
009 036 124 187 212 238
011 037 125 188 213 239
012 038 126 189 214 240
014 039 127 190 215 241
015 040 155 191 216 242
016 041 156 192 217 243
017 042 157 193 218 244
018 043 158 194 219 245
019 044 168 195 220 246
020 045 169 196 221 247
021 046 170 197 222 248
022 047 173 198 223 249
023 058 174 199 224 250
024 059 175 200 226 251
025 060 176 201 227 254
026 061 177 202 228 255
027 062 178 203 229
Regex character class ranges [x-y]
Character class ranges do not work as expected. See this question: Why does findstr not handle case properly (in some circumstances)?, along with this answer: https://stackoverflow.com/a/8767815/1012053.
The problem is FINDSTR does not collate the characters by their byte code value (commonly thought of as the ASCII code, but ASCII is only defined from 0x00 - 0x7F). Most regex implementations would treat [A-Z] as all upper case English capital letters. But FINDSTR uses a collation sequence that roughly corresponds to how SORT works. So [A-Z] includes the complete English alphabet, both upper and lower case (except for "a"), as well as non-English alpha characters with diacriticals.
Below is a complete list of all characters supported by FINDSTR, sorted in the collation sequence used by FINDSTR to establish regex character class ranges. The characters are represented as their decimal byte code value. I believe the collation sequence makes the most sense if the characters are viewed using code page 437. Note - this list was compiled on a U.S machine. I do not know what impact other languages may have on this list.
001
002
003
004
005
006
007
008
014
015
016
017
018
019
020
021
022
023
024
025
026
027
028
029
030
031
127
039
045
032
255
009
010
011
012
013
033
034
035
036
037
038
040
041
042
044
046
047
058
059
063
064
091
092
093
094
095
096
123
124
125
126
173
168
155
156
157
158
043
249
060
061
062
241
174
175
246
251
239
247
240
243
242
169
244
245
254
196
205
179
186
218
213
214
201
191
184
183
187
192
212
211
200
217
190
189
188
195
198
199
204
180
181
182
185
194
209
210
203
193
207
208
202
197
216
215
206
223
220
221
222
219
176
177
178
170
248
230
250
048
172
171
049
050
253
051
052
053
054
055
056
057
236
097
065
166
160
133
131
132
142
134
143
145
146
098
066
099
067
135
128
100
068
101
069
130
144
138
136
137
102
070
159
103
071
104
072
105
073
161
141
140
139
106
074
107
075
108
076
109
077
110
252
078
164
165
111
079
167
162
149
147
148
153
112
080
113
081
114
082
115
083
225
116
084
117
085
163
151
150
129
154
118
086
119
087
120
088
121
089
152
122
090
224
226
235
238
233
227
229
228
231
237
232
234
Regex character class term limit and BUG
Not only is FINDSTR limited to a maximum of 15 character class terms within a regex, it fails to properly handle an attempt to exceed the limit. Using 16 or more character class terms results in an interactive Windows pop up stating "Find String (QGREP) Utility has encountered a problem and needs to close. We are sorry for the inconvenience." The message text varies slightly depending on the Windows version. Here is one example of a FINDSTR that will fail:
echo 01234567890123456|findstr [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
This bug was reported by DosTips user Judago here. It has been confirmed on XP, Vista, and Windows 7.
Regex searches fail (and may hang indefinitely) if they include byte code 0xFF (decimal 255)
Any regex search that includes byte code 0xFF (decimal 255) will fail. It fails if byte code 0xFF is included directly, or if it is implicitly included within a character class range. Remember that FINDSTR character class ranges do not collate characters based on the byte code value. Character <0xFF>
appears relatively early in the collation sequence between the <space>
and <tab>
characters. So any character class range that includes both <space>
and <tab>
will fail.
The exact behavior changes slightly depending on the Windows version. Windows 7 hangs indefinitely if 0xFF is included. XP doesn't hang, but it always fails to find a match, and occasionally prints the following error message - "The process tried to write to a nonexistent pipe."
I no longer have access to a Vista machine, so I haven't been able to test on Vista.
Regex bug: .
and [^anySet]
can match End-Of-File
The regex .
meta-character should only match any character other than <CR>
or <LF>
. There is a bug that allows it to match the End-Of-File if the last line in the file is not terminated by <CR>
or <LF>
. However, the .
will not match an empty file.
For example, a file named "test.txt" containing a single line of x
, without terminating <CR>
or <LF>
, will match the following:
findstr /r x......... test.txt
This bug has been confirmed on XP and Win7.
The same seems to be true for negative character sets. Something like [^abc]
will match End-Of-File. Positive character sets like [abc]
seem to work fine. I have only tested this on Win7.
When several commands are enclosed in parentheses and there are redirected files to the whole block:
< input.txt (
command1
command2
. . .
) > output.txt
... then the files remains open as long as the commands in the block be active, so the commands may move the file pointer of the redirected files. Both MORE and FIND commands move the Stdin file pointer to the beginning of the file before process it, so the same file may be processed several times inside the block. For example, this code:
more < input.txt > output.txt
more < input.txt >> output.txt
... produce the same result than this one:
< input.txt (
more
more
) > output.txt
This code:
find "search string" < input.txt > matchedLines.txt
find /V "search string" < input.txt > unmatchedLines.txt
... produce the same result than this one:
< input.txt (
find "search string" > matchedLines.txt
find /V "search string" > unmatchedLines.txt
)
FINDSTR is different; it does not move the Stdin file pointer from its current position. For example, this code insert a new line after a search line:
call :ProcessFile < input.txt
goto :EOF
:ProcessFile
rem Read the next line from Stdin and copy it
set /P line=
echo %line%
rem Test if it is the search line
if "%line%" neq "search line" goto ProcessFile
rem Insert the new line at this point
echo New line
rem And copy the rest of lines
findstr "^"
exit /B
We may make good use of this feature with the aid of an auxiliary program that allow us to move the file pointer of a redirected file, as shown in this example.
This behavior was first reported by jeb at this post.
EDIT 2018-08-18: New FINDSTR bug reported
The FINDSTR command have a strange bug that happen when this command is used to show characters in color AND the output of such a command is redirected to CON device. For details on how use FINDSTR command to show text in color, see this topic.
When the output of this form of FINDSTR command is redirected to CON, something strange happens after the text is output in the desired color: all the text after it is output as "invisible" characters, although a more precise description is that the text is output as black text over black background. The original text will appear if you use COLOR command to reset the foreground and background colors of the entire screen. However, when the text is "invisible" we could execute a SET /P command, so all characters entered will not appear on the screen. This behavior may be used to enter passwords.
@echo off
setlocal
set /P "=_" < NUL > "Enter password"
findstr /A:1E /V "^$" "Enter password" NUL > CON
del "Enter password"
set /P "password="
cls
color 07
echo The password read is: "%password%"