The Windows FINDSTR command is horribly documented. There is very basic command line help available through FINDSTR /?, or HELP FINDSTR, but it is
Answer continued from part 1 above - I've run into the 30,000 character answer limit :-(
Limited Regular Expressions (regex) Support
FINDSTR support for regular expressions is extremely limited. If it is not in the HELP documentation, it is not supported.
Beyond that, the regular expressions that are supported are implemented in a completely non-standard manner, such that results can be different than would be expected coming from something like grep or perl.
Regex Line Position anchors ^ and $
^ matches beginning of input stream as well as any position immediately following a <LF>.
$ matches any position immediately preceding a <CR>. This means that a regex search string containing $ will never match any lines within a Unix style text file, nor will it match the last line of a Windows text file if it is missing the EOL marker of <CR><LF>.
Note - As previously discussed, piped and redirected input to FINDSTR may have <CR><LF> appended that is not in the source. Obviously this can impact a regex search that uses $.
Any search string with characters before ^ or after $ will always fail to find a match.
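The anchor rules above can be modeled in a few lines of Python. This is only a sketch of the rules as described, not FINDSTR's actual implementation. It assumes the line-feed byte is the character that ^ keys on, and the function names are made up for illustration.

```python
CR, LF = 0x0D, 0x0A

def caret_positions(data):
    """Offsets where ^ can match under the rule above:
    offset 0, plus the offset right after every LF byte."""
    return [0] + [i + 1 for i, b in enumerate(data) if b == LF]

def dollar_positions(data):
    """Offsets where $ can match under the rule above:
    the offset of every CR byte (i.e. the position just before it)."""
    return [i for i, b in enumerate(data) if b == CR]

windows_text = b"one\r\ntwo\r\n"
unix_text = b"one\ntwo\n"
windows_no_final_eol = b"one\r\ntwo"

print(dollar_positions(windows_text))          # [3, 8]
print(dollar_positions(unix_text))             # []  -> $ never matches
print(dollar_positions(windows_no_final_eol))  # [3] -> last line has no $ position
print(caret_positions(windows_no_final_eol))   # [0, 5]
```

The empty list for the Unix style buffer is the reason a search string containing $ finds nothing in such a file.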
Positional Options /B /E /X
The positional options work the same as ^ and $, except they also work for literal search strings.
/B functions the same as ^ at the start of a regex search string.
/E functions the same as $ at the end of a regex search string.
/X functions the same as having both ^ at the beginning and $ at the end of a regex search string.
Regex word boundary
\< must be the very first term in the regex. The regex will not match anything if any other characters precede it. \< corresponds to either the very beginning of the input, the beginning of a line (the position immediately following a <LF>), or the position immediately following any "non-word" character.
\> must be the very last term in the regex. The regex will not match anything if any other characters follow it. \> corresponds to either the end of input, the position immediately prior to a <CR>, or the position immediately preceding any "non-word" character.
Here is a complete list of "non-word" characters, represented as decimal byte codes. Note - this list was compiled on a U.S. machine. I do not know what impact other languages may have on this list.
001 028 063 179 204 230
002 029 064 180 205 231
003 030 091 181 206 232
004 031 092 182 207 233
005 032 093 183 208 234
006 033 094 184 209 235
007 034 096 185 210 236
008 035 123 186 211 237
009 036 124 187 212 238
011 037 125 188 213 239
012 038 126 189 214 240
014 039 127 190 215 241
015 040 155 191 216 242
016 041 156 192 217 243
017 042 157 193 218 244
018 043 158 194 219 245
019 044 168 195 220 246
020 045 169 196 221 247
021 046 170 197 222 248
022 047 173 198 223 249
023 058 174 199 224 250
024 059 175 200 226 251
025 060 176 201 227 254
026 061 177 202 228 255
027 062 178 203 229
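For scripting against this list, the table can be collapsed into inclusive ranges. The Python below is just a transcription of the table above into a lookup set. The names NON_WORD_RANGES, NON_WORD and is_non_word are made up for illustration, and the same U.S. machine caveat applies.

```python
# Non-word byte codes from the table above, written as inclusive ranges.
NON_WORD_RANGES = [
    (1, 9), (11, 12), (14, 47), (58, 64), (91, 94), (96, 96),
    (123, 127), (155, 158), (168, 170), (173, 224), (226, 251), (254, 255),
]
NON_WORD = {code for lo, hi in NON_WORD_RANGES for code in range(lo, hi + 1)}

def is_non_word(code):
    """True if the byte code appears in the non-word table."""
    return code in NON_WORD

# Printable ASCII characters that are NOT in the table, i.e. "word" characters.
ascii_word_chars = "".join(chr(c) for c in range(32, 127) if not is_non_word(c))

print(len(NON_WORD))      # 149
print(ascii_word_chars)   # digits, upper case, underscore, lower case
```

Within printable ASCII, only the digits, the English letters and the underscore are missing from the table, which lines up with the usual regex notion of a word character.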
Regex character class ranges [x-y]
Character class ranges do not work as expected. See this question: Why does findstr not handle case properly (in some circumstances)?, along with this answer: https://stackoverflow.com/a/8767815/1012053.
The problem is FINDSTR does not collate the characters by their byte code value (commonly thought of as the ASCII code, but ASCII is only defined from 0x00 - 0x7F). Most regex implementations would treat [A-Z] as all upper case English letters. But FINDSTR uses a collation sequence that roughly corresponds to how SORT works. So [A-Z] includes the complete English alphabet, both upper and lower case (except for "a"), as well as non-English alpha characters with diacritics.
Below is a complete list of all characters supported by FINDSTR, sorted in the collation sequence used by FINDSTR to establish regex character class ranges. The characters are represented as their decimal byte code value. I believe the collation sequence makes the most sense if the characters are viewed using code page 437. Note - this list was compiled on a U.S. machine. I do not know what impact other languages may have on this list.
001
002
003
004
005
006
007
008
014
015
016
017
018
019
020
021
022
023
024
025
026
027
028
029
030
031
127
039
045
032
255
009
010
011
012
013
033
034
035
036
037
038
040
041
042
044
046
047
058
059
063
064
091
092
093
094
095
096
123
124
125
126
173
168
155
156
157
158
043
249
060
061
062
241
174
175
246
251
239
247
240
243
242
169
244
245
254
196
205
179
186
218
213
214
201
191
184
183
187
192
212
211
200
217
190
189
188
195
198
199
204
180
181
182
185
194
209
210
203
193
207
208
202
197
216
215
206
223
220
221
222
219
176
177
178
170
248
230
250
048
172
171
049
050
253
051
052
053
054
055
056
057
236
097
065
166
160
133
131
132
142
134
143
145
146
098
066
099
067
135
128
100
068
101
069
130
144
138
136
137
102
070
159
103
071
104
072
105
073
161
141
140
139
106
074
107
075
108
076
109
077
110
252
078
164
165
111
079
167
162
149
147
148
153
112
080
113
081
114
082
115
083
225
116
084
117
085
163
151
150
129
154
118
086
119
087
120
088
121
089
152
122
090
224
226
235
238
233
227
229
228
231
237
232
234
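To see what a range like [A-Z] covers under this ordering, the relevant slice of the list (from 097 through 090) can be loaded into Python and cut at the range end points. This is only a lookup against the list above, not a re-implementation of FINDSTR. The names ALPHA_ORDER and class_members are made up for illustration.

```python
# Slice of the collation list above, from 097 ("a") through 090 ("Z").
ALPHA_ORDER = [
    97, 65, 166, 160, 133, 131, 132, 142, 134, 143, 145, 146,
    98, 66, 99, 67, 135, 128, 100, 68, 101, 69, 130, 144, 138, 136, 137,
    102, 70, 159, 103, 71, 104, 72, 105, 73, 161, 141, 140, 139,
    106, 74, 107, 75, 108, 76, 109, 77, 110, 252, 78, 164, 165,
    111, 79, 167, 162, 149, 147, 148, 153, 112, 80, 113, 81, 114, 82,
    115, 83, 225, 116, 84, 117, 85, 163, 151, 150, 129, 154,
    118, 86, 119, 87, 120, 88, 121, 89, 152, 122, 90,
]

def class_members(first, last, order):
    """Byte codes that fall between first and last in the given ordering."""
    i, j = order.index(first), order.index(last)
    return order[i:j + 1]

members = class_members(ord("A"), ord("Z"), ALPHA_ORDER)

print(ord("a") in members)   # False: "a" sorts before "A"
print(ord("b") in members)   # True
print(len(members))          # 89
# View the members as code page 437 characters.
print(bytes(members).decode("cp437"))
```

Only 26 of the 89 members are upper case English letters. The rest are "b" through "z" plus the accented and other extended characters that sort between "A" and "Z".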
Regex character class term limit and BUG
Not only is FINDSTR limited to a maximum of 15 character class terms within a regex, it fails to properly handle an attempt to exceed the limit. Using 16 or more character class terms results in an interactive Windows pop up stating "Find String (QGREP) Utility has encountered a problem and needs to close. We are sorry for the inconvenience." The message text varies slightly depending on the Windows version. Here is one example of a FINDSTR that will fail:
echo 01234567890123456|findstr [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
This bug was reported by DosTips user Judago. It has been confirmed on XP, Vista, and Windows 7.
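A script that builds FINDSTR search strings could count the character class terms first and refuse anything over the limit. The Python below is a rough sketch of such a guard. The name count_class_terms is hypothetical, and the handling of backslashes and brackets is a simplifying assumption, not a statement about how FINDSTR parses them.

```python
MAX_CLASS_TERMS = 15  # limit reported in the text above

def count_class_terms(regex):
    """Count [...] terms in a search string.
    Assumption: a backslash escapes the next character, and the first
    "]" after a "[" closes the class."""
    count = 0
    in_class = False
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            i += 2          # skip the escaped character
            continue
        if ch == "[" and not in_class:
            in_class = True
            count += 1
        elif ch == "]" and in_class:
            in_class = False
        i += 1
    return count

failing = "[0-9]" * 16
print(count_class_terms(failing))                     # 16
print(count_class_terms(failing) <= MAX_CLASS_TERMS)  # False: do not pass this to FINDSTR
```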
Regex searches fail (and may hang indefinitely) if they include byte code 0xFF (decimal 255)
Any regex search that includes byte code 0xFF (decimal 255) will fail. It fails if byte code 0xFF is included directly, or if it is implicitly included within a character class range. Remember that FINDSTR character class ranges do not collate characters based on the byte code value. Character <0xFF> appears relatively early in the collation sequence between the <space> and <tab> characters. So any character class range that includes both <space> and <tab> will fail.
The exact behavior changes slightly depending on the Windows version. Windows 7 hangs indefinitely if 0xFF is included. XP doesn't hang, but it always fails to find a match, and occasionally prints the following error message - "The process tried to write to a nonexistent pipe."
I no longer have access to a Vista machine, so I haven't been able to test on Vista.
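Whether a given range pulls in 0xFF can be checked against the collation list before running the search. The Python below is a sketch using the slice of the list from 032 through 126, where 255 sits directly after 032. The names PUNCT_ORDER and range_hits_ff are made up for illustration, and the check only works for end points that appear in that slice.

```python
# Slice of the collation list above, from 032 (space) through 126 ("~").
PUNCT_ORDER = [
    32, 255, 9, 10, 11, 12, 13, 33, 34, 35, 36, 37, 38, 40, 41, 42,
    44, 46, 47, 58, 59, 63, 64, 91, 92, 93, 94, 95, 96, 123, 124, 125, 126,
]

def range_hits_ff(first, last, order=PUNCT_ORDER):
    """True if 0xFF lies between first and last in the given ordering."""
    i, j = order.index(first), order.index(last)
    return 0xFF in order[i:j + 1]

print(range_hits_ff(ord(" "), ord("~")))  # True
print(range_hits_ff(ord("!"), ord("~")))  # False
```

Going by the list, a range such as [ -~], which looks like "printable ASCII" in most regex flavors, would be expected to include 0xFF and so trigger the failure described above.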
Regex bug: . and [^anySet] can match End-Of-File
The regex . meta-character should only match any character other than <CR> or <LF>. There is a bug that allows it to match the End-Of-File if the last line in the file is not terminated by <CR> or <LF>. However, the . will not match an empty file.
For example, a file named "test.txt" containing a single line of x, without terminating <CR> or <LF>, will match the following:
findstr /r x......... test.txt
This bug has been confirmed on XP and Win7.
The same seems to be true for negative character sets. Something like [^abc] will match End-Of-File. Positive character sets like [abc] seem to work fine. I have only tested this on Win7.
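Reproducing this needs a file whose last line really has no terminator, which many editors will not produce. One way to create it, as a sketch, is to write the bytes directly from Python. The helper name write_fixture is made up for illustration, and the temporary directory is only used here so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

def write_fixture(path, content=b"x"):
    """Write content exactly as given. Binary mode adds no <CR> or <LF>."""
    path = Path(path)
    path.write_bytes(content)
    return path

with tempfile.TemporaryDirectory() as tmp:
    fixture = write_fixture(Path(tmp) / "test.txt")
    data = fixture.read_bytes()
    print(data)                          # b'x'
    print(b"\r" in data, b"\n" in data)  # False False
```

To test for real, write test.txt into the current directory instead and run the findstr /r x......... test.txt command shown above against it.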