I recently typed an essay for my lit class, and my teacher specifically stated a word limit that does not include quotations from the piece. And I thought, why not make a sc
This is easy enough using PCRE (or Perl of course):
".*?"(*SKIP)(?!)|(?
Use the `g` modifier, and `s` if you want to handle multiline quotes.
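If you're not on PCRE or Perl, here is roughly what those two modifiers correspond to in Python. This is my own illustration, not part of the regex above: `re.findall` scans the whole string much like `g`, and `re.DOTALL` plays the role of `s`. The helper name `find_quotes` and the sample text are made up for the example.

```python
import re

def find_quotes(text, dotall=False):
    """Return every double-quoted span; findall scans the whole string, like g."""
    flags = re.DOTALL if dotall else 0  # re.DOTALL is Python's counterpart of s
    return re.findall(r'".*?"', text, flags)

sample = 'before "a quote\nspanning two lines" after'
print(find_quotes(sample))               # no match: "." stops at the newline
print(find_quotes(sample, dotall=True))  # the whole two-line quote is matched
```

Without the flag the quote that spans a line break is simply not matched, so its words would end up in the count.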
Here's the `x` version for readability:
".*?" (*SKIP)(?!)
| (?
The first part matches everything inside `"` or `'` quotes and discards it (`(*SKIP)(?!)`). The second part matches all words (I've included `'` as part of a word in this example). The `'` character counts as a quote boundary only at the start or end of a word, so contractions such as isn't are still counted as a single word.
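The same two-part idea can be sketched in Python, with one caveat: Python's built-in `re` module has no `(*SKIP)` verb, so this is a stand-in technique and not the regex above. Quoted spans are listed first in the alternation so they get consumed whole, and only the word branch is captured and counted. The single-quote branch with its lookarounds is my assumption about how to express "boundary only at start/end of words"; `TOKEN` and `count_words` are names invented for the example.

```python
import re

# Quoted spans come first so they are consumed without being captured;
# only the last branch (a word) lands in group 1. This replaces
# (*SKIP)(?!), which Python's re module does not support.
TOKEN = re.compile(
    r'''
      ".*?"                   # double-quoted span: consumed, not captured
    | (?<!\w)'.*?'(?!\w)      # single-quoted span (assumed form of the boundary rule)
    | ([\w']+)                # a word, apostrophes allowed: captured
    ''',
    re.VERBOSE | re.DOTALL,   # x for layout, s so quotes may span lines
)

def count_words(text):
    """Count words that are not inside quotes."""
    return sum(1 for m in TOKEN.finditer(text) if m.group(1))

essay = "He said \"hello there\" and it isn't 'quoted text' here."
print(count_words(essay))
```

In the sample, isn't is counted as one word because its apostrophe sits between word characters, while 'quoted text' is skipped because its apostrophes sit at word edges.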
Possible modifications:
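- Replace `[\w']+` with `\w+` if you don't want `'` to count as part of a word.
- Replace `[\w']+` with `[-\w']+` if you want a hyphenated word to count as one word.

You get the point ;)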
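To see what each of these word classes does to the count, here's a quick Python check. It only exercises the word part of the pattern (no quote skipping), and the helper `count_with` and the sample sentence are made up for illustration.

```python
import re

def count_with(word_class, text):
    """Count matches of the given word pattern in text."""
    return len(re.findall(word_class, text))

sample = "A well-known rule isn't hard"

for word_class in (r"[\w']+", r"\w+", r"[-\w']+"):
    print(word_class, count_with(word_class, sample), re.findall(word_class, sample))
```

With `\w+` the contraction splits into two tokens, and with `[-\w']+` the hyphenated word stays in one piece, so the three patterns give different totals for the same sentence.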
And here's a full Perl script that uses this regex:
#!/usr/bin/env perl
use strict;
use warnings;
$_ = do { local $/; <> };
print scalar (() = /".*?"(*SKIP)(?!)|(?
Run it with a file as its argument, or pipe the text in on STDIN, and it will print the word count to STDOUT.