Can regex match all the words outside quotation marks?

执念已碎 2021-01-06 21:25

I recently typed an essay for my lit class, and my teacher specifically stated a word limit that does not include quotations from the piece. And I thought, why not make a sc

4 Answers
  •  离开以前
    2021-01-06 22:00

    This is easy enough using PCRE (or Perl of course):

    ".*?"(*SKIP)(?!)|(?

    Use the g modifier to find every match, and add s if quotes can span multiple lines (it lets . match newlines too).

    Demo

    Here's the x version for readability:

      ".*?"              (*SKIP)(?!)
    | (?

    The first part matches everything inside " or ' quotes and discards it: (*SKIP)(?!) forces the match to fail and stops the engine from retrying inside the text it just consumed. The second part matches all words (in this example, ' counts as part of a word). A ' character is treated as a quote boundary only at the start or end of a word, so contractions such as isn't still work.
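
    For engines without (*SKIP), such as Python's re module, the same idea can be approximated with a capture group: quoted spans are matched first and thrown away, and only the word alternative captures. This is a minimal sketch, not the answer's regex; the pattern and the name words_outside_quotes are assumptions made for illustration.

```python
import re

# Alternatives are tried left to right at each position, so quoted spans
# are consumed before the word alternative can see their contents.
PATTERN = re.compile(
    r'''
      ".*?"                  # double-quoted span: consumed, not captured
    | (?<!\w)'.*?'(?!\w)     # single-quoted span, quotes only at word edges
    | ([\w']+)               # group 1: a word outside any quoted span
    ''',
    re.VERBOSE | re.DOTALL,  # DOTALL plays the role of the s modifier
)


def words_outside_quotes(text):
    """Return the words that are not inside a quoted span."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


if __name__ == "__main__":
    sample = 'He said "to be or not" and isn\'t sure'
    print(words_outside_quotes(sample))
```

    Matches where group 1 is empty are the quoted spans, so filtering on the group has the same effect as discarding them with (*SKIP)(?!).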

    Possible modifications:

    • To count isn't as two words, replace [\w']+ with \w+.
    • To count text like mother-in-law as one word instead of three, replace [\w']+ with [-\w']+.
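
    The effect of swapping the word class can be checked quickly. The sketch below is in Python and uses a capture group instead of (*SKIP), so it is an approximation of the approach and not the answer's regex; count_words and its word_class parameter are hypothetical names.

```python
import re


def count_words(text, word_class=r"[\w']+"):
    """Count words outside quotes; word_class defines what a word is."""
    pattern = re.compile(
        # Quoted spans are consumed without capturing; only words fill group 1.
        r'''".*?"|(?<!\w)'.*?'(?!\w)|(''' + word_class + r")",
        re.DOTALL,
    )
    return sum(1 for m in pattern.finditer(text) if m.group(1))


if __name__ == "__main__":
    print(count_words("isn't"))                      # default: ' is a word character
    print(count_words("isn't", r"\w+"))              # ' now splits the word
    print(count_words("mother-in-law"))              # default: hyphens split words
    print(count_words("mother-in-law", r"[-\w']+"))  # hyphens join words
```

    Note that with [-\w']+ a dash standing on its own between spaces would also count as a word, so that variant suits text where dashes only appear inside compounds.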

    You get the point ;)

    And here's a full Perl script that uses this regex:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    $_ = do { local $/; <> };
    print scalar (() = /".*?"(*SKIP)(?!)|(?

    Run it with a file name as an argument, or pipe the text in on STDIN, and it prints the word count to STDOUT.
