Quantifiers in a regular expression used with awk behave unexpectedly

陌清茗 2021-01-21 20:16

I want to process this list (this is just an excerpt, of course):

    1   S3 -> PC-8-Set
    2   S3 -> PC-850-Set
    3   S3 -> ANSI-Set
    4   S3 ->         


        
2 Answers
  • 2021-01-21 20:26

    Try

    awk --re-interval -F '[[:blank:]]{1,}' '{printf ("%s ", $2)}' list
    

    --re-interval

    Allow interval expressions (see Regexp Operators) in regexps. This is now gawk's default behavior. Nevertheless, this option remains both for backward compatibility, and for use in combination with the --traditional option.
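
    If passing the option is not possible, the same split can be written without an interval: `+` means "one or more", just like `{1,}`. A minimal sketch, assuming the lines have no leading whitespace and that spaces and tabs are the only separators; the sample data is piped in here instead of being read from the file `list`:

```shell
# Sample data modeled on the excerpt in the question
# (assumption: the real file has no leading whitespace).
sample='1   S3 -> PC-8-Set
2   S3 -> PC-850-Set
3   S3 -> ANSI-Set'

# '[ \t]+' splits on runs of spaces or tabs. '+' is an ordinary
# quantifier, so it works whether or not interval expressions are enabled.
out=$(printf '%s\n' "$sample" | awk -F '[ \t]+' '{ printf("%s ", $2) }')

echo "$out"
```

    Because `+` is not an interval expression, this form does not depend on `--re-interval`.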

  • 2021-01-21 20:31

    You are using a version of Gawk that predates this November 2010 commit, which I found with git bisect:

    http://git.savannah.gnu.org/cgit/gawk.git/commit/?id=40b3741f63c19e38077d57f4ce4737916ec5073e

    The change indeed hinges on the default behavior for interval expressions, which became enabled by default (as POSIX requires).

    It looks like the --re-interval option is now relevant only in combination with --traditional: if --traditional is enabled, support for {m,n} goes away, but it can be selectively brought back with --re-interval.

    In your version, {m,n} is unrecognized by default, with or without --traditional. This is true up to this commit:

    commit 00ef0423acd97cb964a2bae54c93a03a8ab50e5e
    Author: Arnold D. Robbins <arnold@******>
    Date:   Fri Jul 16 14:55:10 2010 +0300
    
        Move to 3.1.8.
    

    and you are still behind that, on 3.1.5.
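
    To confirm which situation applies on a given machine, the regexp engine can be probed directly. This is only a rough sketch: the helper name `probe_intervals` is made up for illustration, and all it does is check whether `a{3}` matches `aaa`:

```shell
# Hypothetical helper: reports whether the given awk command treats
# {3} as an interval expression. Pass the command to test as arguments,
# e.g. `probe_intervals awk` or `probe_intervals gawk --re-interval`.
probe_intervals() {
    # If intervals are off, a{3} is taken literally and "aaa" does not
    # match, so `found` stays unset and awk exits with status 1.
    if printf 'aaa\n' | "$@" '/^a{3}$/ { found = 1 } END { exit !found }'; then
        echo "supported"
    else
        echo "not supported"
    fi
}

result=$(probe_intervals awk)
echo "interval expressions: $result"
```

    Assuming `awk` is a gawk older than the commit above, `probe_intervals awk --re-interval` would be expected to report `supported` where the plain `awk` call does not.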
