r grep by regex - finding a string that contains a substring exactly once

暗喜 2021-01-25 22:10

I am using R on Ubuntu and trying to go over a list of files. Some of them I need and some of them I don't.

I am trying to get the ones I need by finding a substring

6 Answers
  •  无人共我
    2021-01-25 22:28

    Detecting strings with a but not aa

    You can use the following TRE regex:

    ^[^a]*a[^a]*$
    

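    It matches the start of the string (^), 0+ chars other than a ([^a]*), a single a, 0+ non-a chars again, and the end of the string ($). Because the pattern is anchored at both ends, the whole string can contain only one a. See this IDEONE demo: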

    a <- c("aca","cac","a", "abab", "ab-ab", "ab-cc-ab")
    grep("^[^a]*a[^a]*$", a, value=TRUE)
    ## => [1] "cac" "a"
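
For a substring longer than one character, the negated character class no longer applies directly. One alternative, which is a different technique from the regex above, is to count fixed-substring occurrences with gregexpr() and keep the strings where the count is exactly one. This is only a minimal sketch: the helper count_fixed and the file names are hypothetical, and overlapping occurrences are not counted separately.

```r
# Hypothetical helper: count non-overlapping occurrences of a fixed
# (non-regex) substring in each element of a character vector.
count_fixed <- function(x, sub) {
  m <- gregexpr(sub, x, fixed = TRUE)
  # gregexpr() reports -1 when there is no match, so count positive positions
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

# Hypothetical file names, for illustration only
files <- c("ab-report.csv", "ab-ab.csv", "ab-cc-ab.csv", "notes.txt")

# Keep only the names that contain "ab" exactly once
once <- files[count_fixed(files, "ab") == 1]
print(once)
## => [1] "ab-report.csv"
```

In practice the files vector would probably come from something like list.files(), but that depends on the directory layout, which the question does not show.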
    

    Finding Whole Word Containing a but not aa

    If you need to match whole words that contain exactly one a, with no second a anywhere in the word, use this PCRE regex:

    \b(?!\w*a\w*a)\w*a\w*\b
    

    See this regex demo.

    Explanation:

    • \b - word boundary
    • (?!\w*a\w*a) - a negative lookahead failing the match if there are 0+ word chars, a, 0+ word chars and a again right after the word boundary
    • \w* - 0+ word chars
    • a - an a
    • \w* - 0+ word chars
    • \b - trailing word boundary.
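
The word-boundary pattern explained above is most useful when a string holds several words. As a sketch, assuming a made-up sample sentence, regmatches() together with gregexpr(..., perl=TRUE) can pull out every word that contains exactly one a:

```r
# Made-up sample sentence, for illustration only
text <- "banana cat a dog alpha"

# Same PCRE pattern as above, with backslashes doubled for the R string
pattern <- "\\b(?!\\w*a\\w*a)\\w*a\\w*\\b"

# gregexpr() finds all matches; regmatches() extracts the matched text
words <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
print(words)
## => [1] "cat" "a"
```

Here "banana" and "alpha" are rejected by the lookahead because they contain a second a, and "dog" is skipped because it contains none.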

    NOTE: Since \w matches letters, digits and underscores, you might want to change it to \p{L} or [^\W\d_] (only matches letters).

    See this demo:

    a <- c("aca","cac","a")
    grep("\\b(?!\\w*a\\w*a)\\w*a\\w*\\b", a, perl=TRUE, value=TRUE)
    ## => [1] "cac" "a"  
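
To illustrate the NOTE about \w, here is a sketch that swaps \w for [^\W\d_] and compares the two patterns on a made-up vector. One caveat to keep in mind: \b is still defined in terms of \w, so with the letters-only class a word only matches when it consists purely of letters.

```r
# Made-up test strings, for illustration only
x <- c("cat", "a1", "x_a", "taxa")

# Original pattern: \w also accepts digits and underscores
word_pat <- "\\b(?!\\w*a\\w*a)\\w*a\\w*\\b"

# Letters-only variant: [^\W\d_] replaces every \w
letter_pat <- "\\b(?![^\\W\\d_]*a[^\\W\\d_]*a)[^\\W\\d_]*a[^\\W\\d_]*\\b"

with_w <- grep(word_pat, x, perl = TRUE, value = TRUE)
with_letters <- grep(letter_pat, x, perl = TRUE, value = TRUE)

print(with_w)
## => [1] "cat" "a1"  "x_a"
print(with_letters)
## => [1] "cat"
```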
    
