Exclude elements from vector based on regular expression pattern

你的背包 2021-01-12 22:49

I have some data which I want to clean up using a regular expression in R.

It is easy to find how to get elements that contain certain patterns, or do not contain ce

1 answer
  •  逝去的感伤
    2021-01-12 23:42

    Edit: As the comments pointed out, and as a little testing confirms, my original suggestion was incorrect.

    Here are two correct solutions:

    vector[!grepl("['pyfgcrl]", vector)]                    ## kohske
    grep("['pyfgcrl]", vector, value = TRUE, invert = TRUE) ## flodel
    

    If either of them wants to re-post and accept credit for their answer, I'm more than happy to delete mine here.


    Explanation

    The general function that you are looking for is grepl. From the help file for grepl:

    grepl returns a logical vector (match or not for each element of x).

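    Additionally, you should read the help page for regex (?regex), which describes what character classes are. In this case, the character class ['pyfgcrl] matches any single one of the characters inside the square brackets, so grepl returns TRUE for every element that contains at least one of them. Prefixing the call with ! then negates that logical vector, giving TRUE for the elements that contain none of those characters.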
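
To make the difference between negating the result and negating the character class concrete, here is a small sketch. The strings in x are made up for illustration and are not from the question. A negated class such as [^'pyfgcrl] asks whether an element contains at least one character outside the set, which is not the same as asking whether it contains none of the characters in the set.

```r
# Hypothetical example strings, not taken from the question.
x <- c("apple", "fly", "moon")

# Character class: TRUE if the element contains any listed character.
has_listed <- grepl("['pyfgcrl]", x)

# Negated class: TRUE if the element contains any character NOT listed.
# "apple" contains "a" and "e", so it still matches here.
has_other <- grepl("[^'pyfgcrl]", x)

# Negated result: TRUE only for elements with none of the listed characters.
has_none <- !has_listed

print(data.frame(x, has_listed, has_other, has_none))
```

Only has_none selects the elements to keep, which is why the solutions negate the output of grepl rather than the class itself.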

    So, up to this point, we have something that looks like:

    !grepl("['pyfgcrl]", vector)
    

    To get what you are looking for, you subset as usual.

    vector[!grepl("['pyfgcrl]", vector)]
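
As a worked example of the subsetting step, the sketch below runs the same pattern on a small sample vector. The question's actual data is not shown, so the values here are assumptions chosen only to illustrate the behaviour.

```r
# Hypothetical sample data; the question's actual vector is not shown.
vector <- c("apple", "sky", "moon", "it's", "data", "tomb")

# TRUE for every element containing ', p, y, f, g, c, r, or l.
has_unwanted <- grepl("['pyfgcrl]", vector)

# Keep only the elements that contain none of those characters.
cleaned <- vector[!has_unwanted]
print(cleaned)
```

With this sample input, "apple", "sky", and "it's" are dropped, leaving "moon", "data", and "tomb".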
    

    For the second solution, offered by @flodel: grep by default returns the indices of the elements that match. The value = TRUE argument makes it return the matching elements themselves instead, and invert = TRUE makes it return the elements that do not match.
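
The effect of each grep argument can be seen side by side in the sketch below. The sample vector is again hypothetical, used only to show what each call returns.

```r
# Hypothetical sample data, assumed for illustration.
vector <- c("apple", "sky", "moon", "it's", "data", "tomb")

# Default: integer indices of the elements that match.
match_idx <- grep("['pyfgcrl]", vector)

# value = TRUE: the matching elements themselves.
match_val <- grep("['pyfgcrl]", vector, value = TRUE)

# value = TRUE and invert = TRUE: the elements that do not match.
kept <- grep("['pyfgcrl]", vector, value = TRUE, invert = TRUE)

# Both solutions select the same elements for this input.
same <- identical(kept, vector[!grepl("['pyfgcrl]", vector)])
print(list(match_idx = match_idx, match_val = match_val, kept = kept, same = same))
```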
