R grep by regex - finding a string that contains a substring exactly once

暗喜 2021-01-25 22:10

I am using R on Ubuntu and trying to go over a list of files. Some of them I need and some of them I don't.

I try to pick out the ones I need by finding a substring

6 Answers

无人共我 (original poster)

2021-01-25 22:28
Detecting strings with a but not aa

You can use the following TRE regex:
```
^[^a]*a[^a]*$
```
It matches the start of the string (^), 0+ chars other than a ([^a]*), an a, again 0+ non-'a's and the end of string ($). See this IDEONE demo:
```
a <- c("aca","cac","a", "abab", "ab-ab", "ab-cc-ab")
grep("^[^a]*a[^a]*$", a, value=TRUE)
## => [1] "cac" "a"
```
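If the substring is longer than a single character, the negated character class above does not carry over directly. As a rough alternative sketch (not part of the original answer), one could count literal, non-overlapping occurrences with base R's `gregexpr()` and keep the strings whose count is exactly 1. The helper name `count_fixed` and the file names below are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (an assumption, not from the answer above):
# count non-overlapping literal occurrences of `sub` in each element of `x`.
count_fixed <- function(x, sub) {
  hits <- gregexpr(sub, x, fixed = TRUE)
  # gregexpr() reports -1 when nothing matches, so count only positive positions
  vapply(hits, function(h) sum(h > 0L), integer(1))
}

a <- c("aca", "cac", "a", "abab", "ab-ab", "ab-cc-ab")
a[count_fixed(a, "a") == 1L]
## => [1] "cac" "a"

# Made-up file names, standing in for the list of files from the question
files <- c("data_raw.csv", "raw_raw.csv", "notes.txt")
files[count_fixed(files, "raw") == 1L]
## => [1] "data_raw.csv"
```

Because `fixed = TRUE` treats the substring literally, characters such as `.` or `(` in it need no escaping.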
Finding Whole Word Containing a but not aa

To match whole words that contain exactly one a (and not two or more a's anywhere in the word), use this PCRE regex:
```
\b(?!\w*a\w*a)\w*a\w*\b
```
See this regex demo.

Explanation:
- \b - word boundary
- (?!\w*a\w*a) - a negative lookahead failing the match if there are 0+ word chars, a, 0+ word chars and a again right after the word boundary
- \w* - 0+ word chars
- a - an a
- \w* - 0+ word chars
- \b - trailing word boundary.
NOTE: Since \w matches letters, digits and underscores, you might want to change it to \p{L} or [^\W\d_] (only matches letters).

See this demo:
```
a <- c("aca","cac","a")
grep("\\b(?!\\w*a\\w*a)\\w*a\\w*\\b", a, perl=TRUE, value=TRUE)
## => [1] "cac" "a"  
```
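The note about `\w` can be checked with a small comparison. This is only a sketch: the test vector is made up, and the second pattern is just the regex above with every `\w` swapped for `[^\W\d_]`.

```r
x <- c("cac", "a1", "aca")

# Original pattern: \w also accepts digits and underscores, so "a1" counts as a word
word_re <- "\\b(?!\\w*a\\w*a)\\w*a\\w*\\b"
grep(word_re, x, perl = TRUE, value = TRUE)
## => [1] "cac" "a1"

# Letters-only variant: every \w replaced with [^\W\d_]
letter_re <- "\\b(?![^\\W\\d_]*a[^\\W\\d_]*a)[^\\W\\d_]*a[^\\W\\d_]*\\b"
grep(letter_re, x, perl = TRUE, value = TRUE)
## => [1] "cac"
```

"a1" drops out of the second result because `\b` still treats the digit as a word character, so there is no word boundary right after the a.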