Question
I know this question has been asked in several places, but I didn't see a precise answer to it.
I am trying to extract exactly the 2nd word after a given substring ("trying to") in R with the help of regex. I do not want to use unlist(strsplit).
sen= "I am trying to substring here something, but I am not able to"
str_extract(sen, "trying to\\W*\\s+((?:\\S+\\s*){2})")
Ideally I want to get "here" as an output, but I am getting "trying to substring here"
Answer 1:
You may actually capture the word you need with str_match:
str_match(sen, "trying to\\W+\\S+\\W+(\\S+)")[,2]
Or
str_match(sen, "trying to\\s+\\S+\\s+(\\S+)")[,2]
Here, \S+ matches 1 or more chars other than whitespace, \W+ matches one or more chars other than word chars, and \s+ matches 1+ whitespaces.
Note that if your "words" are separated by more than whitespace (punctuation, for example), use \W+. Otherwise, if there is just whitespace, use \s+.
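The [,2] accesses the first captured value (the part of the text matched by the part of the pattern inside the first unescaped pair of parentheses). Column 1 of the matrix returned by str_match holds the whole match, so the first capture group is in column 2.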
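To make the capture-group indexing and the \s+ versus \W+ distinction concrete, here is a minimal base R sketch that mirrors the two str_match patterns without needing stringr. The helper name first_group and the second test sentence sen_punct are made up for illustration and are not part of the original answer.

```r
sen <- "I am trying to substring here something, but I am not able to"

# Hypothetical helper: return capture group 1, or NA when nothing matches.
# regexec() reports the full match first and the capture groups after it,
# so element 2 corresponds to column 2 of str_match().
first_group <- function(pattern, text) {
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  if (length(m) >= 2) m[2] else NA_character_
}

res_ws   <- first_group("trying to\\s+\\S+\\s+(\\S+)", sen)   # "here"
res_nonw <- first_group("trying to\\W+\\S+\\W+(\\S+)", sen)   # "here"

# Assumed example with punctuation right after the phrase:
# the \s+ pattern finds nothing, while the \W+ pattern still matches.
sen_punct  <- "I am trying to: substring, here something"
punct_ws   <- first_group("trying to\\s+\\S+\\s+(\\S+)", sen_punct)  # NA
punct_nonw <- first_group("trying to\\W+\\S+\\W+(\\S+)", sen_punct)  # "here"
```

On the original sentence both patterns behave the same. They only diverge once something other than whitespace sits between the words.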
Answer 2:
Since you also tagged stringr, I will post the word solution:
library(stringr)
word(sub('.*trying to ', '', sen), 2)
#[1] "here"
Answer 3:
We can use sub
sub("^.*\\btrying to\\s+\\w+\\s+(\\w+).*", "\\1", sen)
#[1] "here"
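The same capture idea can be generalized to the nth word after a phrase. The following is only a sketch in base R. The function name nth_word_after is hypothetical, the phrase is inserted into the pattern as-is (so regex metacharacters in it would need escaping), and "word" is taken to mean a run of \w characters.

```r
sen <- "I am trying to substring here something, but I am not able to"

# Hypothetical helper: skip n - 1 words after `phrase`, then capture the nth.
nth_word_after <- function(text, phrase, n) {
  # e.g. n = 2 gives: trying to(?:\W+\w+){1}\W+(\w+)
  pattern <- sprintf("%s(?:\\W+\\w+){%d}\\W+(\\w+)", phrase, n - 1L)
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))
  # Element 2 of each match is capture group 1; NA when there is no match.
  vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
         character(1))
}

nth_word_after(sen, "trying to", 1)  # "substring"
nth_word_after(sen, "trying to", 2)  # "here"
nth_word_after(sen, "trying to", 3)  # "something"
```

Because the capture group uses \w+ rather than \S+, trailing punctuation such as the comma after "something" is not included in the result.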
Answer 4:
You could use strsplit. First split sen into two parts at "trying to ", and then extract the second word of the second part.
sapply(strsplit(sen, "trying to "), function(x) unlist(strsplit(x[2], " "))[2])
#[1] "here"
Answer 5:
str_split is another popular choice. Index the result with [1,2] to get the second word, [1,3] for the third, and so forth. Note that this counts words from the start of the string, not from "trying to".
library(stringr)
#Data
sen= "I am trying to substring here something, but I am not able to"
#Code
str_split(sen, boundary("word"), simplify = TRUE)[1,2]
#> [1] "am"
Created on 2018-08-16 by the reprex package (v0.2.0).
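Positional indexing like this can also be anchored to the phrase instead of the start of the string. Below is a rough base R sketch under the assumption that the phrase is exactly the two words "trying" and "to" and that words are separated by non-word characters. The variable names are illustrative only.

```r
sen <- "I am trying to substring here something, but I am not able to"

# Split on runs of non-word characters to get a plain vector of words.
words <- strsplit(sen, "\\W+")[[1]]
n_words <- length(words)

# First position where "trying" is immediately followed by "to".
start <- which(words[-n_words] == "trying" & words[-1] == "to")[1]

# start + 1 is the position of "to"; two words further is the target.
second_after <- words[start + 1 + 2]
second_after  # "here"
```

If the phrase does not occur, start is NA and the result is NA rather than an error.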
Source: https://stackoverflow.com/questions/45463861/str-extract-extracting-exactly-nth-word-from-a-string