Question
I have data like this:
Good afternoon. Hello. My bro's name is John... and he said softly 0.8% : "Don't you think I am handsome??" HAHA. jiji. koko.
I would like to get the sentence before the quotation, plus the text inside the quotation marks, by using a lookbehind regex in R.
First: I want to look for quotation marks in a bunch of text.
Second: Look back and extract the one sentence before the quotation. If there is no such sentence, that's fine; still extract the text inside the quotation marks.
Below is what I would like to achieve:
My bro's name is John... and he said softly 0.8%: "Don't you think I am handsome??"
I tried the following, but I would like help doing this with a lookbehind regex. Thank you.
regmatches(x, gregexpr('[^\\.]+[\\.\\:]"([^"]*)"', x))
dput output:
"Good afternoon. Hello. My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \" HAHA. jiji. koko."
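Answer 1:
We can also use gsub. The pattern matches either one or more characters that are not a period ([^.]+), followed by a period (\\.) and one or more whitespace characters (\\s+), or one or more whitespace characters followed by one or more non-space characters up to the end of the string (\\s+[^ ]+$). Each match is replaced with the empty string (''). Here str1 holds the input string.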
gsub('[^.]+\\.\\s+|\\s+[^ ]+$', '', str1)
#[1] "My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \""
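The answer does not show how str1 is defined, and the quoted result seems to correspond to a shorter input than the dput string in the question. As a minimal sketch, assuming str1 is a hypothetical shortened version of that data (one leading sentence before the target sentence and one trailing word after the quotation), the two branches of the alternation can be checked separately:

```r
# Assumption: str1 is a shortened, hypothetical version of the question's data;
# the original answer does not define it.
str1 <- "Hello. My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \" HAHA."

# Branch 1: a run of non-period characters, a period, then whitespace.
# Only "Hello. " qualifies. In "John..." the one period followed by whitespace
# is preceded by another period, and in "0.8" the period is followed by a digit.
step1 <- gsub('[^.]+\\.\\s+', '', str1)

# Branch 2: whitespace plus the last space-free token at the end of the string.
step2 <- gsub('\\s+[^ ]+$', '', step1)

# Both branches in one call, as in the answer.
both <- gsub('[^.]+\\.\\s+|\\s+[^ ]+$', '', str1)
both
#[1] "My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \""

# The two-step and one-step versions should agree on this input.
stopifnot(identical(step2, both))
```

Because the second branch removes only a single trailing token, an input with several sentences after the quotation would need a different pattern.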
Or, starting at the beginning of the string (^), we match one or more characters that are not a period, followed by a period and one or more whitespace characters (\\s+). We then capture the rest of the string up to and including the quoted text ("[^"]+"), match any remaining characters (.*) to the end of the string ($), and replace the whole match with the capture group (\\1).
gsub('^[^.]+\\.\\s+(.*(?:"[^"]+")).*$', '\\1', str1, perl=TRUE)
#[1] "My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \""
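Since the question asks specifically for a lookbehind, here is a minimal sketch of one possible approach with regmatches and a PCRE pattern (perl = TRUE), run against the full dput string from the question. It assumes that a sentence boundary is a single period preceded by a non-period character and followed by one whitespace character, so that "John..." and "0.8%" do not count as sentence ends. The names x, pat, and res are illustrative, and this is not the original answerer's method.

```r
x <- "Good afternoon. Hello. My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \" HAHA. jiji. koko."

# Pattern pieces (assumed boundary: non-period, period, one whitespace character):
#   (?:^|(?<=[^.][.]\s))    start at the string start or right after a boundary
#   (?:(?![^.][.]\s)[^"])*  take non-quote characters without crossing a boundary
#   "[^"]*"                 then the quoted text itself
pat <- '(?:^|(?<=[^.][.]\\s))(?:(?![^.][.]\\s)[^"])*"[^"]*"'

# regexpr() locates the first match; regmatches() extracts it.
res <- regmatches(x, regexpr(pat, x, perl = TRUE))
res
#[1] "My bro's name is John... and he said softly 0.8% : \"Don't you think I am handsome?? \""
```

The lookbehind only inspects what precedes the match, so the boundary characters themselves are not part of the result. If the quotation has no sentence before it, the ^ alternative lets the match begin at the start of the string.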
Source: https://stackoverflow.com/questions/33930738/lookbehind-to-get-the-text-in-r-regex