Question
I'm trying to write a regex with a negative lookbehind using stringr in R.
Basically, I have text data that looks something like this:
See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data.
I want to select everything from the "Item 7" that comes right after the "BlahBlahBlah." sentence through "Item 8 Financial Statements and Supplementary Data".
So I want:
Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data.
which is everything except the "See item 7 Management's Discussion and Analysis" cross-reference and the "BlahBlahBlah." that follows it.
Right now, I'm working with this code:
(?<!see)Item 7(.*?)Item 8
But it's not returning what I want.
My logic is to skip any "Item 7" that is preceded by the word "See" (as in "See item 7 Management's Discussion and Analysis"), but it doesn't seem to work.
https://regex101.com/r/yF7aQ1/3
Is there a way I can implement this negative lookbehind?
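A minimal base-R sketch of why the original pattern matches too early. It uses regmatches() and regexpr() with perl = TRUE instead of stringr, and assumes the sample sentence is stored in s. The character right before "item" is a space, so (?<!see) is compared against "ee " and always succeeds. Moving the space inside the lookbehind fixes that; the trailing [^.]*\\. is an illustrative addition (not from the question) that extends the match to the end of the Item 8 heading sentence.

```r
s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."

# Original idea: the lookbehind is tested right before "item", where the
# three preceding characters are "ee " (not "see"), so it never blocks a match.
bad <- regmatches(s, regexpr("(?<!see)item 7.*?item 8", s,
                             ignore.case = TRUE, perl = TRUE))

# Putting the space inside the lookbehind excludes "See item 7"; the
# [^.]*\\. tail (illustrative) runs to the end of the Item 8 heading sentence.
good <- regmatches(s, regexpr("(?<!see )item 7.*?item 8[^.]*\\.", s,
                              ignore.case = TRUE, perl = TRUE))

bad   # starts at the "item 7" inside "See item 7 ...", i.e. too early
good  # starts at the second "Item 7" and ends with "...Supplementary Data."
```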
Answer 1:
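Not sure how you are implementing it in R, but .*(?<!See) (item 7 .*) works with sub(). Just be careful with the space after "See" and with letter case, which you can ignore with the ignore.case parameter. In your pattern, (?<!see) is checked immediately before "Item", where the three preceding characters are "ee " (ending in a space), so it never sees "see" and never excludes anything.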
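s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
sub(".*(?<!See) (item 7 .*)", "\\1", s, ignore.case = TRUE, perl = TRUE)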
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
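One thing to keep in mind with the sub() approach: when nothing matches, sub() returns the input unchanged rather than signalling a miss. A minimal sketch of a hypothetical helper, extract_item7() (not part of base R or stringr), that uses regexec() with the same lookbehind and returns NA instead. Without the leading .* it picks the first qualifying occurrence, which for this sample is the same text.

```r
s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."

# Hypothetical helper: text from the first "item 7" (any case) whose preceding
# space is not itself preceded by "See"; NA when there is no such occurrence.
extract_item7 <- function(x) {
  m <- regmatches(x, regexec("(?<!See) (item 7 .*)", x,
                             ignore.case = TRUE, perl = TRUE))
  # each element is c(full match, group 1), or character(0) when nothing matched
  vapply(m, function(g) if (length(g) > 1) g[2] else NA_character_, character(1))
}

no_match <- "See item 7 only."
# sub() silently hands back the unchanged input here
sub(".*(?<!See) (item 7 .*)", "\\1", no_match, ignore.case = TRUE, perl = TRUE)
# the helper gives the Item 7 text for s and NA for no_match
extract_item7(c(s, no_match))
```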
Another alternative:
sub(".*(?=(?<!See) ?item 7)", "", s, ignore.case = TRUE, perl = TRUE)
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
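The lookahead version works because the (?=...) part is zero-width: only the prefix in front of the qualifying "item 7" is consumed and replaced with "". A small sketch to inspect that prefix, plus one caveat worth checking. Because the optional space " ?" may match nothing, the lookahead can also succeed right at "item", where the three preceding characters are "ee " rather than "See". So it is really the greedy .* (which settles on the last possible position) that skips the "See item 7" reference; on a string containing only that reference, the pattern still strips "See ".

```r
s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
pat <- ".*(?=(?<!See) ?item 7)"

# The text that sub() deletes: everything before the last qualifying "Item 7"
prefix <- regmatches(s, regexpr(pat, s, ignore.case = TRUE, perl = TRUE))
rest <- sub(pat, "", s, ignore.case = TRUE, perl = TRUE)
prefix  # "See item 7 Management's Discussion and Analysis. BlahBlahBlah. "

# Caveat: with no real Item 7 present, the lookbehind does not block the reference
sub(pat, "", "See item 7 only.", ignore.case = TRUE, perl = TRUE)  # "item 7 only."
```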
With str_extract_all() from the stringr package, which doesn't take an ignore.case argument the way sub() does, you can use [Ii] to match either case of the "I":
library(stringr)
str_extract_all(s, "(?<!See )[Ii]tem 7(.*)")
# [[1]]
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
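As an alternative to spelling out [Ii], stringr's regex() modifier accepts ignore_case = TRUE, which also makes the lookbehind itself case-insensitive. str_extract() (singular) returns a plain character vector instead of the list that str_extract_all() gives, which is handy when only the first match per string is needed. A minimal sketch, assuming s holds the sample sentence from the question:

```r
library(stringr)

s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."

# Case-insensitive via regex(); the space sits inside the lookbehind so the
# "See item 7" cross-reference is skipped and .* runs to the end of s.
pat <- regex("(?<!see )item 7.*", ignore_case = TRUE)
str_extract(s, pat)
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
```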
Source: https://stackoverflow.com/questions/40251836/regex-negative-lookbehind-in-r