Question
I have a character vector, Data, as shown below:
Posted by Mohit Garg on May 7, 2016
Posted by Dr. Lokesh Garg on April 8, 2018
Posted by Lokesh.G.S on June 11, 2001
Posted by Mohit.G.S. on July 23, 2005
Posted by Dr.Mohit G Kumar Saha on August 2, 2019
I used str_extract() as follows:
str_extract(Data, "Posted by \\w+. \\w+ \\w+")
It produced this output:
[1] "Posted by Mohit Garg on" "Posted by Dr. Lokesh Garg" NA
[4] NA NA
I want the output to look like this:
[1] "Posted by Mohit Garg on" "Posted by Dr. Lokesh Garg" "Posted by Lokesh.G.S"
[4] "Posted by Mohit.G.S." "Posted by Dr.Mohit G Kumar Saha"
Answer 1:
You could try:
stringr::str_extract(df$Data, "Posted by .+?(?=\\s+on)")
#[1] "Posted by Mohit Garg" "Posted by Dr. Lokesh Garg" "Posted by Lokesh.G.S"
#[4] "Posted by Mohit.G.S." "Posted by Dr.Mohit G Kumar Saha"
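This extracts everything from "Posted by" up to "on", excluding "on" itself. The lazy .+? stops at the first point where the lookahead (?=\\s+on) finds whitespace followed by "on".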
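To see why the lazy quantifier matters here, the following is a minimal sketch in base R. The input string is a made-up example (not from the question) in which " on" occurs twice; regmatches() and regexpr() with perl = TRUE stand in for str_extract().

```r
# Hypothetical input where " on" appears twice.
x <- "Posted by Anna on leave on May 7, 2016"

# Lazy .+? stops at the first position followed by whitespace + "on".
lazy <- regmatches(x, regexpr("Posted by .+?(?=\\s+on)", x, perl = TRUE))

# Greedy .+ runs to the last position followed by whitespace + "on".
greedy <- regmatches(x, regexpr("Posted by .+(?=\\s+on)", x, perl = TRUE))

print(lazy)    # "Posted by Anna"
print(greedy)  # "Posted by Anna on leave"
```

Which of the two is appropriate depends on whether the name itself can contain the word "on".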
The same in base R:
sub(".*(Posted by .+?)(?=\\s+on).*", '\\1', df$Data, perl = TRUE)
Data:
df <- structure(list(Data = c("Posted by Mohit Garg on May 7, 2016",
"Posted by Dr. Lokesh Garg on April 8, 2018", "Posted by Lokesh.G.S on June 11, 2001",
"Posted by Mohit.G.S. on July 23, 2005", "Posted by Dr.Mohit G Kumar Saha on August 2, 2019"
)), class = "data.frame", row.names = c(NA, -5L))
Answer 2:
You can use sub to remove "on", together with the spaces before it and everything after it:
sub(" +?on.*$", "", Data)
#[1] "Posted by momon" "Posted by on Mohit Garg"
#[3] "Posted by Dr. Lokesh Garg" "Posted by Lokesh.G.S"
#[5] "Posted by Mohit.G.S." "Posted by Dr.Mohit G Kumar Saha"
Or with perl = TRUE, where the greedy (.*) captures everything before the last " on":
sub("(.*) +on.*", "\\1", Data, perl = TRUE)
Data:
Data <- c("Posted by momon on Monday 29 Feb 2020"
, "Posted by on Mohit Garg on May 7, 2016"
, "Posted by Dr. Lokesh Garg on April 8, 2018"
, "Posted by Lokesh.G.S on June 11, 2001"
, "Posted by Mohit.G.S. on July 23, 2005"
, "Posted by Dr.Mohit G Kumar Saha on August 2, 2019")
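One thing to keep in mind with sub() is that it returns the input unchanged when the pattern does not match, whereas str_extract() returns NA. A minimal sketch of guarding against that, assuming the greedy pattern with perl = TRUE; the third input string and the variable names are made up for illustration:

```r
# Two strings from the data above plus one hypothetical non-matching string.
x <- c("Posted by momon on Monday 29 Feb 2020",
       "Posted by on Mohit Garg on May 7, 2016",
       "No author line here")

pattern <- "(.*) +on.*"

# Only substitute where the pattern matches; otherwise return NA.
hit <- grepl(pattern, x, perl = TRUE)
out <- ifelse(hit, sub(pattern, "\\1", x, perl = TRUE), NA_character_)

print(out)
# "Posted by momon" "Posted by on Mohit Garg" NA
```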
See also the question "R regex compiler working differently for the given regex".
Source: https://stackoverflow.com/questions/62015531/extracting-a-string-of-words-from-a-string-vector-data