Extracting a string of words from a string vector data

Submitted by 无人久伴 on 2020-05-29 09:50:06

Question


I have a character vector, Data, as shown below:

Data
Posted by Mohit Garg on May 7, 2016
Posted by Dr. Lokesh Garg on April 8, 2018
Posted by Lokesh.G.S  on June 11, 2001
Posted by Mohit.G.S. on July 23, 2005
Posted by Dr.Mohit G Kumar Saha on August 2, 2019

I used the str_extract() function as follows:

str_extract(Data, "Posted by \\w+. \\w+ \\w+")

It produced this output:

[1] "Posted by Mohit Garg on"   "Posted by Dr. Lokesh Garg" NA                         
[4] NA                          NA  
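Why the attempt fails: \\w matches only letters, digits and underscore, so a dotted name such as Lokesh.G.S or Dr.Mohit cannot be covered by the "word, one character, space, word, space, word" shape of the pattern. A small base-R check, reusing the question's five strings (grepl() is used here only to show which elements the attempted pattern matches at all):

```r
# \\w does not match ".", so dotted tokens are not a single \\w+ run
tokens <- c("Mohit", "Dr.", "Lokesh.G.S", "Mohit.G.S.")
is_plain_word <- grepl("^\\w+$", tokens)
print(is_plain_word)

# The attempted pattern: \\w+ , any one character, space, \\w+ , space, \\w+
attempt <- "Posted by \\w+. \\w+ \\w+"
Data <- c("Posted by Mohit Garg on May 7, 2016",
          "Posted by Dr. Lokesh Garg on April 8, 2018",
          "Posted by Lokesh.G.S  on June 11, 2001",
          "Posted by Mohit.G.S. on July 23, 2005",
          "Posted by Dr.Mohit G Kumar Saha on August 2, 2019")
# Only the first two strings fit that shape, matching the two non-NA results
print(grepl(attempt, Data))
```

It also explains why the first result ends in "on": the pattern simply counts three word chunks, and in "Mohit Garg on" the third chunk happens to be "on".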

I want the output to look like this:

[1] "Posted by Mohit Garg"          "Posted by Dr. Lokesh Garg"  "Posted by Lokesh.G.S"
[4] "Posted by Mohit.G.S."          "Posted by Dr.Mohit G Kumar Saha"

Answer 1:


You can try:

stringr::str_extract(df$Data, "Posted by .+?(?=\\s+on)")

#[1] "Posted by Mohit Garg" "Posted by Dr. Lokesh Garg"  "Posted by Lokesh.G.S"
#[4] "Posted by Mohit.G.S." "Posted by Dr.Mohit G Kumar Saha"

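This extracts everything from "Posted by" up to, but not including, the whitespace before "on". The lazy .+? stops at the first position where the lookahead (?=\\s+on) succeeds, and the lookahead checks for " on" without consuming it.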
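One caveat with (?=\\s+on): it also fires on a word that merely starts with "on". The sketch below uses base R with perl = TRUE (PCRE) instead of stringr so it has no dependencies; the input "jon ong" is a hypothetical name, not from the question, and the \\b word boundary is a suggested refinement rather than part of the original answer.

```r
# Hypothetical input: a surname that begins with "on"
x <- "Posted by jon ong on May 1, 2020"

# regexpr() locates the first match; regmatches() returns the matched text
extract_first <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# Plain lookahead: stops early, because " ong" starts with whitespace + "on"
early <- extract_first(x, "Posted by .+?(?=\\s+on)")
print(early)

# With \\b after "on", only a standalone word "on" ends the match
bounded <- extract_first(x, "Posted by .+?(?=\\s+on\\b)")
print(bounded)
```

For the five strings in the question both patterns give the same result; the boundary only matters when a name token starts with lowercase "on".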


The same in base R:

sub(".*(Posted by .+?)(?=\\s+on).*", '\\1', df$Data, perl = TRUE) 

Data:

df <- structure(list(Data = c("Posted by Mohit Garg on May 7, 2016", 
"Posted by Dr. Lokesh Garg on April 8, 2018", "Posted by Lokesh.G.S  on June 11, 2001", 
"Posted by Mohit.G.S. on July 23, 2005", "Posted by Dr.Mohit G Kumar Saha on August 2, 2019"
)), class = "data.frame", row.names = c(NA, -5L))
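The two versions differ when a string has no "Posted by ... on" part: str_extract() returns NA, whereas sub() returns the input unchanged. If NA is wanted in base R, regexpr() plus regmatches() behaves like str_extract(). A minimal sketch; the second element of x is hypothetical:

```r
x <- c("Posted by Mohit Garg on May 7, 2016",
       "No author line here")  # hypothetical non-matching element

# sub(): a non-matching element passes through untouched
via_sub <- sub(".*(Posted by .+?)(?=\\s+on).*", "\\1", x, perl = TRUE)
print(via_sub)

# regexpr() gives -1 where there is no match; regmatches() returns only
# the matched pieces, so they are written back into an NA-filled vector
m <- regexpr("Posted by .+?(?=\\s+on)", x, perl = TRUE)
via_regmatches <- rep(NA_character_, length(x))
via_regmatches[m != -1] <- regmatches(x, m)
print(via_regmatches)
```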



Answer 2:


You can use sub() to remove " on" (together with any spaces before it) and everything after it:

sub(" +?on.*$", "", Data)
#[1] "Posted by momon"                 "Posted by on Mohit Garg"        
#[3] "Posted by Dr. Lokesh Garg"       "Posted by Lokesh.G.S"           
#[5] "Posted by Mohit.G.S."            "Posted by Dr.Mohit G Kumar Saha"

Or, with perl = TRUE:

sub("(.*\\S) +on.*", "\\1", Data, perl = TRUE)  # ending the capture on a non-space drops the double space in "Lokesh.G.S  on"

Data:

Data <- c("Posted by momon on Monday 29 Feb 2020"
, "Posted by on Mohit Garg on May 7, 2016"
, "Posted by Dr. Lokesh Garg on April 8, 2018"
, "Posted by Lokesh.G.S  on June 11, 2001"
, "Posted by Mohit.G.S. on July 23, 2005"
, "Posted by Dr.Mohit G Kumar Saha on August 2, 2019")

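The default engine (TRE) and the PCRE engine used with perl = TRUE can treat the same regex, including lazy quantifiers such as +?, differently; see the Stack Overflow question "R regex compiler working differently for the given regex".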
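The tricky test strings above ("momon", "on Mohit") come down to one choice: cut at the first standalone "on" or at the last one. The sketch below states both rules explicitly with perl = TRUE, so PCRE semantics apply and the TRE differences do not come into play; these patterns are a suggested variant, not the answer's exact code.

```r
x <- c("Posted by momon on Monday 29 Feb 2020",
       "Posted by on Mohit Garg on May 7, 2016",
       "Posted by Lokesh.G.S  on June 11, 2001")

# Cut at the FIRST standalone "on": whitespace before it, word boundary after
cut_first <- sub("\\s+on\\b.*$", "", x, perl = TRUE)
print(cut_first)

# Keep everything before the LAST standalone "on"; the greedy (.*\\S)
# must end on a non-space, so the double space before "on" is not kept
cut_last <- sub("^(.*\\S)\\s+on\\b.*$", "\\1", x, perl = TRUE)
print(cut_last)
```

Both rules leave "momon" intact; they differ only when "on" occurs as a separate word inside the name.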



Source: https://stackoverflow.com/questions/62015531/extracting-a-string-of-words-from-a-string-vector-data
