Using stringr to extract one or multiple words from text string in R

Submitted by 假装没事ソ on 2020-01-30 06:34:11

Question


I have the following data frame:

df <- data.frame(city=c("in London", "in Manchester city", "in Sao Paolo"))

I am using str_extract to return the word after 'in' in a separate column.

library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')

This works fine for me in 95% of cases. However, there are cases like "Sao Paolo" above where my regex would return "Sao" rather than the city name.

Can someone please help me with amending it to capture either:

1) everything to the end of the text string I am extracting from? OR

2) where there is more than one word after 'in', then return that too

Many thanks.


Answer 1:


To match the rest of the string after the first "in" followed by a space, you can use

(?<=in\\s).+

The lookbehind requires the preposition "in" followed by a whitespace character, but does not include it in the match, since lookbehinds are zero-width assertions.
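As a minimal sketch, the pattern can be dropped into the str_extract call from the question. The column name `name` is only an illustrative choice, and stringsAsFactors = FALSE is added so the result is the same across R versions.

```r
library(stringr)

df <- data.frame(city = c("in London", "in Manchester city", "in Sao Paolo"),
                 stringsAsFactors = FALSE)

# The lookbehind checks for "in" plus one whitespace character before the
# match; .+ then takes everything up to the end of the string.
df$name <- str_extract(df$city, "(?<=in\\s).+")
df$name
# [1] "London"          "Manchester city" "Sao Paolo"
```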




Answer 2:


Does this one-liner do it for you?

unlist(lapply(strsplit(c("in London", "in Sao Paulo", "in Manchester City"), "in "), function(x) x[2]))
[1] "London"          "Sao Paulo"       "Manchester City"
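One caveat worth checking: strsplit splits on every occurrence of "in ", not only the leading one. The sketch below uses a made-up input, "in Berlin area", to show the difference, along with an anchored sub call as one possible alternative.

```r
x <- c("in London", "in Sao Paulo", "in Berlin area")

# Splitting on every "in " also splits inside "Berlin area".
split_result <- unlist(lapply(strsplit(x, "in "), function(p) p[2]))
split_result
# [1] "London"    "Sao Paulo" "Berl"

# Anchoring the pattern with ^ removes only the leading "in ".
anchored_result <- sub("^in ", "", x)
anchored_result
# [1] "London"      "Sao Paulo"   "Berlin area"
```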



Answer 3:


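You can try this. Note that [^in ] is a character class (any single character other than "i", "n" or a space), not the literal prefix "in ", so it works for the sample data but would drop leading characters from a city written in lower case that starts with "i" or "n" (for example, "in new york" gives "ew york"):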

library(stringr)
df$onlyCity <- str_extract(df$city, '[^in ](.)*')
df
                city        onlyCity
1          in London          London
2 in Manchester city Manchester city
3       in Sao Paolo       Sao Paolo



Answer 4:


gsub("^in[ ]*(.*$)", "\\1", df$city)
[1] "London"          "Manchester city" "Sao Paolo" 

This assumes that your strings start with "in", followed by zero or more spaces (it won't fail with more than one), followed by the text of interest, which is captured from the first non-space character up to the end of the string.
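A short sketch of how this behaves on inputs that are not in the question (the vector `x` below is invented for illustration): extra spaces are absorbed, and a string that does not start with "in" is returned unchanged because the pattern does not match.

```r
x <- c("in London", "in   Sao Paolo", "Paris")

# [ ]* absorbs any run of spaces after "in"; the capture group keeps the rest.
# sub() is enough here because the anchored pattern can match at most once.
result <- sub("^in[ ]*(.*$)", "\\1", x)
result
# [1] "London"    "Sao Paolo" "Paris"
```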



Source: https://stackoverflow.com/questions/34844745/using-stringr-to-extract-one-or-multiple-words-from-text-string-in-r
