Question
I have a data frame in R with one column (called 'city') containing a text string. My goal is to extract only one word, i.e. the city name, from the text string. The city name always follows the word 'in', e.g. the text might be:
'in London'
'in Manchester'
I tried to create a new column ('municipality'):
df$municipality <- gsub(".*in ?([A-Z]+).*$","\\1",df$city)
This gives me the first letter following 'in', but I need the next word (ONLY the next word).
I then tried:
gsub(".*in ?([A-Z]\w+))")
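which worked on a regex checker, but not in R. Can someone please help me? I know this is probably very simple, but I can't crack it. Thanks in advance.
Answer 1:
We can use str_extract from the stringr package. The lookbehind asserts that 'in' plus a whitespace character precedes the match, so only the following word is returned: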
library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')
#[1] "London" "Manchester"
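For readers who prefer not to load stringr, the same lookbehind can probably be used from base R. The following is only a sketch under stated assumptions: the sample data frame is hypothetical (it mirrors the strings in the question), and perl = TRUE is needed because the default regex engine does not support lookbehind.

```r
# Hypothetical sample data mirroring the strings in the question
df <- data.frame(city = c("in London", "in Manchester"),
                 stringsAsFactors = FALSE)

# perl = TRUE switches to PCRE, which supports the lookbehind (?<=...)
m <- regexpr("(?<=in\\s)\\w+", df$city, perl = TRUE)

# regmatches() drops rows without a match, so fill only the matched rows
df$municipality <- NA_character_
df$municipality[m != -1] <- regmatches(df$city, m)

df$municipality
#[1] "London"     "Manchester"
```

Rows where the pattern does not match stay NA, which keeps the new column aligned with the original one.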
Answer 2:
The following regular expression will match the second word from your city column:
^in\\s([^ ]*).*$
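This matches the word 'in' at the start of the string, followed by a single whitespace character, then a capture group of non-space characters, which is the city name. The trailing .*$ consumes the rest of the string, so replacing the whole match with \\1 leaves only the city name.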
Example:
df <- data.frame(city=c("in London town", "in Manchester city"))
df$municipality <- gsub("^in\\s([^ ]*).*$", "\\1", df$city)
> df$municipality
[1] "London" "Manchester"
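The pattern above is anchored with ^, so it assumes every string starts with 'in'. If 'in' may instead appear later in the string, a variant along the following lines might work. This is a sketch, not part of the original answer: the vector x and its contents are hypothetical, and \\b is added so that 'in' inside another word is not picked up.

```r
# Hypothetical strings where 'in' is not the first word
x <- c("Lives in London town", "Office in Manchester city")

# \\b requires 'in' to be a whole word; \\s+ tolerates repeated spaces.
# The leading .* is greedy, so the last standalone 'in' is the one used.
res <- sub("^.*\\bin\\s+(\\w+).*$", "\\1", x, perl = TRUE)

res
# [1] "London"     "Manchester"
```

Note that sub returns a string unchanged when the pattern does not match, so rows without a standalone 'in' would need to be checked separately.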
Source: https://stackoverflow.com/questions/34804708/matching-a-word-after-another-word-in-r-regex