string split on last comma in R

喜你入骨 submitted on 2019-12-28 17:44:14

Question


I'm not new to R, but I am relatively new to regular expressions.

A similar question can be found here.

For example, if I use

> strsplit("UK, USA, Germany", ", ")
[[1]]
[1] "UK"      "USA"     "Germany"

but I want to get

[[1]]
[1] "UK, USA"     "Germany"

Another example is

> strsplit("London, Washington, D.C., Berlin", ", ")
[[1]]
[1] "London"     "Washington" "D.C."       "Berlin"  

and I want to get

[[1]]
[1] "London, Washington, D.C."       "Berlin"  

Washington, D.C. should definitely not be divided into two parts; the string should be split only at the last comma, not at every comma.

One viable way, I think, is to replace the last comma with something else, such as $, #, *, ..., and then use strsplit() to split the string on the replacement character (making sure it is unique!). However, I would be happier if the problem could be handled directly with a built-in function.

So how can I do that? Many thanks.
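The placeholder idea described above can be sketched in base R roughly as follows. The choice of "#" as the placeholder is an assumption made for illustration; it must not occur anywhere in the input strings.

```r
# Sketch of the placeholder approach: mark the last comma, then split on the mark.
# Assumption: "#" never occurs in the input strings.
x <- c("UK, USA, Germany", "London, Washington, D.C., Berlin")

# The pattern matches the last comma, any whitespace after it, and the
# comma-free text up to the end; the capture group keeps that trailing text.
marked <- sub(",\\s*([^,]*)$", "#\\1", x)

# Split on the literal placeholder character
result <- strsplit(marked, "#", fixed = TRUE)
print(result)
```

If the input might contain the placeholder, a different character (or a longer marker string) would have to be chosen.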


Answer 1:


Here's one approach:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"

You may want:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

This version also works when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
## 
## [[2]]
## [1] "UK, USA" "Germany"
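For reuse, the pattern above could be wrapped in a small helper. The function name split_last_comma is hypothetical and not part of the original answer; the comments spell out what each part of the regular expression does.

```r
# Hypothetical helper wrapping the lookahead pattern from this answer.
split_last_comma <- function(x) {
  # ",\\s*"      : a comma plus any whitespace after it (this part is removed)
  # "(?=[^,]+$)" : lookahead requiring only non-comma characters up to the
  #                end of the string, so only the last comma can match
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

cities <- split_last_comma("London, Washington, D.C., Berlin")
single <- split_last_comma("Berlin")  # no comma: the string comes back unsplit
print(cities)
print(single)
```

Because strsplit() is vectorised, the helper accepts a character vector and returns a list with one element per input string.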



Answer 2:


You can use the stri_split_fixed function from the stringi package:

library(stringi)
x <- "USA,UK,Poland"
stri_split_fixed(x,",") # standard split by comma
[[1]]
[1] "USA"    "UK"     "Poland"

stri_split_fixed(x,",",n = 2) # set the max number of elements
[[1]]
[1] "USA"       "UK,Poland"

Unfortunately, there is no parameter to choose which end of the string the splitting starts from, but we can work around this by using stri_reverse:

stri_split_fixed(stri_reverse(x),",",n = 2) #reverse
[[1]]
[1] "dnaloP" "KU,ASU"

stri_reverse(stri_split_fixed(stri_reverse(x),",",n = 2)[[1]]) #reverse back
[1] "Poland" "USA,UK"
stri_reverse(stri_split_fixed(stri_reverse(x),",",n = 2)[[1]])[2:1] #and again :)
[1] "USA,UK" "Poland"
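If stringi is not available, a similar result can be sketched in base R by locating the last comma and cutting around it, instead of reversing the string. This is a different technique from the one in the answer above; the helper name split_at_last_comma is hypothetical, and the sketch assumes a single input string.

```r
# Sketch: find the last comma with regexpr() and cut around it (base R only).
# Assumption: x is a single string, not a vector.
split_at_last_comma <- function(x) {   # hypothetical helper name
  # ",[^,]*$" matches from the last comma to the end of the string,
  # so the match start is the position of the last comma (-1 if none)
  pos <- as.integer(regexpr(",[^,]*$", x))
  if (pos < 0) return(x)               # no comma: nothing to split
  c(substr(x, 1, pos - 1), substr(x, pos + 1, nchar(x)))
}

parts <- split_at_last_comma("USA,UK,Poland")
print(parts)
```

Unlike the regular-expression split in Answer 1, this sketch does not strip whitespace after the comma.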


Source: https://stackoverflow.com/questions/24938616/string-split-on-last-comma-in-r
