Splitting CamelCase in R

Submitted by 前提是你 on 2019-11-30 12:26:06
42-
string.to.split = "thisIsSomeCamelCase"
gsub("([A-Z])", " \\1", string.to.split)
# [1] "this Is Some Camel Case"

strsplit(gsub("([A-Z])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case" 
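One caveat, sketched below on the assumption that the input may begin with a capital letter: the leading space that gsub() inserts turns into an empty first element after strsplit(). Trimming with trimws() before splitting avoids it.

```r
string.to.split <- "ThisIsSomeCamelCase"  # assumed input with a leading capital
spaced <- gsub("([A-Z])", " \\1", string.to.split)
spaced
# [1] " This Is Some Camel Case"

strsplit(spaced, " ")[[1]]          # leading space gives an empty first element
# [1] ""      "This"  "Is"    "Some"  "Camel" "Case"

strsplit(trimws(spaced), " ")[[1]]  # trim first, then split
# [1] "This"  "Is"    "Some"  "Camel" "Case"
```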

Comparing Ramnath's answer with mine supports my initial impression that this question was underspecified.

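And give Tommy and Ramnath upvotes for pointing out [:upper:], which matches upper-case letters as defined by the current locale rather than only the ASCII range A-Z: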

strsplit(gsub("([[:upper:]])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case" 
Ramnath

Here is one way to do it:

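split_camelcase <- function(...){
  # Flatten all arguments into one character vector
  strings <- unlist(list(...))
  # Trim leading and trailing non-alphanumeric characters
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  # Insert a space before each upper-case letter, except at the very start
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  # Lower-case and split on the spaces; [[1]] keeps only the first string's result
  return(strsplit(tolower(strings), " ")[[1]])
}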

split_camelcase("thisIsSomeGood")
# [1] "this" "is"   "some" "good"
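Because the trailing [[1]] keeps only the words from the first string, passing several strings to split_camelcase() silently drops the rest. Below is a minimal vectorised sketch; split_camelcase_all is a hypothetical name, and it simply reuses the same three regex steps without the [[1]].

```r
# Hypothetical helper (not from the original answer): same steps as
# split_camelcase(), but returns one character vector per input string
split_camelcase_all <- function(x) {
  x <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", x)       # trim punctuation at the ends
  x <- gsub("(?!^)(?=[[:upper:]])", " ", x, perl = TRUE)  # space before inner capitals
  strsplit(tolower(x), " ")                               # list, one element per input
}

split_camelcase_all(c("thisIsSomeGood", "__anotherExample__"))
# [[1]]
# [1] "this" "is"   "some" "good"
#
# [[2]]
# [1] "another" "example"
```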

Here's an approach using a single regex with a lookbehind and a lookahead:

strsplit(string.to.split, "(?<=[a-z])(?=[A-Z])", perl = TRUE)

## [[1]]
## [1] "this"  "Is"    "Some"  "Camel" "Case" 
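The lookbehind matters here: strsplit() restarts matching on the remainder after each split, and a lookahead alone would match again at the start of that remainder. The pattern does not separate an acronym from the word that follows it, though. The sketch below assumes you want "parseHTTPResponse" split into parse, HTTP and Response; both that input and the second alternative in the pattern are illustrations, not part of the original answer.

```r
string.to.split <- "thisIsSomeCamelCase"
acronym <- "parseHTTPResponse"  # hypothetical input containing an acronym

# Split between lower -> upper, and also before the last capital of a run of
# capitals when it is followed by a lower-case letter ("HTTP|Response")
pattern <- "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"

strsplit(acronym, "(?<=[a-z])(?=[A-Z])", perl = TRUE)[[1]]
# [1] "parse"        "HTTPResponse"
strsplit(acronym, pattern, perl = TRUE)[[1]]
# [1] "parse"    "HTTP"     "Response"
strsplit(string.to.split, pattern, perl = TRUE)[[1]]
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```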

Here is a one-liner using the gsubfn package's strapply. The regular expression matches the beginning of the string (^) followed by one or more lower case letters ([[:lower:]]+) or (|) an upper case letter ([[:upper:]]) followed by zero or more lower case letters ([[:lower:]]*) and processes the matched strings with c (which concatenates the individual matches into a vector). As with strsplit it returns a list so we take the first component ([[1]]) :

library(gsubfn)
strapply(string.to.split, "^[[:lower:]]+|[[:upper:]][[:lower:]]*", c)[[1]]
## [1] "this"  "Is"    "Some"  "Camel" "Case" 
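If you would rather not depend on gsubfn, the same pattern can be applied with base R's gregexpr() and regmatches(). This is a substitute technique, not what strapply() does internally; perl = TRUE is an assumption, chosen so that ^ anchors only at the very start of the string.

```r
string.to.split <- "thisIsSomeCamelCase"
# Same pattern as the strapply() call, applied with base R only
pattern <- "^[[:lower:]]+|[[:upper:]][[:lower:]]*"
words <- regmatches(string.to.split,
                    gregexpr(pattern, string.to.split, perl = TRUE))[[1]]
words
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```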

The beginning of an answer is to split the string into individual characters:

sp.x <- strsplit(string.to.split, "")

Then find which string positions are upper case:

ind.x <- lapply(sp.x, function(x) which(!tolower(x) == x))

Then use that to split out each run of characters . . .
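The answer above stops before the final step. One possible completion, assumed rather than taken from the original, assigns each character a group number that increases at every upper-case position and then pastes each group back together:

```r
string.to.split <- "thisIsSomeCamelCase"
sp.x  <- strsplit(string.to.split, "")
ind.x <- lapply(sp.x, function(x) which(!tolower(x) == x))

# Assumed completion: a new group starts at each upper-case position
chars  <- sp.x[[1]]
starts <- ind.x[[1]]
group  <- cumsum(seq_along(chars) %in% starts)  # 0 0 0 0 1 1 2 2 2 2 ...
# Paste the characters of each group into one word
words  <- unname(vapply(split(chars, group), paste, character(1), collapse = ""))
words
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```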

I think my other answer is better than the following, but if you only need a one-liner to split the string, here it is:

library(snakecase)
unlist(strsplit(to_parsed_case(string.to.split), "_"))
#> [1] "this"  "Is"    "Some"  "Camel" "Case" 

Here is an easy solution using snakecase plus some tidyverse helpers:

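# install.packages("snakecase")  # run once if the package is not installed
library(snakecase)
library(magrittr)
library(stringr)
library(purrr)

string.to.split = "thisIsSomeCamelCase"
to_parsed_case(string.to.split) %>%  # "this_Is_Some_Camel_Case"
  str_split(pattern = "_") %>%       # list holding one character vector
  purrr::flatten_chr()               # flatten to a plain character vector
#> [1] "this"  "Is"    "Some"  "Camel" "Case" 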

GitHub link to snakecase: https://github.com/Tazinho/snakecase
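For a package-free approximation of the to_parsed_case() step, you can insert the underscore yourself at each lower-to-upper boundary. This is a simplified stand-in, not snakecase's algorithm: it ignores digits, acronyms and existing separators.

```r
string.to.split <- "thisIsSomeCamelCase"
# Insert "_" between a lower-case letter and the following upper-case letter
parsed <- gsub("([[:lower:]])([[:upper:]])", "\\1_\\2", string.to.split)
parsed
# [1] "this_Is_Some_Camel_Case"
unlist(strsplit(parsed, "_"))
# [1] "this"  "Is"    "Some"  "Camel" "Case"
```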
