I have a Spark dataframe containing a column named COL whose values are structured like this:
VALUE1###VALUE2
How can I split it into two separate columns?
You can use ft_regex_tokenizer followed by sdf_separate_column.
ft_regex_tokenizer splits a column into an array (vector) column based on a regex; sdf_separate_column then splits that array into multiple columns.
mydf %>%
  ft_regex_tokenizer(input_col = "mycolumn", output_col = "mycolumnSplit", pattern = ";") %>%
  sdf_separate_column("mycolumnSplit", into = c("column1", "column2"))
UPDATE: in recent versions of sparklyr, the parameters input.col and output.col have been renamed to input_col and output_col, respectively.
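Putting the two steps together for the COL column from the question, here is a sketch. It assumes a live Spark connection via spark_connect and that the data has been copied to Spark; the variable names (sc, sdf) are illustrative:

```r
library(dplyr)
library(sparklyr)

# Connect to a local Spark instance (adjust master for your cluster)
sc <- spark_connect(master = "local")

# Example data mirroring the question's COL column
sdf <- copy_to(sc, data.frame(COL = c("VALUE1###VALUE2", "VALUE3###VALUE4")), "mydf")

sdf %>%
  # Split COL on the "###" delimiter into an array column
  ft_regex_tokenizer(input_col = "COL", output_col = "COL_split", pattern = "###") %>%
  # Expand the array into two separate columns
  sdf_separate_column("COL_split", into = c("COL1", "COL2")) %>%
  select(COL1, COL2)
```

One caveat: ft_regex_tokenizer lowercases tokens by default (to_lower_case = TRUE), so pass to_lower_case = FALSE if the original case must be preserved.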
Sparklyr version 0.5 has just been released, and it contains the ft_regex_tokenizer()
function that can do that:
A regex based tokenizer that extracts tokens either by using the provided regex pattern to split the text (default) or repeatedly matching the regex (if gaps is false).
library(dplyr)
library(sparklyr)
ft_regex_tokenizer(input_DF, input.col = "COL", output.col = "ResultCols", pattern = "###")
The resulting "ResultCols" column will be a list column.