I want to change all column names. Renaming or selecting the columns one by one is too laborious. Does anybody have a better solution? Examples below:
You can use select_ with .dots:
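library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")  # assumes a local Spark installation
df <- copy_to(sc, iris)
newnames <- paste("Name", 1:5, sep = "_")
df %>% select_(.dots = setNames(colnames(df), newnames))  # select_() is deprecated since dplyr 0.7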
# Source: lazy query [?? x 5]
# Database: spark_connection
Name_1 Name_2 Name_3 Name_4 Name_5
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
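Since select_() is deprecated, here is a minimal sketch of the same named-vector idea with the non-deprecated select() and all_of(). It runs on a local copy of iris (so the original names are Sepal.Length etc., not the Sepal_Length names that copy_to() produces in Spark); that the same call translates for a Spark table is an assumption to check against your dplyr/dbplyr versions. The variable names mapping and df2 are illustrative.

```r
library(dplyr)

# Local stand-in for the Spark table (assumption: no Spark connection here)
df <- iris
newnames <- paste("Name", 1:5, sep = "_")

# Named character vector: names are the NEW column names, values the OLD ones
mapping <- setNames(colnames(df), newnames)

# all_of() with a named vector selects and renames in one step
df2 <- df %>% select(all_of(mapping))
print(colnames(df2))
```

The key design point is the direction of setNames(): the vector's values are the existing names and its names are the new ones, matching the new = old convention of select() and rename().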
You can also use select with !!!:
library(rlang)
# syms() turns the old names into symbols (parse_quosure() is deprecated in newer rlang)
df %>% select(!!! setNames(syms(colnames(df)), newnames))
# Source: lazy query [?? x 5]
# Database: spark_connection
Name_1 Name_2 Name_3 Name_4 Name_5
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ... with more rows
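To see what !!! actually does, this small sketch builds the call with rlang::expr() instead of running it, so the spliced new = old arguments are visible, and then applies the same splice to a local data frame. The two column names are only an illustrative subset of the iris columns.

```r
library(rlang)
library(dplyr)

old <- c("Sepal_Length", "Sepal_Width")
new <- c("Name_1", "Name_2")

# A named list of symbols: each element becomes one `new = old` argument
args <- setNames(syms(old), new)

# expr() captures the call without evaluating it, showing the splice result
e <- expr(select(df, !!!args))
print(e)

# Same splice on a local data frame (iris uses dots in its column names)
local_args <- setNames(syms(c("Sepal.Length", "Sepal.Width")), new)
res <- select(iris, !!!local_args)
```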
The solutions listed above did not work for me. I found a straightforward solution, documented on GitHub, that works with sparklyr:
rename() doesn't support unquoting of character vectors #3030
Below is an excerpt of my script expanding on the method described in the link above.
library(sparklyr)
library(dplyr)
library(stringr)
# spark_df is an existing Spark table (tbl_spark)
# Build clean column names: drop the "LAST " prefix, then replace " - " and remaining spaces with underscores
list_new_names <- colnames(spark_df) %>% str_remove_all('LAST ') %>% str_replace_all(' - ', '_') %>% str_replace_all(' ', '_')
# Named vector for rename(): names are the new column names, values are the current ones
list_new_names <- colnames(spark_df) %>% setNames(list_new_names)
# Splice the new = old pairs into rename()
spark_df <- spark_df %>% rename(!!! list_new_names)
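The same pipeline can be tried locally before running it against Spark. This is a minimal sketch on a hypothetical data frame: the column names "LAST Close - Adj" and "Trade Volume" are invented to exercise each str_* step, and df_local and rename_map are illustrative names.

```r
library(dplyr)
library(stringr)

# Hypothetical local data with messy column names (stand-in for spark_df)
df_local <- data.frame(`LAST Close - Adj` = 1:2, `Trade Volume` = 3:4,
                       check.names = FALSE)

# Drop the "LAST " prefix, then turn " - " and remaining spaces into "_"
new_names <- colnames(df_local) %>%
  str_remove_all("LAST ") %>%
  str_replace_all(" - ", "_") %>%
  str_replace_all(" ", "_")

# new = old mapping, spliced into rename()
rename_map <- setNames(colnames(df_local), new_names)
df_local <- df_local %>% rename(!!! rename_map)
print(colnames(df_local))
```

The order of the replacements matters: " - " must be replaced before single spaces, otherwise "Close - Adj" would become "Close_-_Adj".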