sparklyr: change all column names of a Spark DataFrame

半阙折子戏 2021-01-15 01:10

I want to change all the column names of a Spark DataFrame. Doing this with the current rename or select operations is too laborious. Does anybody have a better solution? Examples below:



        
2 Answers
  • 2021-01-15 01:45

    You can use select_ with .dots (note that select_ is deprecated as of dplyr 0.7.0):

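    library(sparklyr)
    library(dplyr)
    
    # sc is assumed to be an existing Spark connection, e.g. from spark_connect()
    df <- copy_to(sc, iris)
    
    # One new name per column, in column order
    newnames <- paste("Name", 1:5, sep="_")
    
    # Named vector: names are the new column names, values are the old ones
    df %>% select_(.dots=setNames(colnames(df), newnames))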
    
    # Source:   lazy query [?? x 5]
    # Database: spark_connection
       Name_1 Name_2 Name_3 Name_4 Name_5
        <dbl>  <dbl>  <dbl>  <dbl>  <chr>
     1    5.1    3.5    1.4    0.2 setosa
     2    4.9    3.0    1.4    0.2 setosa
     3    4.7    3.2    1.3    0.2 setosa
     4    4.6    3.1    1.5    0.2 setosa
     5    5.0    3.6    1.4    0.2 setosa
     6    5.4    3.9    1.7    0.4 setosa
     7    4.6    3.4    1.4    0.3 setosa
     8    5.0    3.4    1.5    0.2 setosa
     9    4.4    2.9    1.4    0.2 setosa
    10    4.9    3.1    1.5    0.1 setosa
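
To make the mapping passed to `.dots` easier to see, here is a minimal base R sketch that builds it without a Spark connection. The `old_names` vector is an assumption standing in for `colnames(df)`; it uses the underscore names that sparklyr typically produces when copying `iris`.

```r
# Stand-in for colnames(df); assumed values, not read from a real Spark table
old_names <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width", "Species")

# Same construction as in the answer above
new_names <- paste("Name", 1:5, sep = "_")

# setNames() attaches the new names as names and keeps the old names as values,
# which is the new = old form that select() and rename() expect
mapping <- setNames(old_names, new_names)
print(mapping)
```

Each element reads as "new name = old name", so the order of `new_names` has to match the column order of the table.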
    

    You can also select with !!!:

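    library(rlang)
    
    # parse_quosure() is deprecated in newer rlang releases;
    # syms() turns the column names into symbols that can be spliced with !!!
    df %>% select(!!! setNames(syms(colnames(df)), newnames))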
    
    # Source:   lazy query [?? x 5]
    # Database: spark_connection
       Name_1 Name_2 Name_3 Name_4 Name_5
        <dbl>  <dbl>  <dbl>  <dbl>  <chr>
     1    5.1    3.5    1.4    0.2 setosa
     2    4.9    3.0    1.4    0.2 setosa
     3    4.7    3.2    1.3    0.2 setosa
     4    4.6    3.1    1.5    0.2 setosa
     5    5.0    3.6    1.4    0.2 setosa
     6    5.4    3.9    1.7    0.4 setosa
     7    4.6    3.4    1.4    0.3 setosa
     8    5.0    3.4    1.5    0.2 setosa
     9    4.4    2.9    1.4    0.2 setosa
    10    4.9    3.1    1.5    0.1 setosa
    # ... with more rows
    
  • 2021-01-15 01:45

    The solutions listed above did not work for me. I did find a straightforward solution, documented in a GitHub issue, that works with sparklyr.

    rename() doesn't support unquoting of character vectors #3030

    Below is an excerpt of my script expanding on the method described in the link above.

    library(dplyr)
    library(stringr)
    
    # Build cleaned column names: drop the 'LAST ' prefix, then replace ' - ' and spaces with underscores
    list_new_names = colnames(spark_df) %>% str_remove_all('LAST ') %>% str_replace_all(' - ', '_') %>% str_replace_all(' ', '_')
    # Build the rename mapping: names are the new column names, values are the old ones
    list_new_names = colnames(spark_df) %>% setNames(list_new_names)
    # Rename columns by splicing the named vector into rename()
    spark_df = spark_df %>% rename(!!! list_new_names)
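
The same rename pattern can be tried on a small local data frame before running it against Spark. This is only a sketch: `local_df` and its column names are made up for illustration, and behaviour on a real Spark table may depend on the installed dplyr, dbplyr and sparklyr versions.

```r
library(dplyr)
library(stringr)

# Hypothetical local data with awkward column names (not from the original post)
local_df <- data.frame(
  `LAST Price - USD` = c(1.5, 2.5),
  `Trade Date` = c("2021-01-14", "2021-01-15"),
  check.names = FALSE
)

old_names <- colnames(local_df)

# Clean the names: drop the 'LAST ' prefix, then replace ' - ' and spaces with underscores
new_names <- old_names %>%
  str_remove_all("LAST ") %>%
  str_replace_all(" - ", "_") %>%
  str_replace_all(" ", "_")

# Named character vector in new = old form
rename_map <- setNames(old_names, new_names)

# Splice the mapping into rename()
renamed_df <- local_df %>% rename(!!!rename_map)
print(colnames(renamed_df))
```

Keeping the old and new names in separate variables, rather than reusing one name for both, makes it easier to check the mapping before applying it.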
    