Remove attributes from data read in readr::read_csv

南楼画角 submitted on 2021-01-27 15:42:07


readr::read_csv adds attributes that don't get updated when the data is edited. For example,

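library(readr)
library(dplyr)

df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")

# Remove columns with only one distinct entry
no_info <- df %>% sapply(n_distinct)      # number of distinct values per column
no_info <- names(no_info[no_info == 1])   # names of the constant columns

df2 <- df %>% 
  select(-all_of(no_info))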

Inspecting the structure, we see that column B is still present in the attributes of df2:

> str(df)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    3 obs. of  3 variables:
 $ A: chr  "a" "b" "c"
 $ B: num  1 1 1
 $ C: chr  "x" "y" "z"
 - attr(*, "spec")=
  .. cols(
  ..   A = col_character(),
  ..   B = col_double(),
  ..   C = col_character()
  .. )
> str(df2)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    3 obs. of  2 variables:
 $ A: chr  "a" "b" "c"
 $ C: chr  "x" "y" "z"
 - attr(*, "spec")=
  .. cols(
  ..   A = col_character(),
  ..   B = col_double(),
  ..   C = col_character()
  .. )
> attributes(df2)
$class
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 

$row.names
[1] 1 2 3

$spec
cols(
  A = col_character(),
  B = col_double(),
  C = col_character()
)

$names
[1] "A" "C"


How can I remove columns (or make any other changes to the data) and have those changes accurately reflected in the structure and attributes of the new data?


You can remove the column specification by setting it to NULL:

> attr(df, 'spec') <- NULL
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   3 obs. of  3 variables:
 $ A: chr  "a" "b" "c"
 $ B: int  1 1 1
 $ C: chr  "x" "y" "z"
> df
# A tibble: 3 x 3
  A         B C    
  <chr> <int> <chr>
1 a         1 x    
2 b         1 y    
3 c         1 z    
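
As a rough sketch of how this could be applied without modifying the original data frame, the snippet below works on copies and compares two approaches. The first is the `attr(..., "spec") <- NULL` assignment shown above. The second, subsetting with empty brackets (`df[]`), is an alternative not mentioned in the answer; it assumes readr 1.3.0 or later, where results carry the `spec_tbl_df` subclass and subsetting is expected to drop both that class and the `spec` attribute. The names `df_nospec` and `df_plain` are made up for illustration.

```r
library(readr)

df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")

# Option 1: clear only the column specification on a copy.
# Any readr-specific class on the object is left in place.
df_nospec <- df
attr(df_nospec, "spec") <- NULL

# Option 2 (assumes readr >= 1.3.0): an empty subset returns a plain tibble,
# dropping the "spec" attribute and the "spec_tbl_df" class together.
df_plain <- df[]

# Check the result of each option.
is.null(attr(df_nospec, "spec"))
is.null(attr(df_plain, "spec"))
inherits(df_plain, "spec_tbl_df")
class(df_plain)
```

Either option can be applied right after reading the file, so that later steps such as dropping columns start from an object with no stale specification attached.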

