Override column types when importing data using readr::read_csv() when there are many columns

生来不讨喜 2021-01-30 11:04

I am trying to read a CSV file using readr::read_csv in R. The file that I am importing has about 150 columns; I am including only the first few columns in the example. I a

2 Answers
  • 2021-01-30 11:13

    Here is a more generic answer to this question, in case someone stumbles upon it in the future. Using "skip" to jump over columns is less advisable, because it stops working as soon as the structure of the imported data source changes.

    It could be easier in your example to simply set a default column type, and then define any columns that differ from the default.

    E.g., if all columns typically are "d", but the date column should be "D", load the data as follows:

      read_csv(df, col_types = cols(.default = "d", date = "D"))
    

    or if, e.g., column date should be "D" and column "xxx" be "i", do so as follows:

      read_csv(df, col_types = cols(.default = "d", date = "D", xxx = "i"))
    

    The use of ".default" above is powerful if you have many columns and only a few specific exceptions (such as "date" and "xxx").
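
    As a minimal runnable sketch of the same idea, the snippet below reads a small inline CSV instead of a real file. The data and the names csv_text and df_typed are made up for illustration; only the cols() call mirrors the answer above.

```r
library(readr)

# Hypothetical inline data standing in for the real 150-column file.
# A string containing a newline is read as literal data by read_csv().
csv_text <- "date,x,y,xxx\n2021-01-30,1.5,2.5,3\n2021-01-31,4.5,5.5,6\n"

# Every column is parsed as double unless it is named explicitly.
df_typed <- read_csv(
  csv_text,
  col_types = cols(.default = "d", date = "D", xxx = "i")
)

# Inspect the resulting type of each column.
sapply(df_typed, function(col) class(col)[1])
```

    Because the exceptions are matched by column name rather than by position, reordering or adding columns in the source file should not silently shift the types.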

  • 2021-01-30 11:28

    Yes. For example, to force numeric data to be treated as characters:

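    library(readr)

    # Literal CSV text; a string containing a newline is read as data, not as a path
    examplecsv = "a,b,c\n1,2,a\n3,4,d"
    # The output below is from an older readr; recent versions guess <dbl> rather than <int> by default
    read_csv(examplecsv)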
    # A tibble: 2 x 3
    #      a     b     c
    #  <int> <int> <chr>
    #1     1     2     a
    #2     3     4     d
    read_csv(examplecsv, col_types = cols(b = col_character()))
    # A tibble: 2 x 3
    #      a     b     c
    #  <int> <chr> <chr>
    #1     1     2     a
    #2     3     4     d
    

    Choices are:

    col_character() 
    col_date()
    col_time() 
    col_datetime() 
    col_double() 
    col_factor() # to enforce, will never be guessed
    col_integer() 
    col_logical() 
    col_number() 
    col_skip() # to force skip column
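
    A short sketch combining several of these collectors by column name is given below. The inline data and the names csv_text and df_mixed are hypothetical, and the date format is an assumption about how such a column might be written.

```r
library(readr)

# Hypothetical inline data; a string containing a newline is read as literal data.
csv_text <- "id,grp,when,junk\n1,a,30/01/2021,x\n2,b,31/01/2021,y\n"

df_mixed <- read_csv(
  csv_text,
  col_types = cols(
    id   = col_integer(),                     # integers are not guessed by default
    grp  = col_factor(levels = c("a", "b")),  # factors are never guessed
    when = col_date(format = "%d/%m/%Y"),     # non-ISO dates need an explicit format
    junk = col_skip()                         # dropped from the result
  )
)

names(df_mixed)
```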
    

    More: http://readr.tidyverse.org/articles/readr.html
