R read.csv “More columns than column names” error

后端 未结 5 552
半阙折子戏
半阙折子戏 2020-12-29 17:40

I have a problem when importing .csv file into R. With my code:

t <- read.csv(\"C:\\\\N0_07312014.CSV\", na.string=c(\"\",\"null\",\"NaN\",\"         


        
相关标签:
5条回答
  • 2020-12-29 17:51

    If you only need the first 32 columns, and you know how many columns there are, you can set the other columns classes to NULL.

    read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","X"),
          header=T, stringsAsFactors=FALSE,
          colClasses=c(rep("character",32),rep("NULL",10)))
    

    If you do not want to code up each colClass and you like the guesses read.csv then just save that csv and open it again.

    Alternatively, you can skip the header and name the columns yourself and remove the misbehaved rows.

    A<-data.frame(read.csv("N0_07312014.CSV",
                            header=F,stringsAsFactors=FALSE,
                            colClasses=c(rep("character",32),rep("NULL",5)),
                            na.string=c("","null","NaN","X")))
    Yournames<-as.character(A[1,])
    names(A)<-Yournames
    yourdata<-unique(A)[-1,]
    

    The code above assumes you do not want any duplicate rows. You can alternatively remove rows that have the first entry equal to the first column name, but I'll leave that to you.

    0 讨论(0)
  • 2020-12-29 17:55

    Open the .csv as a text file (for example, use TextEdit on a Mac) and check to see if columns are being separated with commas.

    csv is "comma separated vectors". For some reason when Excel saves my csv's it uses semicolons instead.

    When opening your csv use:

    read.csv("file_name.csv",sep=";")
    

    Semi colon is just an example but as someone else previously suggested don't assume that because your csv looks good in Excel that it's so.

    0 讨论(0)
  • 2020-12-29 17:56

    That's one wonky CSV file. Multiple headers tossed about (try pasting it to CSV Fingerprint) to see what I mean.

    Since I don't know the data, it's impossible to be sure the following produces accurate results for you, but it involves using readLines and other R functions to pre-process the text:

    # use readLines to get the data
    dat <- readLines("N0_07312014.CSV")
    
    # i had to do this to fix grep errors
    Sys.setlocale('LC_ALL','C')
    
    # filter out the repeating, and wonky headers
    dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)
    
    # turn that vector into a text connection for read.csv
    dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
                      header=FALSE, stringsAsFactors=FALSE)
    
    str(dat_3)
    ## 'data.frame':    308 obs. of  37 variables:
    ##  $ V1 : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
    ##  $ V2 : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
    ##  $ V3 : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
    ##  $ V4 : chr  "" "" "" "" ...
    ## .. more
    ##  $ V36: chr  "" "" "" "" ...
    ##  $ V37: chr  "0" "0" "0" "0" ...
    
    # grab the headers
    headers <- strsplit(dat[1], ",")[[1]]
    
    # how many of them are there?
    length(headers)
    ## [1] 32
    
    # limit it to the 32 columns you want (Which matches)
    dat_4 <- dat_3[,1:32]
    
    # and add the headers
    colnames(dat_4) <- headers
    
    str(dat_4)
    ## 'data.frame':    308 obs. of  32 variables:
    ##  $ Node Name         : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
    ##  $ RTC_date          : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
    ##  $ RTC_time          : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
    ##  $ N1 Bat (VDC)      : chr  "" "" "" "" ...
    ##  $ N1 Shinyei (ug/m3): chr  "" "" "0.23" "null" ...
    ##  $ N1 CC (ppb)       : chr  "" "" "null" "null" ...
    ##  $ N1 Aeroq (ppm)    : chr  "" "" "null" "null" ...
    ## ... continues
    
    0 讨论(0)
  • 2020-12-29 18:02

    try read.table() instead of read.csv()

    0 讨论(0)
  • 2020-12-29 18:02

    I was also facing the same issue. Now solved.

    Just use header = FALSE

    read.csv("data.csv", header = FALSE) -> mydata
    
    0 讨论(0)
提交回复
热议问题