Using R to parse out Surveymonkey csv files

后端未结

关注

 7  1073

说谎 2021-02-05 12:01

I\'m trying to analyse a large survey created with surveymonkey which has hundreds of columns in the CSV file and the output format is difficult to use as the headers run over t

7条回答

臣服心动 (楼主)

2021-02-05 12:54

I have to deal with this pretty frequently, and having the headers on two columns is a bit painful. This function fixes that issue so that you only have a 1 row header to deal with. It also joins the multipunch questions so you have top: bottom style naming.

#' @param x The path to a surveymonkey csv file
fix_names <- function(x) {
  rs <- read.csv(
    x,
    nrows = 2,
    stringsAsFactors = FALSE,
    header = FALSE,
    check.names = FALSE, 
    na.strings = "",
    encoding = "UTF-8"
  )

  rs[rs == ""] <- NA
  rs[rs == "NA"] <- "Not applicable"
  rs[rs == "Response"] <- NA
  rs[rs == "Open-Ended Response"] <- NA

  nms <- c()

  for(i in 1:ncol(rs)) {

    current_top <- rs[1,i]
    current_bottom <- rs[2,i]

    if(i + 1 < ncol(rs)) {
      coming_top <- rs[1, i+1]
      coming_bottom <- rs[2, i+1]
    }

    if(is.na(coming_top) & !is.na(current_top) & (!is.na(current_bottom) | grepl("^Other", coming_bottom)))
      pre <- current_top

    if((is.na(current_top) & !is.na(current_bottom)) | (!is.na(current_top) & !is.na(current_bottom)))
      nms[i] <- paste0(c(pre, current_bottom), collapse = " - ")

    if(!is.na(current_top) & is.na(current_bottom))
      nms[i] <- current_top

  }


  nms
}

If you note, it returns the names only. I typically just read.csv with ...,skip=2, header = FALSE, save to a variable and overwrite the names of the variable. It also helps ALOT to set your na.strings and stringsAsFactor = FALSE.

nms = fix_names("path/to/csv")
d = read.csv("path/to/csv", skip = 2, header = FALSE)
names(d) = nms

0 讨论(0)

查看其它7个回答