Using R to parse out Surveymonkey csv files

后端未结

关注

 7  1071

I\'m trying to analyse a large survey created with surveymonkey which has hundreds of columns in the CSV file and the output format is difficult to use as the headers run over t

相关标签:

7条回答

太阳男子

2021-02-05 12:36
The issue with the headers is that columns with "select all that apply" will have a blank top row, and the column heading will be the row below. This is only an issue for those types of questions.

With this in mind, I wrote a loop to go through all columns and replace the column names with the value from the second row if the column name was blank- which has a character length of 1.

Then, you can kill the second row of the data and have a tidy data frame.
```
for(i in 1:ncol(df)){
newname <- colnames(df)[i]
if(nchar(newname) < 2){
colnames(df)[i] <- df[1,i]
} 

df <- df[-1,]
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
南旧

2021-02-05 12:39

As of November 2013, the webpage layout seems to have changed. Choose Analyze results > Export All > All Responses Data > Original View > XLS+ (Open in advanced statistical and analytical software). Then go to Exports and download the file. You'll get raw data as first row = question headers / each following row = 1 response, possibly split between multiple files if you have many responses / questions.

0 讨论(0)
发布评论:

提交评论
- 加载中...

星月不相逢

2021-02-05 12:44

What I did in the end was print out the headers using libreoffice labeled as V1,V2, etc. then I just read in the file as

 m1 <- read.csv('Sheet1.csv', header=FALSE, skip=1)

and then just did the analysis against m1$V10, m1$V23 etc...

To get around the mess of multiple columns I used the following little function

# function to merge columns into one with a space separator and then
# remove multiple spaces
mcols <- function(df, cols) {
    # e.g. mcols(df, c(14:18))
        exp <- paste('df[,', cols, ']', sep='', collapse=',' )
        # this creates something like...
        # "df[,14],df[,15],df[,16],df[,17],df[,18]"
        # now we just want to do a paste of this expression...
        nexp <- paste(" paste(", exp, ", sep=' ')")
        # so now nexp looks something like...
        # " paste( df[,14],df[,15],df[,16],df[,17],df[,18] , sep='')"
        # now we just need to parse this text... and eval() it...
        newcol <- eval(parse(text=nexp))
        newcol <- gsub('  *', ' ', newcol) # replace duplicate spaces by a single one
        newcol <- gsub('^ *', '', newcol) # remove leading spaces
        gsub(' *$', '', newcol) # remove trailing spaces
}
# mcols(df, c(14:18))

No doubt somebody will be able to clean this up!

To tidy up Likert-like scales I used:

# function to tidy c('Strongly Agree', 'Agree', 'Disagree', 'Strongly Disagree')
tidylik4 <- function(x) {
  xlevels <- c('Strongly Disagree', 'Disagree', 'Agree', 'Strongly Agree')
  y <- ifelse(x == '', NA, x)
  ordered(y, levels=xlevels)
}

for (i in 44:52) {
  m2[,i] <- tidylik4(m2[,i])
}

Feel free to comment as no doubt this will come up again!

0 讨论(0)

借酒劲吻你

2021-02-05 12:49

You can export it in a convenient form that fits R from Surveymonkey, see download responses in 'Advanced Spreadsheet Format'

0 讨论(0)
发布评论:

提交评论
- 加载中...
死守一世寂寞

2021-02-05 12:51

How about the following: use read.csv() with header=FALSE. Make two arrays, one with the two lines of headings and one with the answers to the survey. Then paste() the two rows/sentences of together. Finally, use colnames().

0 讨论(0)
发布评论:

提交评论
- 加载中...

臣服心动

2021-02-05 12:54

I have to deal with this pretty frequently, and having the headers on two columns is a bit painful. This function fixes that issue so that you only have a 1 row header to deal with. It also joins the multipunch questions so you have top: bottom style naming.

#' @param x The path to a surveymonkey csv file
fix_names <- function(x) {
  rs <- read.csv(
    x,
    nrows = 2,
    stringsAsFactors = FALSE,
    header = FALSE,
    check.names = FALSE, 
    na.strings = "",
    encoding = "UTF-8"
  )

  rs[rs == ""] <- NA
  rs[rs == "NA"] <- "Not applicable"
  rs[rs == "Response"] <- NA
  rs[rs == "Open-Ended Response"] <- NA

  nms <- c()

  for(i in 1:ncol(rs)) {

    current_top <- rs[1,i]
    current_bottom <- rs[2,i]

    if(i + 1 < ncol(rs)) {
      coming_top <- rs[1, i+1]
      coming_bottom <- rs[2, i+1]
    }

    if(is.na(coming_top) & !is.na(current_top) & (!is.na(current_bottom) | grepl("^Other", coming_bottom)))
      pre <- current_top

    if((is.na(current_top) & !is.na(current_bottom)) | (!is.na(current_top) & !is.na(current_bottom)))
      nms[i] <- paste0(c(pre, current_bottom), collapse = " - ")

    if(!is.na(current_top) & is.na(current_bottom))
      nms[i] <- current_top

  }


  nms
}

If you note, it returns the names only. I typically just read.csv with ...,skip=2, header = FALSE, save to a variable and overwrite the names of the variable. It also helps ALOT to set your na.strings and stringsAsFactor = FALSE.

nms = fix_names("path/to/csv")
d = read.csv("path/to/csv", skip = 2, header = FALSE)
names(d) = nms

0 讨论(0)

1 2 下一页