R Frequency Table of Likert Data

Question

I have what I thought was a basic task, but it has proven otherwise. I have a series of surveys, and I need to produce a frequency table for each one. For instance, Survey 1 consists of 6 questions, each with 5 response options. For each survey, I need a table with one row per question (6 in this example) showing the percentage of participants who chose each response option.

I have been using prop.table, but I have only managed to do this for one question at a time. I also haven't figured out how to add a percentage sign, and I lose the question's variable name as the row name.

Ultimately, I would like to print these tables straight into a Word document. I think I have that part figured out, but first I need to get the tables right.

I welcome any suggestions. Thanks!

EDIT

Here is what I have so far using some sample Likert data:

q1<-c(2,2,3,3,3,4,4,4,5,5)
q2<-c(2,3,3,4,4,4,4,5,5,5)
q3<-c(2,2,2,3,4,4,4,5,5,5)
df<-data.frame(q1,q2,q3)
x<-prop.table(table(factor(df$q1,levels=1:5)))*100
y<-round(x,digits=1)

That yields something similar to what I need. However, I would like "q1" to be in the resulting table as a row name, I would like the percentages to have a % sign, and I need a way to incorporate the two additional "q2" "q3" rows into that same table.

Hope that helps. Thank you.

Answer 1:

q1<-c(2,2,3,3,3,4,4,4,5,5)
q2<-c(2,3,3,4,4,4,4,5,5,5)
q3<-c(2,2,2,3,4,4,4,5,5,5)
df<-data.frame(q1,q2,q3)

library(expss)
# add value labels for preserving empty categories
val_lab(df) = autonum(1:5)
res = df
for(each in colnames(df)){
    res = res %>% 
        tab_cells(list(each)) %>% 
        tab_cols(vars(each)) %>% 
        tab_stat_rpct(total_row_position = "none")
}


res = res %>% tab_pivot() 
# add percentage sign
recode(res[,-1]) = other ~ function(x) ifelse(is.na(x), NA, paste0(round(x, 0), "%"))
res

# |    |  1 |   2 |   3 |   4 |   5 |
# | -- | -- | --- | --- | --- | --- |
# | q1 |    | 20% | 30% | 30% | 20% |
# | q2 |    | 10% | 20% | 40% | 30% |
# | q3 |    | 30% | 10% | 30% | 30% |

If you use knitr then the following code will be helpful:

library(knitr)
res %>% kable

Answer 2:

I wouldn't advise doing this, because the result is not useful for later wrangling, but to get it exactly as asked...

for (i in seq_along(names(df))) {
 assign(paste0("x",i), prop.table(table(factor(df[[i]], levels = 1:5))))
}

result <- rbind(x1, x2, x3)
rownames(result) <- names(df)

as.data.frame(matrix(
sprintf("%.0f%%", result*100), 
nrow(result), 
dimnames = dimnames(result)
))

   1   2   3   4   5
q1 0% 20% 30% 30% 20%
q2 0% 10% 20% 40% 30%
q3 0% 30% 10% 30% 30%

The last bit of code is as suggested here.
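
As a rough alternative, the same table can be sketched in base R without assign(), keeping a numeric matrix around for later wrangling and formatting only a copy for display. The names props, pct and pct_df are illustrative and not part of the answer above; the data is the sample from the question.

```r
# Sample data from the question
q1 <- c(2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
q2 <- c(2, 3, 3, 4, 4, 4, 4, 5, 5, 5)
q3 <- c(2, 2, 2, 3, 4, 4, 4, 5, 5, 5)
df <- data.frame(q1, q2, q3)

# One row of proportions per question; levels = 1:5 keeps unused options
props <- t(sapply(df, function(x) prop.table(table(factor(x, levels = 1:5)))))

# Formatted copy for display; props itself stays numeric
pct <- props
pct[] <- sprintf("%.0f%%", props * 100)
pct_df <- as.data.frame(pct)
pct_df
```

Because props remains numeric, it can still be reshaped or summarised later, while pct_df is only meant for printing.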

Answer 3:

It is hard to give a precise answer without knowing what the data look like. However, assuming I already have some sort of data frame, I would start by creating functions that systematically transform the data into plots. I would also use ggplot2 rather than base R graphics, as it is more flexible.

Suppose you have a data frame for each survey. In my experience, each row then has one column that identifies the question and another that holds the response given to that question.

That is:

survey = data.frame(question = factor(rep(1:6,4)),response = factor(c(1:5,sample(1:5,19, replace = TRUE))))

Then you can create a function that calculates the percentage of each response within a question, given a data frame like the one above:

library(plyr)

# Assumes survey has columns question and response
calculate_percent = function(survey){
  ddply(survey, ~question, function(rows){ 

  total_responses = nrow(rows)

  response_percent =  ddply(rows, ~response, function(rows_response){
    count_response = nrow(rows_response)
    data.frame(response = unique(rows_response$response), percent = (count_response/total_responses)*100)
  })

  data.frame(question = unique(rows$question), response_percent)

  })
}

Then you can create a function that makes a plot given a data frame like the one defined above.

library(ggplot2)
library(scales)

percentage_plot = function(survey){

  calculated_percentages = calculate_percent(survey)

  ggplot(calculated_percentages,aes(x = question, y = percent)) + 
    geom_bar(aes(fill = response),stat = "identity",position = "dodge") +
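    # percent is already on a 0-100 scale, so only append the sign;
    # scales::percent would multiply the values by 100 again
    scale_y_continuous(labels = function(x) paste0(x, "%"))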
}

Which can finally be used with the call

percentage_plot(survey)

Then since you have multiple surveys you can generalize with additional functions which would systematically process the data in a similar manner as above.

Also, you could have drawn the above plots as facets rather than the grouped bar plots used here. However, since you have more than one survey, you may want to reserve facets for the survey level.
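
To make the idea of generalizing across several surveys a bit more concrete, here is a minimal sketch. It assumes the surveys are kept in a named list of data frames in the question/response format described earlier. The helper percent_by_question is a hypothetical base R stand-in for calculate_percent (it uses prop.table instead of plyr), and make_survey and surveys are made-up names that only generate placeholder data.

```r
# Hypothetical base R stand-in for calculate_percent()
percent_by_question <- function(survey) {
  # Row-wise percentages: each question's responses sum to 100
  tab <- prop.table(table(survey$question, survey$response), margin = 1) * 100
  out <- as.data.frame(tab, stringsAsFactors = FALSE)
  names(out) <- c("question", "response", "percent")
  out
}

# Placeholder data: 10 random responses per question (assumption, not real data)
set.seed(1)
make_survey <- function(n_questions) {
  data.frame(
    question = factor(rep(seq_len(n_questions), each = 10)),
    response = factor(sample(1:5, n_questions * 10, replace = TRUE), levels = 1:5)
  )
}
surveys <- list(survey1 = make_survey(6), survey2 = make_survey(4))

# Apply the same processing to every survey in the list
results <- lapply(surveys, percent_by_question)
```

Each element of results is a long-format data frame, so it could be passed to a plotting function or reshaped into a wide table one survey at a time.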

Sorry, I started writing my example before your edit; hopefully you can still adapt it to your use case.

Actually it seems that I misunderstood your question and answered a different one.

Source: https://stackoverflow.com/questions/44081159/r-frequency-table-of-likert-data

Tags

frequency-analysis