Create summary table of categorical variables of different lengths

伪装坚强ぢ 2021-01-02 11:04

In SPSS it is fairly easy to create a summary table of categorical variables using "Custom Tables":

[Image not preserved]

6 Answers
  •  -上瘾入骨i
    2021-01-02 11:35

    Unfortunately, there seems to be no R package yet that generates output as nicely as SPSS. Most table-generating functions define their own special formats, which gets you into trouble if you want to export the result or process it further.
    But I'm sure R is capable of that, so I started writing my own functions. I'm happy to share the result (still a work in progress, but it gets the job done):

    For every factor variable in a data.frame, the following function returns the frequency, or the column percentage (calc="perc"), of each of its levels, broken down by the levels of the factor variable "variable".
    The most important point may be that the output is a simple, user-friendly data.frame. So, unlike with many other functions, it is no problem to export the results and work with them in any way you want.

    I realize that there is much potential for further improvement, e.g. adding an option to choose between row and column percentages.

    contitable <- function( survey_data, variable, calc="freq" ){    
    
      # Check which variables are not given as factor
      # and exclude them from the given data.frame
      survey_data_factor_test <- as.logical( sapply( survey_data, FUN=is.factor ) )
      survey_data <- subset( survey_data, select=which( survey_data_factor_test ) )
    
      # Inform the user about excluded variables.
      # message() writes to stderr and can be silenced with suppressMessages().
      message( sum( !survey_data_factor_test, na.rm=TRUE ),
               " non-factor variable(s) were excluded" )
    
      variable_levels <- levels(survey_data[ , variable ])    
      variable_levels_length <- length( variable_levels )    
    
      # Initializing the data.frame which will gather the results    
      result <- data.frame( "Variable", "Levels", t(rep( 1, each=variable_levels_length ) ) )    
      result_column_names <- paste( variable, variable_levels, sep="." )    
      names(result) <- c("Variable", "Levels", result_column_names )    
    
      # seq_along() also behaves correctly when no factor columns remain
      for(column in seq_along( survey_data ) ){
    
          column_levels_length <- length( levels( survey_data[ , column ] ) )
          result_block <- as.data.frame( rep( names(survey_data)[column], each=column_levels_length ) )
          result_block <- cbind( result_block, as.data.frame( levels( survey_data[,column] ) ) )
          names(result_block) <- c( "Variable", "Levels" )
    
          results <- table( survey_data[ , column ], survey_data[ , variable ] )
    
          if( calc=="perc" ){
            # Column percentages; prop.table() keeps the table's dimensions
            # even when a variable has only one level
            results <- prop.table( results, margin=2 )
            results <- round( results*100, 1 )
          }
    
          results <- unclass(results)
          results <- as.data.frame( results )
          names( results ) <- result_column_names
          rownames(results) <- NULL
    
          result_block <- cbind( result_block, results) 
          result <- rbind( result, result_block ) 
      }

      # Drop the placeholder first row used to initialize the data.frame
      result <- result[-1,]
      return( result )
    }
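
    The row vs. column percentage option mentioned above could be sketched with base R's prop.table(), whose margin argument selects the direction. This is only a minimal illustration on made-up data (the data.frame "survey" and its columns "gender" and "smoker" are assumptions, not part of the original question), not an extension of contitable() itself:

    ```r
    # Toy survey data; the column names are hypothetical, for illustration only.
    survey <- data.frame(
      gender = factor(c("f", "f", "m", "m", "m", "f")),
      smoker = factor(c("yes", "no", "no", "no", "yes", "no"))
    )

    # Cross-tabulate: rows are levels of smoker, columns are levels of gender.
    counts <- table(survey$smoker, survey$gender)

    # margin = 2: percentages within each column (the direction calc="perc" uses).
    col_perc <- round(prop.table(counts, margin = 2) * 100, 1)

    # margin = 1: percentages within each row.
    row_perc <- round(prop.table(counts, margin = 1) * 100, 1)

    # Convert to a plain wide data.frame, the same way contitable() does.
    col_perc_df <- as.data.frame(unclass(col_perc))
    print(col_perc_df)
    print(row_perc)
    ```

    A function like contitable() could take the margin as an extra argument and pass it through to prop.table().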
    
