Row and column sums in R

Submitted by 元气小坏坏 on 2019-12-23 20:30:06

Question


This is an example of what my data set (MergedData) looks like in R, where each of my participants (5 rows) obtained a score in every test (7 columns). I would like to know the total score across all tests (all columns) for each participant (row).

Also, my complete data set has more than just these few variables, so if possible, I would like to do it using a formula & loop and not having to type row by row/column by column.

Participant TestScores     
ParticipantA    2   4   2   3   2   3   4
ParticipantB    1   3   2   2   3   3   3
ParticipantC    1   4   4   2   3   4   2
ParticipantD    2   4   2   3   2   4   4
ParticipantE    1   3   2   2   2   2   2

I have tried this but it doesn't work:

Test_Scores <- rowSums(MergedData[Test1, Test2, Test3], na.rm=TRUE)

I get the following error-message:

Error in `[.data.frame`(MergedData, Test1, Test2, Test3,  : 
  unused arguments

How do I solve this? Thank you!!


Answer 1:


I think you want this:

rowSums(MergedData[,c('Test1', 'Test2', 'Test3')], na.rm=TRUE)
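To see why this works, here is a minimal sketch on a reconstruction of the question's data. The column names Test1 to Test7 are an assumption taken from the question's code, and `wanted` is a name made up for illustration. Indexing a data frame with `[ ]` takes a row index and a column index, so several columns have to be passed together as one character vector in the column position.

```r
# Reconstruction of the question's data; the column names are assumed.
MergedData <- data.frame(
  Participant = paste0("Participant", LETTERS[1:5]),
  Test1 = c(2, 1, 1, 2, 1),
  Test2 = c(4, 3, 4, 4, 3),
  Test3 = c(2, 2, 4, 2, 2),
  Test4 = c(3, 2, 2, 3, 2),
  Test5 = c(2, 3, 3, 2, 2),
  Test6 = c(3, 3, 4, 4, 2),
  Test7 = c(4, 3, 2, 4, 2)
)

# Empty row index (all rows), then one character vector naming the columns.
wanted <- c("Test1", "Test2", "Test3")
Test_Scores <- rowSums(MergedData[, wanted], na.rm = TRUE)
Test_Scores  # totals over the three selected tests: 8, 6, 9, 8, 6
```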



Answer 2:


You could use:

MergedData$Test_Scores_Sum <- rowSums(MergedData[,2:8], na.rm=TRUE)

Here 2:8 are the positions of all the columns (tests) you want to sum. This creates another column in your data.

This way you don't have to type each column name, and you can still have other columns in your data frame that will not be summed. Note, however, that all the test columns you want to sum must be next to each other (as in your example data).
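If the test columns are not next to each other, a name pattern can stand in for the positional range. This is a minimal sketch, assuming the test columns are named Test1, Test2, and so on; the Age column and the NA value are made up for illustration.

```r
# Small made-up data frame with a non-test column between the tests.
MergedData <- data.frame(
  Participant = c("ParticipantA", "ParticipantB"),
  Test1 = c(2, 1),
  Age   = c(30, 41),   # hypothetical column that must not be summed
  Test2 = c(4, 3),
  Test3 = c(2, NA)     # NA added to show the effect of na.rm = TRUE
)

# TRUE for every column named "Test" followed only by digits.
# The anchored pattern keeps Test_Scores_Sum itself out if this is re-run.
is_test <- grepl("^Test[0-9]+$", names(MergedData))

MergedData$Test_Scores_Sum <- rowSums(MergedData[, is_test], na.rm = TRUE)
MergedData$Test_Scores_Sum  # 8 and 4; the NA is skipped, Age is ignored
```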




Answer 3:


Please consult the documentation for ?rowSums and ?colSums.

It's not clear from your post exactly what MergedData is. Assuming it's a data.frame, the problem is your indexing MergedData[Test1, Test2, Test3]. If it is a data.frame, you'd like to run something like:

Test_Scores <- rowSums(MergedData, na.rm = TRUE)

or

Test_Scores <- rowSums(MergedData[, c("Test1", "Test2", "Test3")], na.rm = TRUE)

if you only want to use the columns named "Test1", "Test2", and "Test3" (if they indeed are named so).

If this doesn't work, please show us the output of str(MergedData).

You need to provide a minimal reproducible example of the error to get any really helpful answers.
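One caveat about the first form above: if MergedData still contains a non-numeric column, such as the participant label, rowSums(MergedData) stops with an error because its input must be numeric. A minimal sketch on a small made-up data frame, checking the structure with str() and keeping only the numeric columns (`numeric_cols` is a name chosen for illustration):

```r
# Made-up data with one character column and three numeric columns.
MergedData <- data.frame(
  Participant = c("ParticipantA", "ParticipantB", "ParticipantC"),
  Test1 = c(2, 1, 1),
  Test2 = c(4, 3, 4),
  Test3 = c(2, 2, 4)
)

str(MergedData)  # shows the type of every column

# TRUE for numeric columns, FALSE for the participant label.
numeric_cols <- vapply(MergedData, is.numeric, logical(1))

# drop = FALSE keeps a data frame even if only one column is numeric.
Test_Scores <- rowSums(MergedData[, numeric_cols, drop = FALSE], na.rm = TRUE)
Test_Scores  # totals: 8, 6, 9
```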




Answer 4:


For small data, it might be interesting to convert the data.frame to a table then use addmargins().

With this sample data

MergedData<-data.frame(Participant=letters[1:5],
    Test1 = c(2,1,1,2,1),
    Test2 = c(4,3,4,4,3),
    Test3 = c(2,2,4,2,2),
    Test4 = c(3,2,2,3,2),
    Test5 = c(2,3,3,2,2)
)

and this helper function

as.table.data.frame<-function(x, rownames=0) {
    numerics <- sapply(x,is.numeric)
    chars <- which(sapply(x,function(x) is.character(x) || is.factor(x)))
    names <- if(!is.null(rownames)) {
        if (length(rownames)==1) {
            if (rownames ==0) {
                 rownames(x)
            } else {
                as.character(x[,rownames])
            }
        } else {
            rownames
        }
    } else {
          if(length(chars)==1) {
            as.character(x[,chars])
        } else {
            rownames(x)
        }
    }
    x<-as.matrix(x[,numerics])
    rownames(x)<-names
    structure(x, class="table")
}

you could do

addmargins(as.table(MergedData, rownames = 1))

to get

    Test1 Test2 Test3 Test4 Test5 Sum
a       2     4     2     3     2  13
b       1     3     2     2     3  11
c       1     4     4     2     3  14
d       2     4     2     3     2  13
e       1     3     2     2     2  10
Sum     7    18    12    12    12  61

Probably not super useful in this case, but a fun use of addmargins nonetheless.




Answer 5:


Four previous answers and only one showing a result? What's up with that? Here's one:

dat <- read.table(header = TRUE, text = 
  'Participant Test1 Test2 Test3 Test4 Test5 Test6 Test7     
  ParticipantA    2   4   2   3   2   3   4
  ParticipantB    1   3   2   2   3   3   3
  ParticipantC    1   4   4   2   3   4   2
  ParticipantD    2   4   2   3   2   4   4
  ParticipantE    1   3   2   2   2   2   2')

You wrote that

"...if possible, I would like to do it using a formula & loop and not having to type row by row/column by column"

You won't have to write any loops at all. The row and column functions operate on all the rows and all the columns, with no looping.

rowSums(dat[-1], na.rm = TRUE)
## [1] 20 17 20 21 14
colSums(dat[-1], na.rm = TRUE)
##  Test1  Test2  Test3  Test4  Test5  Test6  Test7 
##      7     18     12     12     12     16     15 
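For comparison, this is roughly what the explicit loop would look like. It is a sketch only: the data are rebuilt with data.frame() instead of read.table(), and the names `scores`, `loop_totals` and `vectorised_totals` are made up for illustration. Both approaches give the same totals, which is why the loop is unnecessary.

```r
# Same values as the read.table() call above, rebuilt with data.frame().
dat <- data.frame(
  Participant = paste0("Participant", LETTERS[1:5]),
  Test1 = c(2, 1, 1, 2, 1),
  Test2 = c(4, 3, 4, 4, 3),
  Test3 = c(2, 2, 4, 2, 2),
  Test4 = c(3, 2, 2, 3, 2),
  Test5 = c(2, 3, 3, 2, 2),
  Test6 = c(3, 3, 4, 4, 2),
  Test7 = c(4, 3, 2, 4, 2)
)

scores <- dat[-1]  # drop the Participant column

# Explicit loop: one total per row.
loop_totals <- numeric(nrow(scores))
for (i in seq_len(nrow(scores))) {
  loop_totals[i] <- sum(unlist(scores[i, ]), na.rm = TRUE)
}

# Vectorised equivalent, with any names stripped for the comparison.
vectorised_totals <- unname(rowSums(scores, na.rm = TRUE))

all.equal(loop_totals, vectorised_totals)
```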



Answer 6:


Here's a way to do it with dplyr and reshape2:

dat <- read.table(header=T, text = 
                    'Participant Test1 Test2 Test3 Test4 Test5 Test6 Test7     
  ParticipantA    2   4   2   3   2   3   4
  ParticipantB    1   3   2   2   3   3   3
  ParticipantC    1   4   4   2   3   4   2
  ParticipantD    2   4   2   3   2   4   4
  ParticipantE    1   3   2   2   2   2   2')

library(dplyr) 
library(reshape2)    

# Melt data into long format
dat.l = melt(dat, id.var="Participant", variable.name="Test")    
> dat.l
    Participant  Test value
1  ParticipantA Test1     2
2  ParticipantB Test1     1
3  ParticipantC Test1     1
4  ParticipantD Test1     2
...
32 ParticipantB Test7     3
33 ParticipantC Test7     2
34 ParticipantD Test7     4
35 ParticipantE Test7     2

# Sum by Participant
dat.l %>%
  group_by(Participant) %>%
  summarise(Sum=sum(value))

   Participant Sum
1 ParticipantA  20
2 ParticipantB  17
3 ParticipantC  20
4 ParticipantD  21
5 ParticipantE  14

# Sum by Test
dat.l %>%
  group_by(Test) %>%
  summarise(Sum=sum(value))

   Test Sum
1 Test1   7
2 Test2  18
3 Test3  12
4 Test4  12
5 Test5  12
6 Test6  16
7 Test7  15
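The same reshape-then-group idea can also be sketched in base R, without dplyr or reshape2. This swaps in stack() and aggregate() for melt() and summarise(); it is an alternative, not what the answer above uses, and the names `long`, `by_participant` and `by_test` are made up for illustration.

```r
# Same values as the read.table() call above, rebuilt with data.frame().
dat <- data.frame(
  Participant = paste0("Participant", LETTERS[1:5]),
  Test1 = c(2, 1, 1, 2, 1),
  Test2 = c(4, 3, 4, 4, 3),
  Test3 = c(2, 2, 4, 2, 2),
  Test4 = c(3, 2, 2, 3, 2),
  Test5 = c(2, 3, 3, 2, 2),
  Test6 = c(3, 3, 4, 4, 2),
  Test7 = c(4, 3, 2, 4, 2)
)

# Long format: stack() returns the columns 'values' and 'ind' (the test name),
# stacked one test after another, so the participants repeat once per test.
long <- data.frame(
  Participant = rep(dat$Participant, times = ncol(dat) - 1),
  stack(dat[-1])
)

# Sum by participant, then by test.
by_participant <- aggregate(values ~ Participant, data = long, FUN = sum)
by_test        <- aggregate(values ~ ind, data = long, FUN = sum)

by_participant
by_test
```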


Source: https://stackoverflow.com/questions/23568142/row-and-column-sums-in-r
