R: How to count occurrences of values across multiple columns of a data frame and save the column-wise counts for a particular value as a new row?

Question

I have a large data frame (approx. 1,000 rows and 30,000 columns) that looks like this:

    chr  pos   sample1  sample2  sample3  sample4
    1    5050  1        NA       0        0.5
    1    6300  1        0        0.5      1
    1    7825  1        0        0.5      1
    1    8200  0.5      0.5      0        1

where, at a given "chr" and "pos", the value for a given sample can be 0, 0.5, 1, or NA. I have a large number of queries to perform that will require subsetting and ordering the data frame based on summaries of the values for each sample.

I would like to get a count of the number of occurrences of a given value (e.g. 0.5) for each column, and save that as a new row in my data frame. My ultimate goal is to be able to use the values of the new row to subset and/or order the columns of my data frame. I've seen similar questions about counting occurrences, but I can't seem to find/recognize a solution to doing this across all columns simultaneously and saving the column-wise counts for a particular value as a new row.

Answer 1:

You can apply a function to every column of your data.frame. Suppose you want to count the number of 'A' values in each column of the data.frame d:

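    # a sample data.frame
    L3 <- LETTERS[1:3]
    (d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE)))

    # the function you are looking for: count the entries equal to 'A' in each column
    # (apply() coerces d to a character matrix first, so compare against a string)
    apply(X = d, MARGIN = 2, FUN = function(x) length(which(x == 'A')))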
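For the numeric data in the question, the same per-column count can probably be written without an explicit loop. The following is a minimal sketch, assuming a toy data frame df that mirrors the layout shown in the question (the name samples and the variable count.05 are illustrative choices, not part of the original answer):

```r
# Toy data mirroring the layout in the question.
df <- data.frame(
  chr     = c(1, 1, 1, 1),
  pos     = c(5050, 6300, 7825, 8200),
  sample1 = c(1, 1, 1, 0.5),
  sample2 = c(NA, 0, 0, 0.5),
  sample3 = c(0, 0.5, 0.5, 0),
  sample4 = c(0.5, 1, 1, 1)
)

# Drop the two annotation columns so only the sample columns are compared.
samples <- df[, -(1:2)]

# samples == 0.5 yields a logical matrix; colSums() counts the TRUEs per column.
# na.rm = TRUE makes NA cells count as "not a match" instead of propagating NA.
count.05 <- colSums(samples == 0.5, na.rm = TRUE)
count.05
```

Comparing numeric columns directly avoids the character coercion that apply() performs on a mixed-type data frame, which matters when the value being counted is a number rather than a string.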

Answer 2:

Very similar to @Jilber. Assumes your data is in a data frame df.

# names of the sample columns (everything except chr and pos)
lst      <- colnames(df[,-(1:2)])
# per-column counts of NA, 0, 0.5 and 1
count.na <- sapply(lst,FUN=function(x,df){sum(is.na(df[,x]))},df)
count.00 <- sapply(lst,FUN=function(x,df){sum(df[,x]==0,na.rm=TRUE)},df)
count.05 <- sapply(lst,FUN=function(x,df){sum(df[,x]==0.5,na.rm=TRUE)},df)
count.10 <- sapply(lst,FUN=function(x,df){sum(df[,x]==1.0,na.rm=TRUE)},df)

df <- rbind(df, 
            c(NA,NA,count.na), 
            c(NA,NA,count.00), 
            c(NA,NA,count.05), 
            c(NA,NA,count.10))

You would probably want to replace the NAs in the last rbind(...) statement with something that identifies what you are counting.
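One way to identify the count row without changing the type of the chr and pos columns is to label it through its row name, and then use that row to order or subset the sample columns as the question asks. This is only a sketch under assumptions: the toy data frame, the row label "count.05", and the names with.counts, ordered and kept are all hypothetical.

```r
# Toy data mirroring the layout in the question.
df <- data.frame(
  chr     = c(1, 1, 1, 1),
  pos     = c(5050, 6300, 7825, 8200),
  sample1 = c(1, 1, 1, 0.5),
  sample2 = c(NA, 0, 0, 0.5),
  sample3 = c(0, 0.5, 0.5, 0),
  sample4 = c(0.5, 1, 1, 1)
)
sample.cols <- names(df)[-(1:2)]

# Count the 0.5 entries per sample column, ignoring NA.
count.05 <- sapply(df[sample.cols], function(x) sum(x == 0.5, na.rm = TRUE))

# Append the counts as a new row and label it via the row name.
with.counts <- rbind(df, c(NA, NA, count.05))
rownames(with.counts)[nrow(with.counts)] <- "count.05"

# Read the counts back from the labelled row as a named numeric vector.
counts <- unlist(with.counts["count.05", sample.cols])

# Order the sample columns by descending count of 0.5.
ordered <- with.counts[, c("chr", "pos", sample.cols[order(counts, decreasing = TRUE)])]

# Keep only the samples with at least two occurrences of 0.5.
kept <- with.counts[, c("chr", "pos", sample.cols[counts >= 2])]
```

Keeping the counts in a separate named vector such as count.05 may be simpler for repeated queries, since the appended row would otherwise be included in any later counts computed over the data frame.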

Source: https://stackoverflow.com/questions/20305851/r-how-to-count-occurrences-of-values-across-multiple-columns-of-a-data-frame-and

Tags

count

find-occurrences