tapply | 易学教程

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

Question: I have two data frames, df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second-best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. The Hamming distance between two vectors x and y can be computed as, for example:

    x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471,
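The excerpt is cut off before any solution appears, so the following is only a minimal base-R sketch of one possible approach, not the answer from the original thread. It assumes both inputs are integer matrices with the same number of columns (a data frame could be converted with `as.matrix`). The names `m1`, `m2`, `hamming_all`, and `best_two`, and the small matrices, are hypothetical.

```r
# Hypothetical example data: 3 reference rows and 2 new rows, 4 columns each.
m1 <- matrix(c(1L, 2L, 3L, 4L,
               1L, 0L, 0L, 0L,
               9L, 9L, 9L, 9L), nrow = 3, byrow = TRUE)
m2 <- matrix(c(1L, 2L, 3L, 0L,
               9L, 2L, 9L, 9L), nrow = 2, byrow = TRUE)

# Hamming distance = number of positions at which two rows differ.
# The result has one row per row of `new` and one column per row of `ref`.
hamming_all <- function(new, ref) {
  stopifnot(ncol(new) == ncol(ref))
  d <- matrix(0L, nrow = nrow(new), ncol = nrow(ref))
  for (j in seq_len(ncol(new))) {
    # outer() compares column j of every new row with column j of every ref row
    d <- d + outer(new[, j], ref[, j], FUN = "!=")
  }
  d
}

d <- hamming_all(m2, m1)

# For each new row: indices of the best and second-best matching reference rows
best_two <- t(apply(d, 1, function(r) order(r)[1:2]))
```

Looping over columns rather than over row pairs keeps the number of R-level iterations small when there are many rows and few columns. For very large inputs the full distance matrix may not fit in memory, and a chunked version would be needed.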

Remove NA from list of lists

I have a matrix, data.mat, that looks like:

    A  B  C  D  E
    45 43 45 65 23
    12 45 56 NA NA
    13  4 34 12 NA

I am trying to turn this into a list of lists, where each row is one list within a bigger list. I do the following:

    list <- tapply(data.mat,rep(1:nrow(data.mat),ncol(data.mat)),function(i)i)

which gives me a list of lists, with NAs included, such as:

    $`1`
    [1] 45 43 45 65 23
    $`2`
    [1] 12 45 56 NA NA
    $`3`
    [1] 13  4 34 12 NA

But what I want is:

    $`1`
    [1] 45 43 45 65 23
    $`2`
    [1] 12 45 56
    $`3`
    [1] 13  4 34 12

Is there a good way to remove the NAs either during the tapply call or after the fact?

Sure, you
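The answer text is truncated, so what follows is a hedged sketch of two common ways to do this in base R, not necessarily what the original answer proposed. The matrix is rebuilt from the values in the question; `rows1` and `rows2` are hypothetical names.

```r
# Matrix from the question
data.mat <- matrix(c(45, 43, 45, 65, 23,
                     12, 45, 56, NA, NA,
                     13,  4, 34, 12, NA),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, c("A", "B", "C", "D", "E")))

# Row index for every cell, in the same column-major order as the matrix
row_id <- rep(seq_len(nrow(data.mat)), ncol(data.mat))

# Option 1: drop the NAs inside the tapply call
rows1 <- tapply(data.mat, row_id, function(x) x[!is.na(x)])

# Option 2: split by row first, then filter each element afterwards
rows2 <- lapply(split(data.mat, row_id), function(x) x[!is.na(x)])
```

Both options keep the row labels `1`, `2`, `3` as element names. Option 2 returns a plain list, whereas `tapply` returns a one-dimensional array of list mode.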

Why does tapply take the subset as NA and not exclude them totally

I have a question. I want to make a barplot with means and error bars, grouped by two factors. To get the means and the standard errors I used the function tapply. However, for one of the factors I want to drop one level. So what I did was:

    dataFE <- data[-which(plant=="FS"),] # this works fine, I get exactly the data set I want without the FS level of the factor plant

Then to get the mean and standard error I use this:

    means <- with(dataFE, as.matrix(tapply(leaves, list(plant, Orchestia), mean), nrow=2))
    e <- with(dataFE, as.matrix(tapply (leaves, list(plant, Orchestia),
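A likely cause of the behaviour in the title is that subsetting a data frame does not remove unused factor levels, so `tapply` still reports a cell for the empty level and fills it with `NA`. The sketch below illustrates this with made-up data and shows `droplevels()` as one way to remove the level. The level names `PA` and `SP` and all values are hypothetical; only `FS`, `plant`, `Orchestia`, and `leaves` come from the question.

```r
# Hypothetical stand-in for the asker's data
data <- data.frame(
  plant     = factor(c("FS", "FS", "PA", "PA", "SP", "SP")),
  Orchestia = factor(c("no", "yes", "no", "yes", "no", "yes")),
  leaves    = c(5, 7, 3, 4, 8, 6)
)

# Subsetting removes the rows, but "FS" remains a level of the factor
dataFE <- data[data$plant != "FS", ]
levels(dataFE$plant)

# tapply keeps a row for the empty level and fills it with NA
with_na <- with(dataFE, tapply(leaves, list(plant, Orchestia), mean))

# Dropping unused levels removes that row
dataFE <- droplevels(dataFE)
means <- with(dataFE, tapply(leaves, list(plant, Orchestia), mean))
```

Note that `data[data$plant != "FS", ]` is used here instead of `-which(...)`, because negative indexing with an empty `which()` result would select no rows at all.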

How to add tapply results to an existing data frame [duplicate]

This question already has an answer here: Calculating statistics on subsets of data [duplicate] (3 answers)

I would like to add tapply results to the original data frame as a new column. Here is my data frame:

    dat <- read.table(text = " category birds wolfs snakes
     yes 3 9 7
     no 3 8 4
     no 1 2 8
     yes 1 2 3
     yes 1 8 3
     no 6 1 2
     yes 6 7 1
     no 6 1 5
     yes 5 9 7
     no 3 8 7
     no 4 2 7
     notsure 1 2 3
     notsure 7 6 3
     no 6 1 1
     notsure 6 3 9
     no 6 1 1 ",header = TRUE)

I would like to add the mean of each category to the data frame as a column. I used:

    tapply(dat$birds, dat$category, mean)

to get the mean per category
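The excerpt stops before any answer, so this is a sketch of two base-R options rather than the accepted solution. `ave()` returns one value per input row, which makes it convenient for adding a group statistic as a column. The column names `birds_mean` and `birds_mean2` are hypothetical.

```r
dat <- read.table(text = "
category birds wolfs snakes
yes 3 9 7
no 3 8 4
no 1 2 8
yes 1 2 3
yes 1 8 3
no 6 1 2
yes 6 7 1
no 6 1 5
yes 5 9 7
no 3 8 7
no 4 2 7
notsure 1 2 3
notsure 7 6 3
no 6 1 1
notsure 6 3 9
no 6 1 1
", header = TRUE)

# Option 1: ave() computes the group mean and repeats it for every row
dat$birds_mean <- ave(dat$birds, dat$category, FUN = mean)

# Option 2: look up each row's category in the named tapply result
means <- tapply(dat$birds, dat$category, mean)
dat$birds_mean2 <- as.vector(means[as.character(dat$category)])
```

Both columns hold the same values. The lookup in option 2 relies on the `tapply` result being named by category.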

How do I do a conditional sum which only looks between certain date criteria

Say I have data that looks like

    date, user, items_bought, event_number
    2013-01-01, x, 2, 1
    2013-01-02, x, 1, 2
    2013-01-03, x, 0, 3
    2013-01-04, x, 0, 4
    2013-01-04, x, 1, 5
    2013-01-04, x, 2, 6
    2013-01-05, x, 3, 7
    2013-01-06, x, 1, 8
    2013-01-01, y, 1, 1
    2013-01-02, y, 1, 2
    2013-01-03, y, 0, 3
    2013-01-04, y, 5, 4
    2013-01-05, y, 6, 5
    2013-01-06, y, 1, 6

to get the cumulative sum per user per data point I was doing

    data.frame(cum_items_bought=unlist(tapply(as.numeric(data$items_bought), data$user, FUN = cumsum)))

output from this looks like

    date, user, items_bought
    2013-01-01, x, 2
    2013-01-02, x, 3
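The date condition the title refers to is cut off in this excerpt, so the sketch below covers only the per-user cumulative sum that the question has reached so far. It uses `ave()` instead of `unlist(tapply(...))` so that the result stays aligned with the original rows even if the data are not sorted by user. The column name `cum_items_bought` follows the question; everything else is an illustrative assumption.

```r
data <- read.csv(text = "date,user,items_bought,event_number
2013-01-01,x,2,1
2013-01-02,x,1,2
2013-01-03,x,0,3
2013-01-04,x,0,4
2013-01-04,x,1,5
2013-01-04,x,2,6
2013-01-05,x,3,7
2013-01-06,x,1,8
2013-01-01,y,1,1
2013-01-02,y,1,2
2013-01-03,y,0,3
2013-01-04,y,5,4
2013-01-05,y,6,5
2013-01-06,y,1,6", stringsAsFactors = FALSE)

data$date <- as.Date(data$date)

# Running total of items_bought within each user, in row order
data$cum_items_bought <- ave(data$items_bought, data$user, FUN = cumsum)
```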

sum multiple columns by group with tapply

I wanted to sum individual columns by group and my first thought was to use tapply. However, I cannot get tapply to work. Can tapply be used to sum multiple columns? If not, why not? I have searched the internet extensively and found numerous similar questions posted as far back as 2008. However, none of those questions have been answered directly. Instead, the responses invariably suggest using a different function. Below is an example data set for which I wish to sum apples by state, cherries by state and plums by state. Below that I have compiled numerous alternatives to tapply that do
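The asker's data set is not included in the excerpt, so the sketch below uses made-up values; only the column names `state`, `apples`, `cherries`, and `plums` are taken from the question. `tapply` is usually applied to one vector at a time, so one way to keep using it is to call it once per column with `sapply`. This is an illustration under those assumptions, not the answer from the thread.

```r
# Hypothetical data; the original example data set is not shown in the excerpt
fruit <- data.frame(
  state    = c("AK", "AK", "MN", "MN", "MN"),
  apples   = c(1, 2, 3, 4, 5),
  cherries = c(10, 20, 30, 40, 50),
  plums    = c(0, 1, 0, 1, 0)
)

# Apply tapply to each column in turn; sapply binds the results into a matrix
# with one row per state and one column per fruit
sums <- sapply(fruit[c("apples", "cherries", "plums")],
               function(col) tapply(col, fruit$state, sum))
```

The result is a numeric matrix, so a single total can be read with, for example, `sums["MN", "cherries"]`.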

Multiple functions in a single tapply or aggregate statement

Is it possible to include two functions within a single tapply or aggregate statement? Below I use two tapply statements and two aggregate statements: one for mean and one for SD. I would prefer to combine the statements.

    my.Data = read.table(text = "
      animal age sex weight
      1 adult female 100
      2 young male 75
      3 adult male 90
      4 adult female 95
      5 young female 80
    ", sep = "", header = TRUE)

    with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)}))
    with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x) }))
    with(my.Data, aggregate(weight ~ age + sex, FUN = mean))
    with(my.Data,
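One way to combine the two statistics, sketched here without knowing what the original answers suggested, is to pass a function that returns a named vector. With `aggregate` the result column then becomes a matrix with one column per statistic; with `tapply` each cell holds a vector. The names `agg`, `tap`, and `mean_sd` are hypothetical.

```r
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

# Helper returning both statistics as a named vector
mean_sd <- function(x) c(mean = mean(x), sd = sd(x))

# aggregate: the weight column of the result is a matrix with columns mean and sd
agg <- aggregate(weight ~ age + sex, data = my.Data, FUN = mean_sd)

# tapply: the result is a list-matrix; each cell holds the named vector
tap <- with(my.Data, tapply(weight, list(age, sex), mean_sd))
tap[["adult", "female"]]
```

Groups with a single observation get `NA` for the standard deviation, since `sd()` of one value is undefined.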