tapply

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

拈花ヽ惹草 submitted on 2019-11-30 19:31:40
Question: I have two data frames, df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second-best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. The Hamming distance between two vectors x and y can be computed as, for example: x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471,
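One way to sketch the pairwise computation in base R, without relying on any package, is to compare each row of the new data against all reference rows at once. The matrices m1 and m2 and the helper hamming_rows below are hypothetical stand-ins for df1 and df2 (a data frame would first go through as.matrix()), and the sketch assumes the reference matrix has at least two rows.

```r
# Hypothetical small integer matrices standing in for df1 (reference) and df2 (new data)
m1 <- matrix(c(1L, 2L, 3L,
               4L, 5L, 6L,
               1L, 5L, 3L), nrow = 3, byrow = TRUE)
m2 <- matrix(c(1L, 2L, 6L,
               4L, 5L, 6L), nrow = 2, byrow = TRUE)

# Hamming distance = number of positions in which two rows differ.
# result[i, j] is the distance between row i of b and row j of a.
hamming_rows <- function(a, b) {
  t(apply(b, 1, function(r) colSums(t(a) != r)))
}

d <- hamming_rows(m1, m2)

# Index of the best and second-best matching reference row for each new row
best   <- apply(d, 1, which.min)
second <- apply(d, 1, function(x) order(x)[2])
```

Ties are resolved in favour of the earlier reference row, because which.min() and order() both return the first minimum.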

Remove NA from list of lists

China☆狼群 submitted on 2019-11-30 12:16:30
Question: I have a matrix, data.mat, that looks like: A B C D E 45 43 45 65 23 12 45 56 NA NA 13 4 34 12 NA I am trying to turn this into a list of lists, where each row is one list within a bigger list. I do the following: list <- tapply(data.mat,rep(1:nrow(data.mat),ncol(data.mat)),function(i)i) which gives me a list of lists, with NAs included, such as: $`1` [1] 45 43 45 65 23 $`2` [1] 12 45 56 NA NA $`3` [1] 13 4 34 12 NA But what I want is: $`1` [1] 45 43 45 65 23 $`2` [1] 12 45 56 $`3` [1] 13 4

How do I do a conditional sum which only looks between certain date criteria

白昼怎懂夜的黑 submitted on 2019-11-30 08:52:58
Question: Say I have data that looks like date, user, items_bought, event_number 2013-01-01, x, 2, 1 2013-01-02, x, 1, 2 2013-01-03, x, 0, 3 2013-01-04, x, 0, 4 2013-01-04, x, 1, 5 2013-01-04, x, 2, 6 2013-01-05, x, 3, 7 2013-01-06, x, 1, 8 2013-01-01, y, 1, 1 2013-01-02, y, 1, 2 2013-01-03, y, 0, 3 2013-01-04, y, 5, 4 2013-01-05, y, 6, 5 2013-01-06, y, 1, 6 to get the cumulative sum per user per data point I was doing data.frame(cum_items_bought=unlist(tapply(as.numeric(data$items_bought), data$user,

Remove NA from list of lists

旧巷老猫 submitted on 2019-11-30 02:45:43
I have a matrix, data.mat, that looks like: A B C D E 45 43 45 65 23 12 45 56 NA NA 13 4 34 12 NA I am trying to turn this into a list of lists, where each row is one list within a bigger list. I do the following: list <- tapply(data.mat,rep(1:nrow(data.mat),ncol(data.mat)),function(i)i) which gives me a list of lists, with NAs included, such as: $`1` [1] 45 43 45 65 23 $`2` [1] 12 45 56 NA NA $`3` [1] 13 4 34 12 NA But what I want is: $`1` [1] 45 43 45 65 23 $`2` [1] 12 45 56 $`3` [1] 13 4 34 12 Is there a good way to remove the NAs either during the tapply call or after the fact? Sure, you
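A minimal sketch of both options the question asks about, using the matrix shown in it: filtering after the fact with lapply(), or filtering inside the function passed to tapply(). The result names rows, rows_clean and rows_clean2 are assumptions chosen here, partly to avoid masking base::list as the original assignment does.

```r
# The matrix from the question
data.mat <- matrix(c(45, 43, 45, 65, 23,
                     12, 45, 56, NA, NA,
                     13,  4, 34, 12, NA),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, c("A", "B", "C", "D", "E")))

# Row index for every cell, in the column-major order R stores matrices in
row_id <- rep(1:nrow(data.mat), ncol(data.mat))

# Same tapply call as in the question: one list element per row, NAs included
rows <- tapply(data.mat, row_id, function(i) i)

# Option 1: drop the NAs after the fact
rows_clean <- lapply(rows, function(x) x[!is.na(x)])

# Option 2: drop the NAs inside the tapply call
rows_clean2 <- tapply(data.mat, row_id, function(i) i[!is.na(i)])
```

Both variants keep the row order of the original matrix; they differ only in when the filtering happens.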

Why does tapply take the subset as NA and not exclude them totally

萝らか妹 submitted on 2019-11-29 17:36:54
I have a question. I want to make a barplot with the means and error bars, grouped by two factors. To get the means and the standard errors I used the function tapply. However, for one of the factors I want to drop one level. So what I did was: dataFE <- data[-which(plant=="FS"),] # this works fine, I get exactly the data set I want without the FS level of the factor plant Then to get the mean and standard error I use this: means <- with(dataFE, as.matrix(tapply(leaves, list(plant, Orchestia), mean), nrow=2)) e <- with(dataFE, as.matrix(tapply (leaves, list(plant, Orchestia),
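The behaviour described here is consistent with how factors work: subsetting rows removes the observations but leaves the unused level attached to the factor, so tapply() still reports a group for it, filled with NA. A small sketch under that assumption follows; the data values and the level names other than "FS" are made up for illustration.

```r
# Hypothetical data with the column names used in the question
data <- data.frame(
  plant     = factor(c("FS", "FS", "PA", "PA", "SP", "SP")),
  Orchestia = factor(c("absent", "present", "absent", "present", "absent", "present")),
  leaves    = c(5, 7, 3, 4, 6, 8)
)

# Remove the FS rows; the factor still remembers the "FS" level
dataFE <- data[data$plant != "FS", ]
means_na <- with(dataFE, tapply(leaves, list(plant, Orchestia), mean))
# means_na still has an "FS" row, containing only NA

# Drop unused factor levels, then tabulate again
dataFE <- droplevels(dataFE)
means <- with(dataFE, tapply(leaves, list(plant, Orchestia), mean))
```

Calling droplevels() on the whole data frame cleans every factor column at once; factor(dataFE$plant) would do the same for a single column.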

How to add tapply results to an existing data frame [duplicate]

回眸只為那壹抹淺笑 submitted on 2019-11-29 11:40:33
This question already has an answer here: Calculating statistics on subsets of data [duplicate] (3 answers). I would like to add tapply results to the original data frame as a new column. Here is my data frame: dat <- read.table(text = " category birds wolfs snakes yes 3 9 7 no 3 8 4 no 1 2 8 yes 1 2 3 yes 1 8 3 no 6 1 2 yes 6 7 1 no 6 1 5 yes 5 9 7 no 3 8 7 no 4 2 7 notsure 1 2 3 notsure 7 6 3 no 6 1 1 notsure 6 3 9 no 6 1 1 ",header = TRUE) I would like to add the mean of each category to the data frame as a column. I used: tapply(dat$birds, dat$category, mean) to get the mean per category
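Two base R ways to attach the per-category mean to every row can be sketched as follows: look each row's category up in the named tapply() result, or let ave() do the grouping and the alignment in one step. The column names birds_mean and birds_mean2 are assumptions introduced for this example.

```r
# The data frame from the question
dat <- read.table(text = "category birds wolfs snakes
yes 3 9 7
no 3 8 4
no 1 2 8
yes 1 2 3
yes 1 8 3
no 6 1 2
yes 6 7 1
no 6 1 5
yes 5 9 7
no 3 8 7
no 4 2 7
notsure 1 2 3
notsure 7 6 3
no 6 1 1
notsure 6 3 9
no 6 1 1", header = TRUE)

# Option 1: index the named tapply result by each row's category
grp_mean <- tapply(dat$birds, dat$category, mean)
dat$birds_mean <- as.vector(grp_mean[as.character(dat$category)])

# Option 2: ave() returns a vector already aligned with the rows
dat$birds_mean2 <- ave(dat$birds, dat$category, FUN = mean)
```

The as.character() call matters when category is a factor: indexing by a factor would use its integer codes rather than its labels.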

How do I do a conditional sum which only looks between certain date criteria

*爱你&永不变心* submitted on 2019-11-29 08:43:39
Say I have data that looks like date, user, items_bought, event_number 2013-01-01, x, 2, 1 2013-01-02, x, 1, 2 2013-01-03, x, 0, 3 2013-01-04, x, 0, 4 2013-01-04, x, 1, 5 2013-01-04, x, 2, 6 2013-01-05, x, 3, 7 2013-01-06, x, 1, 8 2013-01-01, y, 1, 1 2013-01-02, y, 1, 2 2013-01-03, y, 0, 3 2013-01-04, y, 5, 4 2013-01-05, y, 6, 5 2013-01-06, y, 1, 6 to get the cumulative sum per user per data point I was doing data.frame(cum_items_bought=unlist(tapply(as.numeric(data$items_bought), data$user, FUN = cumsum))) output from this looks like date, user, items_bought 2013-01-01, x, 2 2013-01-02, x, 3
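The date criteria of the question are cut off here, so the sketch below covers only the per-user cumulative sum. It assumes ave() as an alternative to unlist(tapply(...)): tapply() returns its results grouped by user, which lines up with the rows only when the data are already sorted by user, whereas ave() keeps the original row order.

```r
# The data from the question, read as whitespace-separated text
data <- read.table(text = "date user items_bought event_number
2013-01-01 x 2 1
2013-01-02 x 1 2
2013-01-03 x 0 3
2013-01-04 x 0 4
2013-01-04 x 1 5
2013-01-04 x 2 6
2013-01-05 x 3 7
2013-01-06 x 1 8
2013-01-01 y 1 1
2013-01-02 y 1 2
2013-01-03 y 0 3
2013-01-04 y 5 4
2013-01-05 y 6 5
2013-01-06 y 1 6", header = TRUE)

# Running total of items bought within each user, aligned with the rows
data$cum_items_bought <- ave(data$items_bought, data$user, FUN = cumsum)
```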

sum multiple columns by group with tapply

假如想象 submitted on 2019-11-29 01:38:37
I wanted to sum individual columns by group and my first thought was to use tapply. However, I cannot get tapply to work. Can tapply be used to sum multiple columns? If not, why not? I have searched the internet extensively and found numerous similar questions posted as far back as 2008. However, none of those questions have been answered directly. Instead, the responses invariably suggest using a different function. Below is an example data set for which I wish to sum apples by state, cherries by state and plums by state. Below that I have compiled numerous alternatives to tapply that do
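tapply() is normally given one vector to split, so a common workaround is to call it once per column. The example data set from the question is not included in this excerpt, so the data frame fruit below is a hypothetical stand-in that reuses only the column names mentioned in the text.

```r
# Hypothetical data; only the column names come from the question
fruit <- data.frame(
  state    = c("AK", "AK", "AL", "AL", "AL"),
  apples   = c(1, 2, 3, 4, 5),
  cherries = c(10, 20, 30, 40, 50),
  plums    = c(100, 200, 300, 400, 500)
)

# One tapply call per column; sapply binds the results into a matrix
# with one row per state and one column per fruit
sums <- sapply(fruit[c("apples", "cherries", "plums")],
               function(col) tapply(col, fruit$state, sum))
```

The result is a matrix rather than a data frame; as.data.frame(sums) converts it if needed.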

Why does tapply take the subset as NA and not exclude them totally

a 夏天 submitted on 2019-11-28 12:49:42
Question: I have a question. I want to make a barplot with the means and error bars, grouped by two factors. To get the means and the standard errors I used the function tapply. However, for one of the factors I want to drop one level. So what I did was: dataFE <- data[-which(plant=="FS"),] # this works fine, I get exactly the data set I want without the FS level of the factor plant Then to get the mean and standard error I use this: means <- with(dataFE, as.matrix(tapply(leaves, list(plant

Multiple functions in a single tapply or aggregate statement

♀尐吖头ヾ submitted on 2019-11-28 07:37:55
Is it possible to include two functions within a single tapply or aggregate statement? Below I use two tapply statements and two aggregate statements: one for mean and one for SD. I would prefer to combine the statements. my.Data = read.table(text = " animal age sex weight 1 adult female 100 2 young male 75 3 adult male 90 4 adult female 95 5 young female 80 ", sep = "", header = TRUE) with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)})) with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x) })) with(my.Data, aggregate(weight ~ age + sex, FUN = mean)) with(my.Data,
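One possible way to combine the two statistics is to pass a single function that returns a named vector. The sketch below uses the data from the question; the result names stats and cells are assumptions made for this example.

```r
# The data from the question
my.Data <- read.table(text = "animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80", header = TRUE)

# aggregate: the weight column of the result is a matrix with columns mean and sd
stats <- aggregate(weight ~ age + sex, data = my.Data,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))

# tapply: each cell of the age-by-sex table holds a named vector
cells <- with(my.Data, tapply(weight, list(age, sex),
                              function(x) c(mean = mean(x), sd = sd(x))))
```

Groups with a single observation get an sd of NA, since the standard deviation is undefined for one value. do.call(data.frame, stats) flattens the matrix column into separate weight.mean and weight.sd columns.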