tapply | 易学教程

Summarizing Latitude, Longitude, and Counts Data for ggplot Usage

I have been provided with some customer data in Latitude, Longitude, and Counts format. All the data I need to create a ggplot heatmap is present, but I do not know how to put it into the format ggplot requires. I am trying to aggregate the data by total counts within 0.01 Lat and 0.01 Lon blocks (typical heatmap), and I instinctively thought "tapply". This creates a nice summary by block size, as desired, but the format is wrong. Furthermore, I would really like to have empty Lat or Lon block values be included as zeroes, even if there is nothing there... otherwise the heatmap ends up looking
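One way to approach this, sketched under assumptions: the column names `lat`, `lon`, and `counts` and the sample data below are hypothetical, not taken from the question. The idea is to bin the coordinates into integer block indices, turn them into factors whose levels cover the whole grid, and let `tapply` sum the counts, so that empty blocks appear as cells that can be set to zero.

```r
# Hypothetical customer data; column names are assumptions for illustration
set.seed(1)
customers <- data.frame(
  lat    = runif(50, 40.00, 40.05),
  lon    = runif(50, -75.05, -75.00),
  counts = rpois(50, 3)
)

block <- 0.01

# Integer block indices avoid floating-point mismatches between bins and levels
customers$lat_idx <- floor(customers$lat / block)
customers$lon_idx <- floor(customers$lon / block)

# Factor levels span the full grid, so blocks with no customers are kept
lat_f <- factor(customers$lat_idx,
                levels = seq(min(customers$lat_idx), max(customers$lat_idx)))
lon_f <- factor(customers$lon_idx,
                levels = seq(min(customers$lon_idx), max(customers$lon_idx)))

# Sum counts per block; blocks with no data come back as NA
totals <- tapply(customers$counts, list(lat = lat_f, lon = lon_f), sum)
totals[is.na(totals)] <- 0

# Long format: one row per block, which is the shape ggplot expects
grid <- as.data.frame(as.table(totals), responseName = "total",
                      stringsAsFactors = FALSE)
grid$lat <- as.numeric(grid$lat) * block
grid$lon <- as.numeric(grid$lon) * block
```

The resulting `grid` has one row per block, so it could be passed to something like `ggplot(grid, aes(lon, lat, fill = total)) + geom_tile()`, assuming ggplot2 is loaded.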

What is the difference between the functions tapply and ave?

I can't wrap my mind around the ave function. I read the help and searched the net, but I still cannot understand what it does. I understand that it applies some function to subsets of the observations, but not in the same way as, for example, tapply. Could someone please enlighten me, perhaps with a small example? Thanks, and excuse me for a perhaps unusual request.

tapply returns a single result for each factor level. ave also produces a single result per factor level, but it copies this value to each position in the original data. ave is handy for producing a new column in a data frame with summary data.
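A small illustration of that difference, using a made-up data frame (the names `scores`, `group`, and `value` are hypothetical):

```r
# Hypothetical data: two groups of unequal size
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# tapply: one value per group (length 2 here)
group_means <- tapply(scores$value, scores$group, mean)

# ave: the group value repeated for every row (length 5 here),
# so it can be stored directly as a new column
scores$group_mean <- ave(scores$value, scores$group, FUN = mean)
```

Here `group_means` holds 2 for group a and 20 for group b, while `scores$group_mean` is 2, 2, 20, 20, 20. Note that `FUN` has to be passed by name to `ave`, because its unnamed arguments are treated as grouping variables.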

Mean of variable by two factors

I have the following data:

    a <- c(1,1,1,1,2,2,2,2)
    b <- c(2,4,6,8,2,3,4,1)
    c <- factor(c("A","B","A","B","A","B","A","B"))
    df <- data.frame(sp=a, length=b, method=c)

I can use the following to get a count of the number of samples of each species by method:

    n <- with(df,tapply(sp,method,function(x) count(x)))

How do I also get the mean length by method for each species?

Personally I would use aggregate:

    aggregate(length ~ sp, data = df, FUN = "mean")  # by species only
    #   sp length
    # 1  1    5.0
    # 2  2    2.5
    aggregate(length ~ sp + method, data = df, FUN = "mean")  # by species and method
    #   sp method
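For comparison, the same two-factor summary can also be sketched with tapply by passing a list of grouping variables. This is an alternative to the aggregate call, not part of the original answer; the data is rebuilt from the question.

```r
# Data as given in the question
df <- data.frame(
  sp     = c(1, 1, 1, 1, 2, 2, 2, 2),
  length = c(2, 4, 6, 8, 2, 3, 4, 1),
  method = factor(c("A", "B", "A", "B", "A", "B", "A", "B"))
)

# A list of two grouping variables gives a species-by-method matrix of means
mean_len <- tapply(df$length, list(sp = df$sp, method = df$method), mean)

# Sample counts per species and method, using base R's table()
n <- table(sp = df$sp, method = df$method)
```

`mean_len` is a matrix with species in rows and methods in columns, whereas aggregate returns a long data frame; which shape is more convenient depends on what happens next.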

multiply multiple column and find sum of each column for multiple values

I'm trying to multiply columns and get the names of the results. I have a data frame:

    v1 v2 v3 v4 v5
     0  1  1  1  1
     0  1  1  0  1
     1  0  1  1  0

I'm trying to multiply each column with the others, like v1v2, v1v3, v1v4, v1v5 and v2v3, v2v4, v2v5 etc., then v1v2v3, v1v2v4, v1v2v5, v2v3v4, v2v3v5, then the 4-column and 5-column combinations... if there are n columns, then up to the n-column combination. I'm trying to use the following code in a while loop, but it is not working:

    i<-1
    while(i<=ncol(data) {
      results<-data.frame()
      v<-i
      results<- t(apply(data,1,function(x) combn(x,v,prod)))
      comb <- combn(colnames(data),v)
      colnames(results) <- apply(comb,v,function(x) paste(x[1],x[2]
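A minimal sketch of one way to do this without a while loop, assuming the goal stated in the title: the sum of each product column, named after the columns that were multiplied. The data frame is rebuilt from the question; `combo_sums` is a hypothetical name.

```r
# Data frame from the question
data <- data.frame(v1 = c(0, 0, 1), v2 = c(1, 1, 0), v3 = c(1, 1, 1),
                   v4 = c(1, 0, 1), v5 = c(1, 1, 0))

# For every combination size k = 2..n, multiply the chosen columns
# element-wise and sum the resulting product column
combo_sums <- unlist(lapply(2:ncol(data), function(k) {
  combos <- combn(names(data), k, simplify = FALSE)
  sums <- vapply(combos, function(cols) {
    sum(Reduce(`*`, data[cols]))
  }, numeric(1))
  # Name each result after its columns, e.g. "v1v2v3"
  names(sums) <- vapply(combos, paste, character(1), collapse = "")
  sums
}))
```

With five columns this yields 26 named sums (10 pairs, 10 triples, 5 quadruples, and 1 product of all five). To keep the product columns themselves rather than their sums, the `sum()` call could be dropped and the results bound together with `cbind`.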

R function which.max with tapply

I am trying to make a data frame with the maximum over records by a factor. I would like a data frame with 4 rows (one for each G) with the max for X in that group and the corresponding Y value. I know I could write a loop but would rather not.

    Data<-data.frame(X=rnorm(200), Y=rnorm(200), G=rep(c(1,2,3,4), each=50))
    XMax<-tapply(Data$X, Data$G, function(x){max(x, na.rm=TRUE)})
    WhichXMax<-tapply(Data$X, Data$G, function(x){which.max(x)})

The which.max function returns the row number after the data has been subset by the tapply factor, where I really want the row number referencing the Data rows
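One possible workaround, sketched here as an assumption about what is wanted: split the row indices rather than the X values, so that which.max picks out a position that maps straight back to a row of Data.

```r
set.seed(42)
Data <- data.frame(X = rnorm(200), Y = rnorm(200),
                   G = rep(c(1, 2, 3, 4), each = 50))

# Group the row numbers by G; within each group, return the row number
# (in Data) whose X is largest
rows <- tapply(seq_len(nrow(Data)), Data$G,
               function(i) i[which.max(Data$X[i])])

# One row per group, holding the maximum X and its corresponding Y
result <- Data[as.vector(rows), ]
```

Because `rows` holds indices into the full data frame, `result` keeps the Y value that belongs to each group's maximum X.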

How to assign a counter to a specific subset of a data.frame which is defined by a factor combination?

My question is: I have a data frame with some factor variables. I now want to assign a new vector to this data frame, which creates an index for each subset of those factor variables.

    data <-data.frame(fac1=factor(rep(1:2,5)), fac2=sample(letters[1:3],10,rep=T))

Gives me something like:

       fac1 fac2
    1     1    a
    2     2    c
    3     1    b
    4     2    a
    5     1    c
    6     2    b
    7     1    a
    8     2    a
    9     1    b
    10    2    c

And what I want is a combination counter which counts the occurrence of each factor combination. Like this:

       fac1 fac2 counter
    1     1    a       1
    2     2    c       1
    3     1    b       1
    4     2    a       1
    5     1    c       1
    6     2    b       1
    7     1    a       2
    8     2    a       2
    9     1    b       2
    10    1    a       3

So far I thought about using tapply
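A running counter of this kind is often easier to build with ave than with tapply, because ave returns one value per row. A minimal sketch, assuming the counter should number the occurrences of each fac1/fac2 combination in row order:

```r
set.seed(1)
data <- data.frame(fac1 = factor(rep(1:2, 5)),
                   fac2 = sample(letters[1:3], 10, replace = TRUE))

# Within each fac1/fac2 combination, number the rows 1, 2, 3, ...
data$counter <- ave(seq_len(nrow(data)), data$fac1, data$fac2,
                    FUN = seq_along)
```

The first argument only supplies a vector of the right length; `seq_along` then replaces each group's entries with 1 up to the group size, in the order the rows appear.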

Computing pairwise Hamming distance between all rows of two integer matrices/data frames

I have two data frames, df1 with reference data and df2 with new data. For each row in df2, I need to find the best (and the second best) matching row in df1 in terms of Hamming distance. I used the e1071 package to compute the Hamming distance. The Hamming distance between two vectors x and y can be computed as, for example:

    x <- c(356739, 324074, 904133, 1025460, 433677, 110525, 576942, 526518, 299386, 92497, 977385, 27563, 429551, 307757, 267970, 181157, 3796, 679012, 711274, 24197, 610187, 402471, 157122, 866381, 582868, 878)
    y <- c(356739, 324042, 904133, 959893, 433677, 110269, 576942, 2230, 267130,
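As a rough base-R sketch of the pairwise computation, taking the Hamming distance to mean the number of positions at which two rows differ: the small matrices below are made up for illustration, and the helper name `hamming` is hypothetical and not the e1071 function.

```r
# Hypothetical reference data (3 rows) and new data (2 rows)
df1 <- matrix(c(1, 2, 3,
                4, 5, 6,
                7, 8, 9), nrow = 3, byrow = TRUE)
df2 <- matrix(c(1, 2, 0,
                7, 0, 9), nrow = 2, byrow = TRUE)

# d[i, j] = number of positions where row i of `new` differs from row j of `ref`
# (assumes both matrices have the same number of columns and more than one row)
hamming <- function(new, ref) {
  t(apply(new, 1, function(r) colSums(t(ref) != r)))
}

d <- hamming(df2, df1)

# Index of the best and second-best matching reference row for each new row
best        <- apply(d, 1, which.min)
second_best <- apply(d, 1, function(x) order(x)[2])
```

For data frames, `as.matrix()` would be applied first. Ties are broken in favor of the earlier reference row, which may or may not be the desired behavior.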

How to pass na.rm as argument to tapply?

I'd like to calculate the mean and sd from a data frame with one column for the parameter and one column for a group identifier. How can I calculate them when using tapply? I could use sd(v1, group, na.rm=TRUE), but can't fit the na.rm=TRUE into the statement when using tapply. na.omit is not an option. I have a whole bunch of parameters and have to go through them step by step without losing half of the data frame when excluding all lines with one missing value. data("weightgain", package = "HSAUR"
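tapply forwards any extra arguments to the function it applies, so na.rm = TRUE can simply be appended to the call. A minimal sketch on made-up data (the data frame below is hypothetical and stands in for the dataset from the question):

```r
# Hypothetical data with one missing value in group "a"
df <- data.frame(
  group = c("a", "a", "b", "b"),
  v1    = c(1, NA, 4, 6)
)

# Arguments after the function are passed on to it for every group
means <- tapply(df$v1, df$group, mean, na.rm = TRUE)
sds   <- tapply(df$v1, df$group, sd, na.rm = TRUE)
```

Only the missing values within each group are dropped, so no rows are removed from the data frame itself. A group left with a single non-missing value, like group a here, still gets a mean but its sd is NA.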