sapply | 易学教程

generate multinomial random variables with varying sample size in R

Question: I need to generate multinomial random variables with varying sample sizes. Say I have already generated my sample sizes as samplesize = c(50, 45, 40, 48); I then need to generate multinomial random variables based on these sample sizes. I tried this with a for loop and with an apply function (sapply). Using a for loop: p1 = c(0.4, 0.3, 0.3) for (i in 1:4) { xx1[i] = rmultinom(4, samplesize[i], p1) } If my code is correct, I should get a matrix with 4 columns and 3 rows. Where
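A minimal sketch of one way to get the 3 x 4 matrix the question describes, assuming a single multinomial draw per sample size is what is wanted (the loop in the question asks for 4 draws per iteration). The seed is arbitrary and only there for reproducibility.

```r
samplesize <- c(50, 45, 40, 48)
p1 <- c(0.4, 0.3, 0.3)

set.seed(1)  # arbitrary seed, used only to make the run reproducible

# One draw per sample size. Each call returns 3 counts, and sapply binds
# them column-wise: categories in rows, sample sizes in columns.
xx1 <- sapply(samplesize, function(n) rmultinom(1, size = n, prob = p1))

dim(xx1)      # 3 rows, 4 columns
colSums(xx1)  # each column sums to its sample size
```

Because sapply builds the result itself, no pre-allocated xx1 object is needed.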

How to limit the calculation in a sapply function?

Question: Several weeks ago I had a problem calculating a coefficient that depends on information from another data frame (see the link to the last question). The solution provided by @PoGibas worked very well; however, I have to limit the calculation to only the next 10 values from data frame A after a defined time in each row. Could you please help me? My code, as @PoGibas proposed, looks like this: sapply(1:length(time), function(x) sum(df1[x, which(foo >= time[x])])) Answer 1: The following code modifies your

average pairs of columns in R

Question: I would like to average pairs of columns in a data set, not with a moving average. I want to divide the columns into groups of two and find the average for each pair. I present a sample data set, the desired result, and nested for-loops that return the desired result. I suspect there is a better way. Sorry if I have overlooked the solution in a different post. I did search here, but I did not search the internet as diligently as I usually do. Thank you for any advice. x =
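The sample data in the question is cut off, so the sketch below uses a hypothetical 3 x 8 matrix x. It shows one possible loop-free approach: group column indices into consecutive pairs, then take rowMeans over each pair.

```r
# Hypothetical data: 3 rows, 8 columns (not the data from the question)
x <- matrix(1:24, nrow = 3)

# Columns 1-2 form pair 1, columns 3-4 form pair 2, and so on
pair_id <- ceiling(seq_len(ncol(x)) / 2)

# For each pair of column indices, average across the two columns row by row
avg <- sapply(split(seq_len(ncol(x)), pair_id),
              function(idx) rowMeans(x[, idx, drop = FALSE]))

dim(avg)  # 3 rows, 4 columns (one per pair)
```

With an odd number of columns, the last group would contain a single column and its "average" would simply be that column.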

Max date in R column with sapply

Question: I am trying to use sapply to get the max date in a column, but it returns a number instead of a date. Any idea how to resolve this? I can't figure out why this is occurring. mtcars$datecolm = '2015-03-03' mtcars$datecolm[1] = '2015-09-09' mtcars$datecolm = as.Date(mtcars$datecolm) sapply(mtcars, max) # why is it returning a number instead of a date?? max(mtcars$datecolm) # works correctly Please use sapply given the way I have set this up... I know this works with apply(mtcars,2
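A short sketch of why the number appears, under the assumption that the cause is sapply's simplification step: collapsing the per-column results into one atomic vector drops the Date class. Passing simplify = FALSE keeps a list, so each element retains its class. The copy named mt is introduced here only for illustration.

```r
mt <- mtcars  # work on a copy of the built-in dataset
mt$datecolm <- as.Date("2015-03-03")
mt$datecolm[1] <- as.Date("2015-09-09")

# Default behaviour: results are simplified to a plain numeric vector,
# so the date shows up as a day count rather than a Date
flat <- sapply(mt, max)
class(flat)

# simplify = FALSE returns a list; the Date element keeps its class
kept <- sapply(mt, max, simplify = FALSE)
class(kept$datecolm)
kept$datecolm
```

lapply(mt, max) gives the same list if sapply is not a hard requirement.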

Fill in mean values for NA in every column of a data frame [duplicate]

Question: This question already has answers here: Replace missing values with column mean (11 answers). Closed 3 years ago. If I have a data frame df df=data.frame(x=1:20,y=c(1:10,rep(NA,10)),z=c(rep(NA,5),1:15)) I know that to replace NAs with the mean value for a given column we can use df$x[is.na(df$x)]=mean(df$x,na.rm=TRUE) What I am trying to find is a way to use a single command that does this for all columns at once instead of repeating it for every column. I suspect I need to use sapply and
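One possible single-command version, sketched with lapply rather than sapply so that the result stays a data frame (assigning into df[] keeps the data frame structure). It assumes every column is numeric, as in the question's example.

```r
df <- data.frame(x = 1:20,
                 y = c(1:10, rep(NA, 10)),
                 z = c(rep(NA, 5), 1:15))

# Replace the NAs in each column with that column's mean of non-missing values
df[] <- lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

anyNA(df)  # FALSE once every column has been filled
```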

Processing the list of data.frames with “apply” family of functions

Question: I have a data frame which I then split into three (or any number of) data frames. What I’m trying to do is automatically process each column in each data frame and add lagged versions of the existing variables. For example, if there were three variables in each data.frame (V1, V2, V3), I would like to automatically (without hardcoding) add V1.lag, V2.lag and V3.lag. Here is what I have so far, but I’m stuck now. Any help would be highly appreciated. dd<-data.frame(matrix(rnorm(216),72,3),c(rep("A"
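A rough sketch of the idea, using a small hypothetical data frame because the question's code is cut off. The helper add_lags and the grouping column grp are names invented for this example; a lag of one row is assumed.

```r
# Hypothetical data: three numeric variables plus a grouping column
dd <- data.frame(V1 = c(1, 2, 3, 4),
                 V2 = c(10, 20, 30, 40),
                 V3 = c(5, 6, 7, 8),
                 grp = c("A", "A", "B", "B"))

# Hypothetical helper: append a one-row lag of every numeric column
add_lags <- function(d) {
  num <- names(d)[sapply(d, is.numeric)]
  lagged <- lapply(d[num], function(v) c(NA, head(v, -1)))
  names(lagged) <- paste0(num, ".lag")
  cbind(d, as.data.frame(lagged))
}

# Split by group, then process each piece without hardcoding column names
pieces <- lapply(split(dd, dd$grp), add_lags)
names(pieces$A)
```

Because the lag is computed after the split, values never leak from one group into the next.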

means and SD for columns in a dataframe with NA values

Question: I'm trying to calculate the mean and standard deviation of several columns (all except the first) in a data.frame with NA values. I've tried colMeans, sapply, etc., to create a loop that runs through the data.frame and stores the means and standard deviations in a separate table, but I keep getting a "FUN" error. Any help would be great. Thanks a
Answer 1: sapply(df, function(cl) list(means=mean(cl,na.rm=TRUE), sds=sd(cl,na.rm=TRUE)))
      col1 col2 col3 col4  col5
means 3    8    12.5 18.25 22.5
sds   1
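A variation on the same idea, sketched with hypothetical data since the original data frame is not shown. It drops the first column with df[-1], as the question requests, and returns a numeric matrix by having the function return a named vector instead of a list.

```r
# Hypothetical data: an id column followed by numeric columns with NAs
df <- data.frame(id = c("a", "b", "c", "d"),
                 col1 = c(1, NA, 3, 5),
                 col2 = c(2, 4, NA, 6))

# Skip the first column; each call returns a named numeric vector,
# so sapply simplifies the results to a 2 x 2 numeric matrix
stats <- sapply(df[-1], function(cl) {
  c(mean = mean(cl, na.rm = TRUE), sd = sd(cl, na.rm = TRUE))
})

stats
```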

Using “…” and “replicate”

Question: The documentation of sapply and replicate includes a warning about using "...". I can accept it as such, but I would like to understand what is behind it. So I've created this little contrived example: innerfunction<-function(x, extrapar1=0, extrapar2=extrapar1) { cat("x:", x, ", xp1:", extrapar1, ", xp2:", extrapar2, "\n") } middlefunction<-function(x,...) { innerfunction(x,...) } outerfunction<-function(x, ...) { cat("Run middle function:\n") replicate(2, middlefunction(x,...)) cat(

Geographical distance by group - Applying a function on each pair of rows

Question: I want to calculate the average geographical distance between a number of houses per province. Suppose I have the following data. df1 <- data.frame(province = c(1, 1, 1, 2, 2, 2), house = c(1, 2, 3, 4, 5, 6), lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7), lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2)) Using the geosphere library I can find the distance between two houses. For instance: library(geosphere) distm(c(df1$lon[1], df1$lat[1]), c(df1$lon[2], df1$lat[2]), fun = distHaversine) #11429

weighted means by group and column

Question: I wish to obtain weighted means by group for each of several (actually about 60) columns. This question is very similar to the one just asked: repeatedly applying ave for computing group means in a data frame. I have come up with two ways to obtain the weighted means so far: (1) use a separate sapply statement for each column; (2) place an sapply statement inside a for-loop. However, I feel there must be a way to insert an apply statement inside the sapply statement or vice versa, thereby eliminating the for
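A small sketch of nesting one sapply inside another, using hypothetical data (the column names group, w, v1 and v2 are assumptions, not taken from the question). The outer sapply walks over the value columns and the inner one over the groups.

```r
# Hypothetical data: a grouping column, a weight column, two value columns
df <- data.frame(group = c("a", "a", "b", "b"),
                 w  = c(1, 3, 2, 2),
                 v1 = c(10, 20, 30, 50),
                 v2 = c(1, 2, 3, 4))

value_cols <- c("v1", "v2")  # with ~60 columns, build this from names(df)

# Outer sapply: one column of the result per value column.
# Inner sapply: one weighted mean per group.
wm <- sapply(value_cols, function(v) {
  sapply(split(df, df$group), function(g) weighted.mean(g[[v]], g$w))
})

wm  # rows are groups, columns are value columns
```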