Question
I am running a coin-toss simulation with a loop which runs about 1 million times.
On each iteration I want to retain the table output of rle(). Unfortunately a simple append does not seem to work. Each iteration produces a slightly different amount of data, which seems to be one of the sticking points.
This code gives an idea of what I am doing:
N <- 5 #Number of times to run
rlex <- NULL
#begin loop#############################
for (i in 1:N) { #tells R to repeat N number
  x <- sample(0:1, 100000, 1/2)
  rlex <- append(rlex, rle(x))
}
table(rlex) #doesn't work
table(rle(x)) #only 1
So instead of having five separate rle results (in this simulation, 1 million in the full version), I want one merged rle table. Hope this is clear. Obviously my actual code is a bit more complex, hence any solution should be as close to what I have specified as possible.
UPDATE: The loop is an absolute requirement. No ifs or buts. Perhaps I can pull out the table(rle(x)) data and put it into a matrix. However again the stumbling block is the fact that some of the less frequent run lengths do not always turn up in each loop. Thus I guess I am looking to conditionally fill a matrix based on the run length number?
Last update before I give up: Retaining the rle$values will mean that too much data is being retained. My simulation is large-scale and I really only wish to retain the table output of the rle. Either I retain each table(rle(x)) for each loop and combine by hand (there will be thousands), or I find a programmatic way to keep the data (yes for zeroes and ones) and have one table that is formed from merging each of the individual loops as I go along.
Either this is easyish to do, as specified, or I will not be doing it. It may seem a silly idea/request, but that should be incidental to whether it can be done.
Seriously last time. Here is an animated gif showing what I expect to happen.
After each iteration of the loop data is added to the table. This is as clear as I am going to be able to communicate it.
Answer 1:
OK, attempt number 4:
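N <- 5
set.seed(1)
x <- NULL
for (i in 1:N) {
  # stack each iteration's length-by-value table; row names are the run lengths
  x <- rbind(x, table(rle(sample(0:1, 100000, replace = TRUE))))
}
# sum the stacked counts by run length (row names repeat across iterations)
aggregate(x, list(as.numeric(rownames(x))), sum)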
Produces:
   Group.1     0     1
1        1 62634 62531
2        2 31410 31577
3        3 15748 15488
4        4  7604  7876
5        5  3912  3845
6        6  1968  1951
7        7   979   971
8        8   498   477
9        9   227   246
10      10   109   128
11      11    65    59
12      12    24    30
13      13    21    11
14      14     7    10
15      15     0     4
16      16     4     2
17      17     0     1
18      18     0     1
If you want the aggregation inside the loop, do:
N <- 5
set.seed(1)
x <- NULL
for (i in 1:N) {
  x <- rbind(x, table(rle(sample(0:1, 100000, replace = TRUE))))
  y <- aggregate(x, list(as.numeric(rownames(x))), sum)
  print(y)
}
Answer 2:
Following up @CarlWitthoft's answer, you probably want:
N <- 5
rlex <- NULL
for (i in 1:N) {
  x <- sample(0:1, 100000, replace = TRUE)
  # keep only the run lengths, not the values
  rlex <- append(rlex, rle(x)$lengths)
}
since I think you don't care about the $values component (i.e. whether each run is a run of zeros or ones).
Result: one long vector of run lengths.
But this would probably be a lot more efficient:
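maxlen <- 30  # longest run length to tabulate
rlemat <- matrix(nrow = N, ncol = maxlen)
for (i in 1:N) {
  x <- sample(0:1, 100000, replace = TRUE)
  # fixed factor levels give every iteration the same maxlen columns
  rlemat[i, ] <- table(factor(rle(x)$lengths, levels = 1:maxlen))
}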
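Result: an N by maxlen table of run lengths from each iteration. Runs longer than maxlen fall outside the factor levels and are not counted, so pick maxlen generously.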
If you only want to save the total number of runs of each length you could try:
rlecumsum <- rep(0, maxlen)
for (i in 1:N) {
  x <- sample(0:1, 100000, replace = TRUE)
  # add this iteration's counts to the running total
  rlecumsum <- rlecumsum + table(factor(rle(x)$lengths, levels = 1:maxlen))
}
Result: a vector of length maxlen of the total numbers of run lengths across all iterations.
And here's my final answer:
rlecumtab <- matrix(0, ncol = 2, nrow = maxlen)
for (i in 1:N) {
  x <- sample(0:1, 100000, replace = TRUE)
  r1 <- rle(x)
  # cross-tabulate run length (rows) against run value (columns: 0, 1)
  rtab <- table(factor(r1$lengths, levels = 1:maxlen), r1$values)
  rlecumtab <- rlecumtab + rtab
}
Result: a maxlen by 2 table of the total numbers of run lengths across all iterations, divided by type (0-run vs 1-run).
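If a fixed maxlen feels too rigid, the "conditionally fill based on the run length number" idea from the question can be sketched by merging counts on their names instead. This is only a rough sketch under my own assumptions: add_counts is a hypothetical helper (not part of base R), and total0 / total1 are names I made up for the running totals of zero-runs and one-runs.

```r
# Hypothetical helper: add a one-dimensional table of run lengths to a
# named running total, creating entries for lengths not seen before.
add_counts <- function(total, tab) {
  keys <- union(names(total), names(tab))
  out <- setNames(numeric(length(keys)), keys)
  out[names(total)] <- total
  out[names(tab)] <- out[names(tab)] + as.vector(tab)
  out[order(as.integer(keys))]  # keep entries sorted by run length
}

set.seed(1)
N <- 5
total0 <- numeric(0)  # running counts for runs of zeros
total1 <- numeric(0)  # running counts for runs of ones
for (i in 1:N) {
  r <- rle(sample(0:1, 100000, replace = TRUE))
  total0 <- add_counts(total0, table(r$lengths[r$values == 0]))
  total1 <- add_counts(total1, table(r$lengths[r$values == 1]))
}
total0
total1
```

Only the two small named vectors survive each iteration, so memory use should not grow with N. The price is a name lookup per iteration, which is probably slower than the fixed-size version above.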
Answer 3:
You need to read the help page for rle. Consider:
names(rlex) #"lengths" "values" "lengths" "values" .... and so on
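To see where those repeated names come from, here is a small sketch on a toy vector (the input is made up purely for illustration). It shows that append() flattens each rle result into a plain list, which is why table(rlex) has nothing sensible to cross-classify.

```r
# Toy input chosen so the runs are easy to read off: 0 0 | 1 | 0 0 0
x <- c(0, 0, 1, 0, 0, 0)
r <- rle(x)

# rle() returns a list-like object with two components.
r$lengths  # 2 1 3
r$values   # 0 1 0

# append() drops the "rle" class and concatenates the components,
# so two results become one flat list of four elements.
rlex <- append(NULL, r)
rlex <- append(rlex, r)
names(rlex)  # "lengths" "values" "lengths" "values"

# Tabulating a single rle object still works, because table() treats
# its two components as the two factors to cross-classify.
tab <- table(r)
tab
```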
In the meantime, I strongly suggest you spend some time reading up on statistical methods. There is zero (+/- epsilon) chance that running a binomial simulation a million times will tell you anything you won't learn after a few hundred tries, unless your coin has p=1e-5 :-).
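As a rough illustration of that point, the expected counts can be approximated without simulating anything. The sketch assumes a fair coin and ignores edge effects at the ends of each sequence, so treat the numbers as approximations only; n_tosses, k and expected are names chosen for this example.

```r
n_tosses <- 5 * 100000  # total tosses across five iterations of 100000
k <- 1:18               # run lengths to tabulate

# Approximation for a fair coin: about n/2 runs in total, half of them
# for each face, and a fraction 2^-k of those have length exactly k.
expected <- n_tosses / 2^(k + 2)

round(expected[1:3])  # 62500 31250 15625
```

Those values sit close to the simulated counts in Answer 1 (62634 and 62531 for length 1, 31410 and 31577 for length 2), which suggests that more iterations mostly just tighten numbers you can already predict.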
Source: https://stackoverflow.com/questions/12892985/append-rle-result-from-loop