I know that R works most efficiently with vectors and that looping should be avoided. I am having a hard time teaching myself to actually write code this way, and I would like some ideas.
Clearly I should have worked on this for another hour before I posted my question. It's so obvious in retrospect. :)
To use R's vectorized logic, I took out the loop and replaced it with this:
st <- sample(c(12, 17, 24), 10000, prob = c(20, 30, 50), replace = TRUE)
p1 <- sample(c(12, 17, 24), 10000, prob = c(20, 30, 50), replace = TRUE)
p2 <- sample(c(12, 17, 24), 10000, prob = c(20, 30, 50), replace = TRUE)
year <- rep(1991:2000, 1000)
I can now do 100,000 samples almost instantaneously. I knew that vectors were faster, but dang: I presume 100,000 iterations would have taken over an hour with the loop, while the vectorized approach takes less than a second. Just for kicks I made the vectors a million long, which took about 2 seconds to complete. Since I must test to failure, I tried 10 million but ran out of memory on my 2 GB laptop. I switched over to my Vista 64 desktop with 6 GB of RAM and created vectors of length 10 million in 17 seconds. At 100 million things fell apart: one of the vectors was over 763 MB, which triggered an allocation error in R.
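If anyone wants to reproduce that scaling test, here is a rough sketch (the sizes and output format are just illustrative, not my original code) that wraps the vectorized call in system.time():

# Illustrative scaling check; timings vary by machine and available memory
for (n in c(1e5, 1e6, 1e7)) {
  t <- system.time(
    sample(c(12, 17, 24), n, prob = c(20, 30, 50), replace = TRUE)
  )
  cat(sprintf("n = %.0e: %.2f seconds elapsed\n", n, t["elapsed"]))
}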
Vector operations in R seem amazingly fast to me. I guess that's why I am an economist and not a computer scientist.
To answer your question about why the loop of 10,000 took so much longer than your loop of 1,000:
I think the primary suspect is the concatenation happening on every iteration. As the data get longer, R is probably copying every element of the vector into a new vector that is one element longer, so the total amount of copying grows quadratically with the number of iterations. Copying a small data set (500 elements on average) 1,000 times is fast; copying a larger data set (5,000 elements on average) 10,000 times is much slower.
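A quick way to see the cost of growing a vector (hypothetical sizes, not the code from your question) is to compare concatenating with c() against filling a preallocated vector:

n <- 50000

# Growing with c(): each iteration allocates a new vector and copies the old one
system.time({
  grown <- numeric(0)
  for (i in 1:n) grown <- c(grown, i)
})

# Preallocating once and assigning in place
system.time({
  prealloc <- numeric(n)
  for (i in 1:n) prealloc[i] <- i
})

The first version typically takes far longer, and the gap widens quickly as n grows, which matches the slowdown you saw going from 1,000 to 10,000 iterations.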