Multi-armed bandits with Rcpp
Question
I am translating the epsilon-greedy algorithm for multi-armed bandits from here. This is a rather nice demonstration of the power and elegance of Rcpp. However, the results from this version do not tally with those reported at the link above. I am aware that this is probably a very niche question, but I have no other venue to post it on! A summary of the code is as follows. Basically, we have a set of arms, each of which pays out a reward with a pre-defined probability and our job is