tl;dr: I'm looking for methods to implement a weighted random choice based on the relative magnitude of values (or functions of values) in an array in golang. Are there standard algorithms or recommendable packages for this? If so, how do they scale?
Goals
I'm trying to write 2D and 3D Markov process programs in golang. A simple 2D example is the following: Imagine one has a lattice, and on each site labeled by index (i,j) there are n(i,j) particles. At each time step, the program chooses a site and moves one particle from this site to a random adjacent site. The probability that a site is chosen is proportional to its population n(i,j) at that time, i.e. P(i,j) = n(i,j) / N, where N is the total number of particles.
Current Implementation
My current algorithm, e.g. for the 2D case on an L x L lattice, is the following:
- Convert the starting array into a slice of length L^2 by concatenating rows in order, e.g. cdfpop[i*L+j] = initialpopulation[i][j].
- Convert the 1D slice into a cdf by running a for loop over cdfpop[i] += cdfpop[i-1].
- Generate two random numbers: Rsite, whose range is from 1 to the largest value in the cdf (this is just the last value, cdfpop[L^2-1]), and Rhop, whose range is between 1 and 4. The first random number chooses a weighted random site, and the second a random direction to hop in.
- Use a binary search to find the leftmost index indexhop of cdfpop that is greater than Rsite. The index being hopped to is either indexhop +- 1 for x-direction hops or indexhop +- L for y-direction hops.
- Finally, directly change the values of cdfpop to reflect the hop process. This means subtracting one from (adding one to) all values in cdfpop between the index being hopped from (to) and the index being hopped to (from), depending on order.
- Rinse and repeat in a for loop. At the end, reverse the cdf (take successive differences) to determine the final population.
Edit: The requested pseudocode looks like this:
main(){
    //import population LxL array
    population := import(population array)

    //turn the array into a slice by concatenating rows
    for i in number of rows{
        cdf[ith slice of length L] = population[ith row]
    }

    //compute the cumulant (running sum) array in place
    for i from 1 to number of total sites - 1{
        cdf[i] = cdf[i-1] + cdf[i]
    }

    for i in timesteps{
        site = Randomhopsite(cdf)
        cdf = Dohop(cdf, site)
    }

    Convertcdftoarrayandsave(cdf)
}

Randomhopsite(cdf) site{
    //choose a random number in the range of the cumulant
    randomnumber = RandomNumber(range 1 to Max(cdf))
    site = binarysearch(cdf, randomnumber) //finds the leftmost index such that
                                           //cdf[i] >= randomnumber
    return site
}

Dohop(cdf, site) cdf{
    //choose a random hop direction and calculate the destination site
    randomnumber = RandomNumber(range 1 to 4)
    case randomnumber{
        1: finalsite = site + 1
        2: finalsite = site - 1
        3: finalsite = site + L
        4: finalsite = site - L
    }

    //patch the cumulant to reflect one particle moving from site to finalsite;
    //only the entries strictly between the two indices change
    if finalsite > site{
        for i from site to finalsite - 1{
            cdf[i]--
        }
    } else if finalsite < site{
        for i from finalsite to site - 1{
            cdf[i]++
        }
    } else{
        error: something failed
    }
    return cdf
}
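In actual Go, the core of this loop looks roughly like the following. (The function names, the uniform one-particle-per-site starting condition, and the decision to simply skip hops that would leave the lattice are simplifications for this sketch, not my exact production code.)

package main

import (
    "math/rand"
    "sort"
)

const L = 1000 // lattice side length

// randomHopSite picks a site with probability proportional to its population:
// draw r uniformly in [1, total] and take the leftmost index whose cumulative
// sum reaches r.
func randomHopSite(cdf []int) int {
    r := rand.Intn(cdf[len(cdf)-1]) + 1
    return sort.SearchInts(cdf, r) // leftmost i with cdf[i] >= r
}

// doHop moves one particle from site to a random neighbour and patches the
// cumulative sums in place; only entries strictly between the two sites change.
func doHop(cdf []int, site int) {
    var final int
    switch rand.Intn(4) {
    case 0:
        final = site + 1
    case 1:
        final = site - 1
    case 2:
        final = site + L
    case 3:
        final = site - L
    }
    if final < 0 || final >= L*L {
        return // hop would leave the lattice; proper boundary handling omitted
    }
    if final > site {
        for i := site; i < final; i++ {
            cdf[i]--
        }
    } else {
        for i := final; i < site; i++ {
            cdf[i]++
        }
    }
}

func main() {
    // Flattened LxL population; real loading code omitted, start with one
    // particle per site just so the sketch runs.
    cdf := make([]int, L*L)
    for i := range cdf {
        cdf[i] = 1
    }
    // Turn the populations into a cumulative distribution in place.
    for i := 1; i < len(cdf); i++ {
        cdf[i] += cdf[i-1]
    }

    const timesteps = 1_000_000
    for t := 0; t < timesteps; t++ {
        doHop(cdf, randomHopSite(cdf))
    }
    // Reversing the cdf back into an LxL population array is omitted.
}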
This process works really well for simple problems. For this particular problem, I can run about 1 trillion steps on a 1000 x 1000 lattice in about 2 minutes on average with my current setup, and I can compile population data into GIFs every 10000 or so steps by spinning up a goroutine without a huge slowdown.
Where efficiency breaks down
The trouble comes when I want to add different processes, with real-valued coefficients, whose rates are not proportional to the site population. So say I now have a hopping rate of k_hop * n(i,j) and a death rate (where I simply remove a particle) of k_death * (n(i,j))^2. There are two slow-downs in this case:
- My cdf will be double the size (not that big of a deal). It will be real valued and created by cdfpop[i*L+j] = 4*k_hop*pop[i][j] for i*L+j < L*L and cdfpop[i*L+j] = k_death*math.Pow(pop[i][j], 2) for L*L <= i*L+j < 2*L*L, followed by cdfpop[i] += cdfpop[i-1]. I would then select a random real in the range of the cdf (see the sketch after this list).
- Because of the squared n, I will have to dynamically recalculate the part of the cdf associated with the death-process weights at each step. This is a MAJOR slow-down, as expected. Timing for this is about 3 microseconds, compared with the original algorithm, which took less than a nanosecond.
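To make that concrete, here is a rough Go sketch of the two-rate construction I have in mind (the function names, the tiny flattened demo lattice in main, and the use of math/rand and sort.Search are placeholders for illustration). The point is that the death-weight half depends quadratically on the populations, so after each event the affected weights and every cumulative sum past them have to be rebuilt:

package main

import (
    "fmt"
    "math"
    "math/rand"
    "sort"
)

// buildRates fills a 2*L*L weight slice: the first half holds the hop weights
// 4*k_hop*n, the second half holds the death weights k_death*n^2, and the
// whole slice is then turned into a running (cumulative) sum.
func buildRates(pop []float64, kHop, kDeath float64) []float64 {
    n := len(pop)
    cdf := make([]float64, 2*n)
    for i, p := range pop {
        cdf[i] = 4 * kHop * p
        cdf[n+i] = kDeath * math.Pow(p, 2)
    }
    // Running sum; whenever an event changes pop, everything from the first
    // affected index onward has to be recomputed, which is the slow part.
    for i := 1; i < len(cdf); i++ {
        cdf[i] += cdf[i-1]
    }
    return cdf
}

// pickEvent draws a uniform real in [0, total) and returns the index of the
// chosen event: index < len(pop) means "hop from that site", index >= len(pop)
// means "death at site index-len(pop)".
func pickEvent(cdf []float64) int {
    r := rand.Float64() * cdf[len(cdf)-1]
    // Leftmost index i with cdf[i] > r.
    return sort.Search(len(cdf), func(i int) bool { return cdf[i] > r })
}

func main() {
    pop := []float64{1, 2, 3, 4} // tiny flattened 2x2 lattice, just for illustration
    cdf := buildRates(pop, 1.0, 0.5)
    fmt.Println("chosen event index:", pickEvent(cdf))
}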
This problem only gets worse if I have rates calculated as a function of populations on neighboring sites -- e.g. spontaneous particle creation depends on the product of populations on neighboring sites. While I hope to work out a way to just modify the cdf without recalculation by thinking really hard, as I try to simulate problems of increasing complexity, I can't help but wonder if there is a universal solution with reasonable efficiency I'm missing that doesn't require specialized code for each random process.
Thanks for reading!
Source: https://stackoverflow.com/questions/52442838/is-there-an-elegant-and-efficient-way-to-implement-weighted-random-choices-in-go