Question: How to generate random values in range (-1, 1) such that the total sum is 0?
If the target sum were 1, I could just divide the values by their sum. However, this approach does not work when the target sum is 0.
Maybe I could compute the opposite of each value I sample, so I would always have pairs of numbers whose sum is 0. However, this approach reduces the "randomness" I would like to have in my random array.
Are there better approaches?
Edit: the array length can vary (from 3 to a few hundred), but it has to be fixed before sampling.
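A minimal sketch of the mirrored-pair idea from the question. It assumes a symmetric range and pads an odd length with a single 0; `mirrored_zero_sum` is a hypothetical name. The pairs cancel exactly, which `math.fsum` confirms. However, half of the values are fully determined by the other half, which is the loss of randomness the question mentions.

```python
import math
import random

def mirrored_zero_sum(n, bound=1.0):
    """Return n values in [-bound, bound] that cancel in +x/-x pairs."""
    half = [random.uniform(-bound, bound) for _ in range(n // 2)]
    values = half + [-x for x in half]
    if n % 2:
        values.append(0.0)  # odd length: pad with a neutral element
    random.shuffle(values)  # hide the pairing order
    return values

random.seed(1)
xs = mirrored_zero_sum(7)
print(len(xs), math.fsum(xs))  # fsum adds exactly, so the pairs cancel to 0.0
```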
Answer 1:
You could use scikit-learn's StandardScaler. It scales your data to have a variance of 1 and a mean of 0. For a fixed number of values, a mean of 0 is equivalent to a sum of 0.
from sklearn.preprocessing import StandardScaler
import numpy as np
rand_numbers = StandardScaler().fit_transform(np.random.rand(100, 1))
If you don't want to use sklearn, you can standardize by hand. The formula is simple:
rand_numbers = np.random.rand(1000, 1)
rand_numbers = (rand_numbers - np.mean(rand_numbers)) / np.std(rand_numbers)
The problem here is the variance of 1, which produces values greater than 1 or smaller than -1. Therefore, divide the array by its maximum absolute value. Dividing by a scalar keeps the sum at zero:
rand_numbers = rand_numbers / np.max(np.abs(rand_numbers))
Now you have an array with values between -1 and 1 with a sum really close to zero.
print(sum(rand_numbers))
print(min(rand_numbers))
print(max(rand_numbers))
Output:
[-1.51822999e-14]
[-0.99356294]
[1.]
With this solution, your data will always contain one value equal to 1 or -1 (the element with the largest absolute value). If you want to avoid this, you can add a positive random factor to the denominator: rand_numbers*(1/(max(abs(rand_numbers))+randomfactor))
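A NumPy sketch of that variant follows. It assumes `randomfactor` is drawn uniformly from [0.01, 0.1), which is an arbitrary range chosen only for illustration. It also skips the division by the standard deviation, because the max-abs scaling makes that step unnecessary. The denominator is strictly larger than the largest absolute value, so every element lands strictly inside (-1, 1). Dividing by a scalar keeps the sum at (numerically) zero.

```python
import numpy as np

rng = np.random.default_rng(0)
rand_numbers = rng.random(1000)
rand_numbers = rand_numbers - rand_numbers.mean()  # center: the sum is now ~0

# Hypothetical range for the extra factor; any strictly positive value works
randomfactor = rng.uniform(0.01, 0.1)
rand_numbers = rand_numbers / (np.max(np.abs(rand_numbers)) + randomfactor)

print(rand_numbers.sum(), rand_numbers.min(), rand_numbers.max())
```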
Edit
As @KarlKnechtel mentioned, dividing by the standard deviation is redundant when you divide by the maximum absolute value afterwards.
The above can therefore be simplified to:
rand_numbers = np.random.rand(100000, 1)
rand_numbers = rand_numbers - np.mean(rand_numbers)
rand_numbers = rand_numbers / np.max(np.abs(rand_numbers))
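A quick numerical check of that redundancy follows. Let c be the centered array and s the standard deviation. Dividing by s and then by the maximum absolute value gives (c/s) / (max|c|/s) = c / max|c|, so s cancels. The sketch uses NumPy's `default_rng` for a reproducible draw and a 1-D array instead of the (n, 1) shape above.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.random(1000)

# Route 1: standardize (subtract mean, divide by std), then divide by max abs
standardized = (raw - raw.mean()) / raw.std()
route1 = standardized / np.max(np.abs(standardized))

# Route 2: only subtract the mean, then divide by max abs
centered = raw - raw.mean()
route2 = centered / np.max(np.abs(centered))

print(np.allclose(route1, route2))  # True: the std factor cancels out
```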
Answer 2:
I would try the following solution:
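import random

def draw_randoms_while_sum_not_zero(eps):
    # The first draw can have either sign
    r = random.uniform(-1, 1)
    total = r
    yield r
    # Keep drawing values of the opposite sign until the running sum is within eps of 0
    while abs(total) > eps:
        if total > 0:
            r = random.uniform(-1, 0)
        else:
            r = random.uniform(0, 1)
        total += r
        yield r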
Because floating-point numbers are not perfectly accurate, you can never be sure that the numbers you draw will sum to exactly 0. You need to decide what margin is acceptable and pass it to the generator above as eps.
It will yield (lazily return) random numbers one at a time until their running sum falls within 0 ± eps. As a result, the number of values drawn is not known in advance.
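The following illustrates why an exact 0 cannot be guaranteed with floating-point arithmetic, and how a tolerance such as `eps` is checked instead. The values and the tolerance below are chosen only for illustration.

```python
import math

total = 0.1 + 0.2 - 0.3
print(total)         # 5.551115123125783e-17, not 0.0
print(total == 0.0)  # False

eps = 1e-9
print(abs(total) <= eps)                      # True: within the chosen margin
print(math.isclose(total, 0.0, abs_tol=eps))  # True: same check via math.isclose
```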
epss = [0.1, 0.01, 0.001, 0.0001, 0.00001]
for eps in epss:
    lengths = []
    for _ in range(100):
        lengths.append(len(list(draw_randoms_while_sum_not_zero(eps))))
    print(f'{eps}: min={min(lengths)}, max={max(lengths)}, avg={sum(lengths)/len(lengths)}')
Results (length of the generated sequence over 100 runs per eps):
0.1: min=1, max=24, avg=6.1
0.01: min=1, max=174, avg=49.27
0.001: min=4, max=2837, avg=421.41
0.0001: min=5, max=21830, avg=4486.51
1e-05: min=183, max=226286, avg=48754.42
Answer 3:
Since you are fine with generating numbers and dividing by their sum, why not generate n/2 positive numbers and divide them by their sum (so they add up to 1)? Then generate n/2 negative numbers and divide them by the absolute value of their sum (so they add up to -1).
Want a random mix of positive and negative numbers? Randomly choose how many of each you need first, then continue.
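A minimal sketch of this answer in plain Python follows; `split_zero_sum` is a hypothetical name. The negative half is divided by the absolute value of its sum so that it totals -1. The number of positive values is drawn at random, as suggested above. All values within each half share the same sign, so no single value can exceed its half's total, and every value stays within [-1, 1].

```python
import random

def split_zero_sum(n, rng=random):
    """Return n values in [-1, 1] whose sum is 0 up to rounding (assumes n >= 2)."""
    k = rng.randint(1, n - 1)  # random number of positives, at least one of each sign
    # 1.0 - random() lies in (0, 1], so no weight is exactly zero
    pos = [1.0 - rng.random() for _ in range(k)]
    neg = [1.0 - rng.random() for _ in range(n - k)]
    pos_total, neg_total = sum(pos), sum(neg)
    values = [p / pos_total for p in pos]    # positives sum to 1
    values += [-q / neg_total for q in neg]  # negatives sum to -1
    rng.shuffle(values)
    return values

random.seed(0)
xs = split_zero_sum(6)
print(xs, sum(xs))
```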
Answer 4:
One way to generate such a list is to include the opposite of each number. If that is not a desirable property, you can introduce some extra randomness: add a small random value to a number and subtract the same value from a randomly chosen opposite, e.g.:
import random

def exact_sum_uniform_random(num, min_val=-1.0, max_val=1.0, epsilon=0.1):
    # Draw half of the values and mirror them, so the pairs cancel
    items = [random.uniform(min_val, max_val) for _ in range(num // 2)]
    opposites = [-x for x in items]
    if num % 2 != 0:
        items.append(0.0)  # odd length: add a neutral element
    for i in range(len(items)):
        # Shift items[i] up by a small random amount and a random opposite down by the same amount
        diff = random.random() * epsilon
        if items[i] + diff <= max_val \
                and any(opposite - diff >= min_val for opposite in opposites):
            items[i] += diff
            modified = False
            while not modified:
                j = random.randint(0, num // 2 - 1)
                if opposites[j] - diff >= min_val:
                    opposites[j] -= diff
                    modified = True
    result = items + opposites
    random.shuffle(result)
    return result
random.seed(0)
x = exact_sum_uniform_random(3)
print(x, sum(x))
# [0.7646391433441265, -0.7686875811622043, 0.004048437818077755] 2.2551405187698492e-17
EDIT
If the upper and lower limits are not strict, a simple way to construct a zero-sum sequence is to normalize two separate sequences so that they sum to 1 and -1 respectively, and join them together:
def norm(items, scale):
    return [item / scale for item in items]

def zero_sum_uniform_random(num, min_val=-1.0, max_val=1.0):
    # First half: rescale so it sums to 1
    a = [random.uniform(min_val, max_val) for _ in range(num // 2)]
    a = norm(a, sum(a))
    # Second half: rescale so it sums to -1
    b = [random.uniform(min_val, max_val) for _ in range(num - len(a))]
    b = norm(b, -sum(b))
    result = a + b
    random.shuffle(result)
    return result
random.seed(0)
n = 3
x = zero_sum_uniform_random(n)
print(x, sum(x))
# e.g. [1.0, 2.2578843364303585, -3.2578843364303585] 0.0
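The limits are not strict because of the division: if the raw values of one half almost cancel, dividing by their small sum pushes them far outside [-1, 1]. The following is a deterministic illustration with hand-picked values, chosen for this sketch rather than drawn at random.

```python
def norm(items, scale):
    return [item / scale for item in items]

a = [0.5, -0.49]          # raw values in [-1, 1] that nearly cancel
scaled = norm(a, sum(a))  # sum(a) is about 0.01
print(scaled)             # roughly [50, -49]: far outside [-1, 1]
```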
Note that, in general, neither approach produces uniformly distributed values.
Source: https://stackoverflow.com/questions/58407760/how-to-generate-random-values-in-range-1-1-such-that-the-total-sum-is-0