I am working on building a transition matrix for implementing the PageRank algorithm. How could I use numpy to make sure that the columns add up to one.
For example:
Divide the elements of each column by their column-summations -
a/a.sum(axis=0,keepdims=1) # or simply : a/a.sum(0)
For making the row-summations unity, change the axis input -
a/a.sum(axis=1,keepdims=1)
Sample run -
In [78]: a = np.random.rand(4,5)
In [79]: a
Out[79]:
array([[ 0.37, 0.74, 0.36, 0.41, 0.44],
[ 0.51, 0.86, 0.91, 0.03, 0.76],
[ 0.56, 0.46, 0.01, 0.86, 0.38],
[ 0.72, 0.66, 0.56, 0.84, 0.69]])
In [80]: b = a/a.sum(axis=0,keepdims=1)
In [81]: b.sum(0) # Verify
Out[81]: array([ 1., 1., 1., 1., 1.])
To make sure it works on int
arrays as well for Python 2.x, use from __future__ import division
or use np.true_divide.
For columns adding upto 0
For columns that add upto 0
, assuming that we are okay with keeping them as they are, we can set the summations to 1
, rather than divide by 0
, like so -
sums = a.sum(axis=0,keepdims=1);
sums[sums==0] = 1
out = a/sums
for i in range(len(A[0])):
col_sum = A[:, i].sum()
if col_sum != 0:
A[:, i] = A[:, i]/col_sum
else:
pass
The for loop
is a bit sloppy and I'm sure there's a much more elegant way but it works.
Replace pass
with A[:, i] = 1/len(A[0])
to eliminate dangling nodes and make the matrix column stocastic.