Preventing scipy eigenvectors differing from computer to computer

Question

Following up on a previous question about how to find the Markov steady state, I'm now running into the problem that it works perfectly on my lab computer but doesn't work on any other computer. Specifically, it always finds the correct number of near-one eigenvalues, and thus which nodes are attractor nodes, but it doesn't consistently find all of them, and they aren't grouped properly. For example, with the 64x64 transition matrix below, the computers on which it doesn't work always produce one of three different wrong collections of attractors at random. On the smaller matrix M1 below, all computers tested get the same, correct result for the attractor groups and the stationary distribution.

All the machines tested are running Win7x64 and WinPython-64bit-2.7.9.4. One computer always gets it right; three others always get it wrong in the same ways. Based on several posts I found, this sounds like it might be caused by differences in the floating-point accuracy of the calculations. Unfortunately, I don't know how to fix this. I don't know how to alter the code that pulls the left eigenvectors from the matrix so that it uses a specific accuracy that all computers can handle (and I don't think it has to be very accurate for this purpose).

That's just my current best guess for why the results could differ. If you have a better idea of why this is happening and how to stop it from happening, that's great too.
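One property that may be relevant, shown here as a hedged sketch on M1 only (M2's CSV is not available): because each matrix has three attractors, the eigenvalue 1 occurs three times, and the left eigenvectors that `scipy.linalg.eig` returns for it are just one basis of a three-dimensional eigenspace. Any invertible combination of those columns is an equally valid set of eigenvectors, so a given LAPACK/BLAS build is free to return any of them. The mixing matrix `mix` below is an arbitrary, illustrative assumption, not something any library is known to produce.

```python
import numpy as np
import scipy.linalg

# M1 from the question: three attractors, so eigenvalue 1 appears three times
M1 = np.array([[0.2, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.2, 0.0, 0.1, 0.1, 0.3, 0.3],
               [0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])

w, vl = scipy.linalg.eig(M1, left=True, right=False)
mask = np.abs(w - 1) < 1e-10
basis = vl[:, mask].real          # columns: one basis of the eigenvalue-1 left eigenspace
print(basis.shape[1])             # number of near-one eigenvalues

# Hypothetical invertible mixing matrix (determinant 0.97): another basis of the same space
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, -0.3],
                [0.2, 0.0, 1.0]])
other_basis = basis.dot(mix)

# Both bases consist of left eigenvectors with eigenvalue 1, i.e. v M1 = v for every column v
print(np.allclose(basis.T.dot(M1), basis.T))
print(np.allclose(other_basis.T.dot(M1), other_basis.T))
```

Since both bases pass the same check, clipping negative entries to zero and renormalizing, as the question's code does, can turn two equally valid bases into different "distributions".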

If there is a way to make scipy consistent from run to run and from computer to computer, I don't think it would depend on the details of my method, but since it was requested, here it is. Both matrices have 3 attractors. In M1 (counting states from 1), the first one, [1,2], is an orbit of two states; the other two, [7] and [8], are equilibria. M2 is a 64x64 transition matrix with equilibria at [2] and [26] as well as an orbit using [7,8].

But instead of finding that set of attractors, it sometimes reports [[26],[2],[26]], sometimes [[2,7,8,26],[2],[26]], and sometimes ... It doesn't get the same answer on every run, and it never gets [[2],[7,8],[26]] (in any order).

import numpy as np
import scipy.linalg

M1 = np.array([[0.2, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.2, 0.0, 0.1, 0.1, 0.3, 0.3],
              [0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])

M2 = np.genfromtxt('transitionMatrix1.csv', delimiter=',')

# For easy switching
M = M2
# Confirm the matrix is a valid Markov transition matrix (every row should sum to 1)
#print(np.sum(M, axis=1))

The rest is the same code from the previous question, included here for your convenience.

# compute the eigenvalues and a separate array whose columns are the left eigenvectors
theEigenvalues, leftEigenvectors = scipy.linalg.eig(M, right=False, left=True)
# for the stationary distributions the eigenvalues and vectors are real, and dropping the imaginary parts speeds it up a bit
theEigenvalues = theEigenvalues.real
#print(theEigenvalues)
leftEigenvectors = leftEigenvectors.real
#print(leftEigenvectors)
# set how close to 1 an eigenvalue must be to count as 1...1e-15 was too strict to find one of the actual eigenvalues
tolerance = 1e-10
# create a filter to collect the eigenvalues that are near enough to 1
mask = abs(theEigenvalues - 1) < tolerance
# apply that filter
theEigenvalues = theEigenvalues[mask]
# keep only the eigenvectors whose eigenvalues are (numerically) 1
leftEigenvectors = leftEigenvectors[:, mask]
# convert all the tiny and negative values to zero to isolate the actual stationary distributions
leftEigenvectors[leftEigenvectors < tolerance] = 0
# normalize each distribution by the sum of its eigenvector column
attractorDistributions = leftEigenvectors / leftEigenvectors.sum(axis=0, keepdims=True)
# check that each distribution pi really is a left eigenvector, i.e. pi M = pi (prints True or False)
print(np.allclose(np.dot(attractorDistributions.T, M), attractorDistributions.T))
# convert the column vectors into row vectors, one per attractor
attractorDistributions = attractorDistributions.T
print(attractorDistributions)
# a list of the states in any attractor with the stationary distribution within THAT attractor
#theSteadyStates = np.sum(attractorDistributions, axis=1)
#print(theSteadyStates)
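A possible way to make this post-processing independent of which basis `scipy.linalg.eig` returns, sketched under the assumption that the attractor states are already known (they are hard-coded below for M1 from the description in the question, converted to 0-based indices). Within the eigenvalue-1 eigenspace, the stationary distribution of a given attractor is the only vector that is zero outside that attractor and sums to 1, so it can be solved for with a small least-squares problem instead of clipping negative entries. This is a swapped-in technique, not the original code, and it has only been worked through for M1.

```python
import numpy as np
import scipy.linalg

M1 = np.array([[0.2, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.2, 0.0, 0.1, 0.1, 0.3, 0.3],
               [0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])

# Assumption: attractor states of M1, 0-based (the question's [1,2], [7], [8] counted from 1)
attractors = [[0, 1], [6], [7]]

w, vl = scipy.linalg.eig(M1, left=True, right=False)
basis = vl[:, np.abs(w - 1) < 1e-10].real   # whatever basis this LAPACK build returns
n = M1.shape[0]

distributions = []
for states in attractors:
    outside = np.setdiff1d(np.arange(n), states)
    # Coefficients c such that basis @ c is zero outside this attractor and sums to 1
    A = np.vstack([basis[outside, :], basis.sum(axis=0)])
    b = np.zeros(len(outside) + 1)
    b[-1] = 1.0
    c = np.linalg.lstsq(A, b, rcond=None)[0]
    distributions.append(basis.dot(c))

for states, pi in zip(attractors, distributions):
    print(states, np.round(pi[states], 5))
```

Because the constraints pin down a single vector in the eigenspace, any valid basis should lead to the same distributions up to rounding.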

Answer 1:

The unfortunate answer is that there is no way to fix the seed for scipy, and therefore no way to force it to output consistent values. This also means that there is no way for it to reliably produce correct answers because only one answer is correct. My attempts to get a definitive answer or fix from the scipy people were completely dismissed, but somebody may find some wisdom in those words when facing this issue.

As a concrete example of the problem, when you run the code above you may sometimes get the set of eigenvectors shown below, which supposedly represent the steady states of each attractor in the system. My home computer always produces this result (which is different from the results on my laptop and lab computer). As stated in the question, the correct attractors are [[2],[7,8],[26]]. The equilibria [2] and [26] are correctly identified, but the distribution for [7,8] instead comes out as an invalid probability distribution over [2,26]. The correct answer is [0.19835, 0.80164] over [7,8], respectively. My lab computer correctly finds that solution, but so far six other computers have failed to do so.
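For a two-state attractor such as [7,8], the expected values can be checked by hand. Writing a for the probability of moving from state 7 to state 8 and b for the probability of moving from 8 to 7 (entries of M2, which is not reproduced here), closure of the attractor means the remainder of each of those rows is the self-transition, so:

$$
P_{\{7,8\}} = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}, \qquad
\pi P_{\{7,8\}} = \pi, \quad \pi_7 + \pi_8 = 1
\;\Longrightarrow\;
\pi = \left( \frac{b}{a+b},\ \frac{a}{a+b} \right), \qquad a + b > 0.
$$

Under this parametrization the quoted [0.19835, 0.80164] corresponds to b/(a+b) ≈ 0.19835, i.e. a is roughly 4.04 times b.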

What this means is that (unless there is some other unidentified error in my code) scipy.linalg is worthless for finding steady states of Markov models. Even though it works some of the time, it cannot be relied upon to provide the correct answer, and therefore should be avoided completely...at least for Markov model steady states, and probably for everything to do with eigenvectors. It just doesn't work.

I will post code on how to reliably generate the stationary distribution of a Markov model without using scipy if anybody asks a question about it. It runs a bit slower, but it's always the same and always correct.

[[ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.25707958  1.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.06867772  0.          1.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
...
 [ 0.          0.          0.        ]]
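The answer mentions a scipy-free approach without showing it. What follows is only a hedged sketch of one such approach, not the author's code: it finds the attractors as the closed communicating classes of the transition graph (from the pattern of nonzero entries of the matrix, so no eigenvector basis is involved) and then solves a small linear system for the stationary distribution inside each one, using NumPy only. The helper names `closed_classes` and `stationary_on_class` are hypothetical.

```python
import numpy as np


def closed_classes(P):
    """Hypothetical helper: closed communicating classes (attractors) of transition matrix P."""
    n = P.shape[0]
    # reach[i, j] is True when j can be reached from i in zero or more steps
    reach = (P > 0) | np.eye(n, dtype=bool)
    while True:
        # Squaring the reachability matrix doubles the path length it covers
        new_reach = reach.astype(int).dot(reach.astype(int)) > 0
        if (new_reach == reach).all():
            break
        reach = new_reach
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        # States that communicate with i (reachable in both directions)
        members = [j for j in range(n) if reach[i, j] and reach[j, i]]
        seen.update(members)
        # Closed: everything reachable from i is inside the class
        if reach[i].sum() == len(members):
            classes.append(members)
    return classes


def stationary_on_class(P, members):
    """Hypothetical helper: stationary distribution of P restricted to one closed class."""
    sub = P[np.ix_(members, members)]
    k = len(members)
    # Solve pi (sub - I) = 0 together with sum(pi) = 1 as one least-squares system
    A = np.vstack([(sub - np.eye(k)).T, np.ones((1, k))])
    b = np.zeros(k + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


M1 = np.array([[0.2, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.2, 0.0, 0.1, 0.1, 0.3, 0.3],
               [0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])

for members in closed_classes(M1):
    print(members, np.round(stationary_on_class(M1, members), 5))
```

For M1 this should report the classes [0, 1], [6] and [7] (the question's [1,2], [7] and [8], counted from 0), with the distribution [5/9, 4/9] on the first. One design consequence: any strictly positive entry counts as a possible transition, so entries meant to be zero must be exactly zero in the loaded matrix.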

Source: https://stackoverflow.com/questions/34744342/preventing-scipy-eigenvectors-differing-from-computer-to-computer

Tags: python, numpy, scipy, eigenvector, eigenvalue