Question
So I am trying to train a hidden Markov model on a very large feature array: 700 x (400 x 4122), where each 400x4122 mini-array is a sequence of observed samples across 400 time stamps with 4122 features. There are 700 such sequences in total, which amounts to ~45GB of memory when concatenated. My question is: how do you work with an array of this size?
In the hmmlearn Python package, one typically works with multiple sequences as follows:
x1 -> a 400x4122 sequence
x2 -> another 400x4122 sequence
...
xn -> the 700th 400x4122 sequence
X = np.concatenate([x1, x2, ..., xn])
lengths = [len(x1), len(x2), ..., len(xn)]
model = GaussianHMM(n_components=6, ...).fit(X, lengths=lengths)
In other words, one needs to concatenate the entire array of sequences and feed it into the training function. However, I was wondering if there is a way to feed one 400x4122 sequence at a time, as the entire concatenated array is far too large to work with.
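For reference, the concatenation pattern described above can be sketched with toy-sized arrays (shapes shrunk from 400x4122 purely for illustration; the actual fit call is shown commented out since it requires hmmlearn to be installed):

```python
import numpy as np

# Toy stand-ins for the real sequences: 4 sequences of shape 5x3
# instead of 700 sequences of shape 400x4122.
rng = np.random.default_rng(0)
sequences = [rng.standard_normal((5, 3)) for _ in range(4)]

# hmmlearn expects one 2-D array of stacked samples plus a list of
# per-sequence lengths so it knows where each sequence begins and ends.
X = np.concatenate(sequences)          # shape (20, 3)
lengths = [len(s) for s in sequences]  # [5, 5, 5, 5]

print(X.shape, lengths)

# Fitting would then look like:
# from hmmlearn.hmm import GaussianHMM
# model = GaussianHMM(n_components=6).fit(X, lengths=lengths)
```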
Thanks in advance.
Source: https://stackoverflow.com/questions/40294642/python-passing-multiple-large-sequences-through-hmmlearn