I have a large dataset of around 200,000 data points, where each data point contains 132 features, so my dataset is 200000 x 132.
I have done all the computations using the Armadillo framework. However, when I try to run a PCA I get a memory error, and I don't know whether it is caused by my RAM (8 GB) or by a limitation of the framework itself. The error I receive is: requested size is too large.
Can you recommend another framework for PCA computation that doesn't have such size/memory limitations? Or, if you have used Armadillo for PCA and run into this issue, can you tell me how you solved it?
You probably need to enable the use of 64 bit integers within Armadillo, which are used for storing the total number of elements, etc.
Specifically, edit the file include/armadillo_bits/config.hpp and uncomment the line with:

// #define ARMA_64BIT_WORD
In version 3.4 this should be near line 59.
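Once uncommented, the line should simply read:

#define ARMA_64BIT_WORD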
Alternatively, you can define ARMA_64BIT_WORD before including the Armadillo header in your program, e.g.:
#define ARMA_64BIT_WORD  // must come before the Armadillo header
#include <armadillo>
#include <iostream>
...
Note that your C++ compiler must be able to handle 64 bit integers; most modern compilers do.
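For reference, here is a minimal sketch of a complete program along these lines, using Armadillo's princomp() for the PCA itself. The random matrix is just a stand-in for your real 200000 x 132 dataset; untested, but something like this should work:

#define ARMA_64BIT_WORD  // enable 64 bit indices; must come before the include
#include <armadillo>
#include <iostream>

int main()
{
    // stand-in for the real dataset: one row per data point
    arma::mat X = arma::randu<arma::mat>(200000, 132);

    arma::mat coeff;   // principal component coefficients (loadings)
    arma::mat score;   // data projected onto the principal components
    arma::vec latent;  // eigenvalues of the covariance matrix of X

    arma::princomp(coeff, score, latent, X);

    std::cout << "first eigenvalue: " << latent(0) << std::endl;
    return 0;
}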
Source: https://stackoverflow.com/questions/13480410/c-framework-for-computing-pca-other-than-armadillo