问题
I am having problems with hd.eigen
in Rfast
. It gives extremely close results to eigen
with most data, but sometimes hd.eign returns an empty $vector
, NAs, or other undesirable results.
For example:
> set.seed(123)
> bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
>
> e3 = eigen(bigm)
> length(e3$values)
[1] 2000
> length(e3$vectors)
[1] 4000000
> sum(is.na(e3$vectors) == TRUE)
[1] 0
> sum(is.na(e3$vectors) == FALSE)
[1] 4000000
>
> e4 = hd.eigen(bigm, vectors = TRUE)
> length(e4$values)
[1] 2000
> length(e4$vectors)
[1] 4000000
> sum(is.na(e4$vectors) == TRUE)
[1] 2000
> sum(is.na(e4$vectors) == FALSE)
[1] 3998000
Other than the fact that it breaks my script, do these NAs indicate a deeper problem with my data? Or is hd.eig
not able to handle some situations that the stock eigen()
can? Is one better than the other?
Edit: As per Ralf's suggestion, I checked my BLAS versions, and it does seem like maybe R is looking for the wrong version/in the wrong place:
~ $ ldd /usr/lib64/R/bin/exec/R
linux-vdso.so.1 (0x00007ffeec3b9000)
libR.so => not found
libRblas.so => not found
libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007feb27ef2000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007feb27ecf000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00007feb27cdb000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007feb27f7b000)
Also, I am unclear on whether openBLAS is equivalent to the BLAS that is installed by default in other distros.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-generic-linux-gnu (64-bit)
Running under: Clear Linux OS
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas_nehalemp-r0.3.6.so
edit 2: I tried the same example on a CentOS-based HPC system, and did not get any NA's. There, sessionInfo()
reveals:
BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
Edit 3: The expression in hd.eign
that produces the NAs is
vectors <- tcrossprod(y, t(FF) * L^(-0.5))
specifically, L^(-0.5)
produces NaN at index 2000
> L[2000]
[1] -1.136237e-12
However, on the two machines where no NAs are returned L[2000] is positive (although slightly different, 5.822884e-14
on the HPC system and 3.022511e-12
on my Windows machine running Microsoft build of R.
Edit 4: The difference appears to originate in the base eigen()
function, which returns one negative value from the crossprod()
matrix xx
on the problem machine but not not the other two. I saved the xx
object and opened between computers, so I know that the input to eigen()
was exactly the same.
Edit 5: I drilled one level deeper and found that the original negative value comes from this statement in eigen()
z <- if (!complex.x)
.Internal(La_rs(x, only.values))
else .Internal(La_rs_cmplx(x, only.values))
Edit 6: If I save as a CSV and then re-open, the problem computer does not produce negative eigenvalues.
> load("/home/james/nfs-cloud/PanosLab/CircRNA/input_to_La_rs.Rdata")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 1
> write.csv(x, "test_for_internal.csv", row.names = FALSE)
> x <- read.csv("test_for_internal.csv")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 0
Does that give anyone a clue? Is this a bug?
回答1:
The hd.eigen function in Rfast is designed for the case of n smaller than p only.
来源:https://stackoverflow.com/questions/57935571/rfast-hd-eigen-returns-nas-but-base-eigen-does-not