Question
Using R, what is the best way to read a symmetric matrix from a file that omits the upper triangular part? For example:
1.000
.505 1.000
.569 .422 1.000
.602 .467 .926 1.000
.621 .482 .877 .874 1.000
.603 .450 .878 .894 .937 1.000
I have tried read.table, but haven't been successful.
Answer 1:
Here's a read.table solution that needs no loops and no *apply calls:
txt <- "1.000
.505 1.000
.569 .422 1.000
.602 .467 .926 1.000
.621 .482 .877 .874 1.000
.603 .450 .878 .894 .937 1.000"
# Could use clipboard or read this from a file as well.
# col.names fixes the column count at 6; fill=TRUE pads the short rows with NA.
mat <- data.matrix( read.table(text=txt, fill=TRUE, col.names=paste0("V", 1:6)) )
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
> mat
V1 V2 V3 V4 V5 V6
[1,] 1.000 0.505 0.569 0.602 0.621 0.603
[2,] 0.505 1.000 0.422 0.467 0.482 0.450
[3,] 0.569 0.422 1.000 0.926 0.877 0.878
[4,] 0.602 0.467 0.926 1.000 0.874 0.894
[5,] 0.621 0.482 0.877 0.874 1.000 0.937
[6,] 0.603 0.450 0.878 0.894 0.937 1.000
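If the number of columns isn't known in advance, one option is to count the fields per line first. This is only a sketch: read_lower_tri is a made-up helper name, not part of the answer above, and it assumes a whitespace-separated file with no header. It matters because read.table otherwise guesses the column count from the first five lines, which is too few for a triangle with more than five rows.

```r
# Hypothetical helper, not from the answer above: read a lower-triangular
# text file and mirror it into a full symmetric matrix.
read_lower_tri <- function(path) {
  # count.fields() returns the number of fields on each line, so the
  # widest line gives the matrix dimension.
  n <- max(count.fields(path))
  m <- data.matrix(read.table(path, fill = TRUE,
                              col.names = paste0("V", seq_len(n))))
  # Copy the lower triangle into the NA-filled upper triangle.
  m[upper.tri(m)] <- t(m)[upper.tri(m)]
  m
}

# Demo on a temporary file holding the first three rows of the question's data.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1.000", ".505 1.000", ".569 .422 1.000"), tmp)
res <- read_lower_tri(tmp)
```

Passing the explicit col.names is what stops read.table from truncating or wrapping the longer rows.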
Answer 2:
I copied your text and then used tt <- file('clipboard','rt') to import it. For a standard file:
tt <- file("yourfile.txt",'rt')
a <- readLines(tt)
b <- strsplit(a," ") #insert delimiter here; can use regex
b <- lapply(b,function(x) {
x <- as.numeric(x)
length(x) <- max(unlist(lapply(b,length)));
return(x)
})
b <- do.call(rbind,b)
b[is.na(b)] <- 0
#kinda kludgy way to get the symmetric matrix
b <- b + t(b) - diag(b[1,1],nrow=dim(b)[1],ncol=dim(b)[2])
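The last line subtracts b[1,1] along the whole diagonal, which works here because every diagonal entry of a correlation matrix is 1. For a triangle whose diagonal varies (a covariance matrix, say), a variant would subtract the actual diagonal instead. A minimal sketch with made-up numbers; the matrix low is an assumption for illustration only:

```r
# Hypothetical lower triangle with zeros above the diagonal,
# i.e. what b looks like after b[is.na(b)] <- 0.
low <- rbind(c(2, 0, 0),
             c(1, 3, 0),
             c(4, 5, 6))

# low + t(low) counts the diagonal twice; subtracting diag(diag(low))
# removes the extra copy, whatever the diagonal values are.
full <- low + t(low) - diag(diag(low))
```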
Answer 3:
I'm posting this, but I like Blue Magister's approach way better. Maybe there's something in it that's of use, though.
mat <- readLines(n=6)
1.000
.505 1.000
.569 .422 1.000
.602 .467 .926 1.000
.621 .482 .877 .874 1.000
.603 .450 .878 .894 .937 1.000
nmat <- lapply(mat, function(x) unlist(strsplit(x, "\\s+")))
lens <- sapply(nmat, length)
dlen <- max(lens) - lens
bmat <- lapply(seq_along(nmat), function(i) {
as.numeric(c(nmat[[i]], rep(NA, dlen[i])))
})
mat <- do.call(rbind, bmat)
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
mat
Answer 4:
Here is an approach which also works if the dimensions of the matrix are unknown.
# read file as a vector
mat <- scan("file.txt", what = numeric())
# calculate the number of columns (and rows)
ncol <- (sqrt(8 * length(mat) + 1) - 1) / 2
# index of the diagonal values
diag_idx <- cumsum(seq.int(ncol))
# generate split index
split_idx <- cummax(sequence(seq.int(ncol)))
split_idx[diag_idx] <- split_idx[diag_idx] - 1
# split vector into list of rows
splitted_rows <- split(mat, f = split_idx)
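# generate matrix; rbind recycles the short rows (hence the suppressed
# warning), and the recycled values are overwritten in the next step
mat_full <- suppressWarnings(do.call(rbind, splitted_rows))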
mat_full[upper.tri(mat_full)] <- t(mat_full)[upper.tri(mat_full)]
[,1] [,2] [,3] [,4] [,5] [,6]
0 1.000 0.505 0.569 0.602 0.621 0.603
1 0.505 1.000 0.422 0.467 0.482 0.450
2 0.569 0.422 1.000 0.926 0.877 0.878
3 0.602 0.467 0.926 1.000 0.874 0.894
4 0.621 0.482 0.877 0.874 1.000 0.937
5 0.603 0.450 0.878 0.894 0.937 1.000
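A shorter route to the same result, offered as a sketch rather than as part of the answer above: since R fills matrices column by column, the scanned values can be dropped straight into the upper triangle and then mirrored. The inline text below stands in for the file and uses only the first three rows of the question's data.

```r
# Values as scan() returns them: the lower triangle, row by row.
x <- scan(text = "1.000
.505 1.000
.569 .422 1.000", quiet = TRUE)

# Solve n * (n + 1) / 2 == length(x) for n and check it is a whole number.
n <- (sqrt(8 * length(x) + 1) - 1) / 2
stopifnot(n == round(n))

m <- matrix(NA_real_, n, n)
# The upper triangle read column-wise has the same order as the
# lower triangle read row-wise, so a direct assignment lines up.
m[upper.tri(m, diag = TRUE)] <- x
# Mirror the upper triangle into the lower one.
m[lower.tri(m)] <- t(m)[lower.tri(m)]
```

This avoids the split index and the recycling warnings, at the cost of relying on R's column-major fill order.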
Source: https://stackoverflow.com/questions/13863569/reading-a-symmetric-matrix-from-file-that-omits-upper-triangular-part