What is the reason for bizarre R-igraph-spinglass result after several iterations?

烈酒焚心 提交于 2021-01-28 05:03:09

问题


I have used the spinglass algorithm in igraph to analyze clustering in networks. Since spinglass uses a stochastic method, the results can differ somewhat if we run the algorithm several times on one network. That's to be expected of course.

However, I ran into one thing that I cannot explain.

When we run spinglass on the same network-structure 1000 times, in 10 replications (so 10x1000 times), about 90% of the runs in replications 1-9 return 2 clusters:

We run the spinglass estimation 1000 times on the network (see link for picture) and then look which node-cluster constellation has been estimated most frequently across these 1000 estimations. We did this in order to deal with the stochastic variability in the algorithm. In the case in the picture, two clusters were estimated in 93.4% of all 1000 estimations. Such a result is retrieved in 9 of 10 replications of such repeated estimations.

However, about 90% of runs in replication 10 return 3 clusters.

I.e., in 1 of these 10 replications (see link for picture) we retrieve the following result as the most frequently estimated node-cluster constellation (93.1% of 1000 spinglass runs)

This behavior has been replicated across different structures & models & replications (see below for the code for a reproducible example).

It seems like igraph/spinglass has some kind of sequential (rather than random) internal seed counter that results in subsequent iterations not being independent of each other (see also https://lists.nongnu.org/archive/html/igraph-help/2010-02/msg00031.html).

Do you have any idea what this might be?

And even if spinglass uses such a seed, why would a particular seed result in such a bizarre (and rather unlikely) result, considering the clear clustering of the network?

Thank you all for your time and help!

Below you can find the code to run the same kind of iterated estimations (be aware that it takes about 40min-1h to run due to the number of estimations; we need so many iterations and estimations because the bizarre result only comes up after several, i.e., 10 iterations of 1000 estimations). In addition to running iterated spinglass estimations, this code also relabels the cluster-node-membership to make those comparable across different estimations (since spinglass can assign a 1 to a cluster in one estimation but labels it 2 in the subsequent estimation).

# create network
net <- matrix(0,18,18)
net[1:9,1:9] <- 0.1
net[10:18,10:18] <- 0.1
net[1,10] <- net[10,1] <- 0.1
net[2,11] <- net[11,2] <- 0.1
diag(net) <- 0

# Draw:
library("qgraph")
qgraph(net, layout = "spring", title = "True network", esize = 2, vsize = 5)

# Generate data and bootstrap
library("bootnet")

# Setseed for simulation of network
#set.seed(1)
# setwd!
setwd("C:/")

Data <- ggmGenerator()(500, net)
n1 <- qgraph(cor_auto(Data), graph="EBICglasso", sampleSize=500, 
layout="spring")

# SPINGLASS PLOTTING FUNCTION
# the number of spins is now equal to the number of nodes, which is logical
spinComRec <- function(graphQgraph, # qgraph object
                   numberSpins = graphQgraph$graphAttributes$Graph$nNodes, 
                   numberIterations = 10, 
                   numberEstimations = 1000                      
){

library(igraph)
graphIgraph <- as.igraph(graphQgraph)

for (yIt in 1:numberIterations){
# Repeated estimation with stable number of spins
#set.seed(3+yIt)
clusterStabilitySpin <- list()
for (i in 1:numberEstimations){
   #set.seed(3+i)
  clusterStabilitySpin[[i]] <- cluster_spinglass(graphIgraph, 
                                                 weights = E(graphIgraph)$weight,
                                                 spins = numberSpins,
                                                 start.temp = 1,
                                                 stop.temp = 0.01,
                                                 cool.fact = 0.99)  # default settings
  # we do not specify gamma but it should be 1 which = missing/non-missing equally important
  # also, they weights are taken into consideration
  # we do not use the version that takes negative edges into account as
  # indicating edges between different communities because that is not
  # logical for psychological networks

}
# how many clusters were estimated in each estimation?
clusterStabilitySpinNumber <- c()
for (i in 1:length(clusterStabilitySpin)){
  clusterStabilitySpinNumber[i] <- length(clusterStabilitySpin[[i]])
}


### Spinglass Recoding

# Checking whether algorithm switches community-node assignment 
# (within one and the same spin number)

# compare each node membership with each other node membership
testVector <- c()
testMatrix <- matrix(ncol = graphQgraph$graphAttributes$Graph$nNodes, nrow = graphQgraph$graphAttributes$Graph$nNodes)
testList <- list()
for (k in length(clusterStabilitySpin):1){
  for (j in length(clusterStabilitySpin[[k]]$membership):1){
    for (i in length(clusterStabilitySpin[[k]]$membership):1){
      testVector[i] <- clusterStabilitySpin[[k]]$membership[j] == clusterStabilitySpin[[k]]$membership[i]
    }
    testMatrix[,j] <- testVector
  }
  testList[[k]] <- testMatrix
}

# testlist represents each node compared to all others (logical values 
# indicating whether nodes belong to the same cluster)

newMatrix <- list()
for (l in 1:length(testList)){
  newMatrix[[l]] <- unique(testList[[l]], MARGIN = 2)
}

# now we took all unique logical vectors (we only have vectors that 
# logically indicate clusters) - each column in newmatrix represents one cluster
# newMatrix holds all unique vectors

# the following substitutes each TRUE for its respective column number for each
# list element in newMatrix and thereby recodes membership labels

membershipVector <- c()
membershipMatrix <- matrix(nrow = graphQgraph$graphAttributes$Graph$nNodes, ncol = length(clusterStabilitySpin))
for (m in 1:length(newMatrix)){
  for (n in 1:ncol(newMatrix[[m]])){
    for (o in 1:length(which(newMatrix[[m]][,n]))){
      membershipVector[which(newMatrix[[m]][,n])[o]] <- n
    }
  }
  membershipMatrix[,m] <- membershipVector
}


# Loop that compares all logical matrices to all matrices
identMatrix <- matrix(ncol = length(newMatrix), nrow = length(newMatrix)) 
for (i in 1:length(newMatrix)){
  for (j in 1:length(newMatrix)){
    identMatrix[i,j] <- identical(newMatrix[[i]], newMatrix[[j]])
  }
}

uniqIdentMatrix <- unique(identMatrix, MARGIN = 2) 
# extracting unique comparisons (i.e., matrices that are equal to each other 
# (indicated by the row number)


# See how many comparable to each other
equalMatrixList <- list()
for (k in 1:ncol(uniqIdentMatrix)){
  equalMatrixList[[k]] <- which(uniqIdentMatrix[,k]==TRUE)
}
# equalMatrixList # each list element lists the indexes of those matrices that are equal to each other
# sapply(equalMatrixList, length)
# which(sapply(equalMatrixList, length)==max(sapply(equalMatrixList, length))) 
# for each of the unique memberships we get the frequency and the one with the most frequent est

uniqueMembershipVectors <- matrix(nrow = nrow(membershipMatrix), ncol = length(equalMatrixList))
for (i in 1:length(equalMatrixList)){
  uniqueMembershipVectors[,i] <- as.matrix(membershipMatrix[,equalMatrixList[[i]]])[,1]
}

# order the membership estimations according to frequency
equalMatrixList <- equalMatrixList[order(sapply(equalMatrixList,length),decreasing=T)]


# run one spinglass estimation to use as object to put in the new membershipvectors
communitySGPlot1 <- cluster_spinglass(graphIgraph, 
                                      weights = E(graphIgraph)$weight,
                                      spins = 2)

# create a pdf for each iteration with a network on each page 
# the network shows the communities, the number of communities, and the proportion
# of all estimations in which this constellation has come up
# the plotting starts with the estimation that has come up most frequently
lalaName <- paste("AllSpinglassCommunities", yIt, ".pdf", sep = "")

pdf(lalaName, width=10, height=10)
for (i in 1:ncol(uniqueMembershipVectors)){
  communitySGPlot1$membership <- uniqueMembershipVectors[,i]
  plot(communitySGPlot1, graphIgraph, layout = graphQgraph$layout)
  text(1,1,labels = length(unique(uniqueMembershipVectors[,i])))
  text(1,1.1, labels = (sapply(equalMatrixList, length)[i])/length(clusterStabilitySpin))
}
dev.off()

# same as above but next to each other on one page (i.e., in groups of 5)
lalaName <- paste("AllSpinglassCommunitiesOnepage", yIt, ".pdf", sep = "")

pdf(lalaName, width = 200, height = 200)
par(mfrow = c(5,5))
for (i in 1:ncol(uniqueMembershipVectors)){
  communitySGPlot1$membership <- uniqueMembershipVectors[,i]
  plot(communitySGPlot1, graphIgraph, layout = graphQgraph$layout)
  text(1,1,labels = length(unique(uniqueMembershipVectors[,i])))
  text(1,1.1, labels = (sapply(equalMatrixList, length)[i])/length(clusterStabilitySpin))
}
dev.off()
}
}

spinComRec(n1, numberIterations = 10, numberEstimations = 1000)

来源:https://stackoverflow.com/questions/49321028/what-is-the-reason-for-bizarre-r-igraph-spinglass-result-after-several-iteration

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!