Second-order cooccurrence of terms in texts
Basically, I want to reimplement the method shown in this video. Given a corpus of documents, I want to find the terms that are most similar to each other. I was able to generate a cooccurrence matrix using this SO thread, and then followed the video to generate an association matrix from it. Next, I would like to generate a second-order cooccurrence matrix.

Problem statement: Consider a matrix whose rows correspond to terms, where the entries in each row are the top k terms most similar to that row's term. Say, k =