Using StringToWordVector in Weka with internal data structures
Question

I am trying to cluster documents using Weka. The clustering is part of a larger pipeline, and I can't afford to write out ARFF files. I have all the documents and their bags of words in a Map<String, Multiset<String>> structure, where each key is a document name and each Multiset<String> value is the bag of words for that document. I have two questions:

(1) My current approach ends up clustering terms, not documents: public final Instances