I have created a tf-idf matrix, and now I want to retrieve the top 2 words for each document. I want to pass a document id and get back the top 2 words for that document.
Right now,
By doing
t = test_v.fit_transform(d.values())
you lose any link to the document ids. A dict is not guaranteed to be ordered (insertion order is only guaranteed from Python 3.7 onwards), so you cannot rely on knowing which value ends up in which row. This means that before passing the values to the fit_transform function you need to record which value corresponds to which id.
For example, you can do:
counter = 0
values = []
key = {}
for k, v in d.items():
    values.append(v)
    key[k] = counter
    counter += 1
t = test_v.fit_transform(values)
From there you can build a function to access this matrix by document id:
def get_doc_row(docid):
    rowid = key[docid]
    row = t[rowid, :]
    return row
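To go from that row to the actual top 2 words, you also need the column-to-term mapping of the vectorizer. Below is a minimal sketch, assuming test_v is a scikit-learn TfidfVectorizer (the question does not show how it was created) and using made-up sample documents. The helper name top_words and the sample dict are only for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample data: document id -> text
d = {
    "a": "apple banana apple cherry",
    "b": "banana cherry date date date",
    "c": "cherry elderberry fig fig",
}

# Fix the row order once and remember which id maps to which row
doc_ids = list(d.keys())
key = {docid: rowid for rowid, docid in enumerate(doc_ids)}

# Assumption: test_v is a TfidfVectorizer with default settings
test_v = TfidfVectorizer()
t = test_v.fit_transform([d[docid] for docid in doc_ids])

# vocabulary_ maps term -> column index; invert it into a list indexed by column
terms = sorted(test_v.vocabulary_, key=test_v.vocabulary_.get)

def top_words(docid, n=2):
    """Return the n highest-weighted (word, score) pairs for one document."""
    row = t[key[docid], :].toarray().ravel()   # dense 1-D array of tf-idf weights
    best = np.argsort(row)[::-1][:n]           # column indices, highest weight first
    return [(terms[i], float(row[i])) for i in best if row[i] > 0]

print(top_words("a"))
```

Calling top_words("b") or top_words("c") works the same way. Converting a single row to a dense array is fine for one document at a time; for a very large vocabulary you may prefer to work with the sparse row's non-zero entries only.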