I am familiar with using bag-of-words (BOW) features for text classification, wherein we first find the size of the vocabulary for the corpus, and that becomes the size of our feature vector.
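As a quick illustration of that BOW setup, here is a minimal sketch using scikit-learn's CountVectorizer (the sample corpus is made up):

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = ["the cat sat", "the dog barked at the cat"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(corpus)

    # Each row is a fixed-length count vector whose length
    # equals the vocabulary size of the corpus
    print(len(vectorizer.get_feature_names_out()))  # 6
    print(X.toarray())                              # shape (2, 6)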
For word2vec, to get a fixed-length feature vector for each sentence, even though the number of words in each sentence differs, you can sum the word vectors. Below is the code snippet:
import numpy as np

def getWordVecs(words, w2v_model):
    # Collect the 300-dimensional word2vec vector for each token
    vecs = []
    for word in words:
        word = word.replace('\n', '')
        try:
            vecs.append(w2v_model[word].reshape((1, 300)))
        except KeyError:
            # Skip words that are not in the word2vec vocabulary
            continue
    # Stack into an (n_words, 300) array; note this raises if every
    # token was out of vocabulary
    vecs = np.concatenate(vecs)
    vecs = np.array(vecs, dtype='float')
    # Sum over the words to get a single fixed-length sentence vector
    final_vec = np.sum(vecs, axis=0)
    return final_vec
Here, words is the list of tokens obtained after tokenizing a sentence.
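For completeness, a minimal usage sketch, assuming a pretrained model loaded with gensim's KeyedVectors (the file path and sample sentence are placeholders):

    from gensim.models import KeyedVectors

    # Hypothetical path to pretrained 300-dimensional vectors
    w2v_model = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True)

    sentence = "this movie was great"
    tokens = sentence.split()  # simple whitespace tokenization
    sentence_vec = getWordVecs(tokens, w2v_model)
    print(sentence_vec.shape)  # (300,) regardless of sentence length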