n-gram representation and distance matrix in R
Question

Assume that we have this data:

    a <- c("ham","bamm","comb")

For 1-grams, this is the matrix representation of the above vector:

    # h a m b c o
    # 1 1 1 0 0 0
    # 0 1 2 1 0 0
    # 0 0 1 1 1 1

I know that table(strsplit(a,split = "")[i]) for i in 1:length(a) will give the separate counts for each of them. But I don't know how to use rbind to combine them into a single matrix, since the lengths and column names are different. After that, I want to use either Euclidean or Manhattan distance to find the similarity matrix for