Calculating cosine similarity by featurizing text into vectors using TF-IDF
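A minimal sketch of the technique named in the title, written in plain Python with no Spark dependency so the logic is easy to follow. Each address is tokenized and weighted by TF-IDF, and every incomplete address is matched to the correct address with the highest cosine similarity. The tokenizer, the function names (`tokenize`, `fit_idf`, `tfidf`, `cosine`, `best_matches`) and the choice to fit the IDF on both lists together are illustrative assumptions, not the asker's code. The IDF uses the same smoothed form as Spark ML's `IDF`, log((n + 1) / (df + 1)).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits and '#'
    return re.findall(r"[a-z0-9#]+", text.lower())


def fit_idf(docs):
    # IDF in the same smoothed form as Spark ML's IDF: log((n + 1) / (df + 1))
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log((n + 1) / (count + 1)) for term, count in df.items()}


def tfidf(tokens, idf):
    # Sparse TF-IDF vector as a dict {term: raw term count * idf}
    tf = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    # Cosine similarity of two sparse vectors; 0.0 if either is all zeros
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_matches(incomplete, correct):
    # Assumption: IDF is fitted on both lists together, so the vocabulary
    # covers every query term as well as every candidate term.
    inc_tokens = [tokenize(s) for s in incomplete]
    cor_tokens = [tokenize(s) for s in correct]
    idf = fit_idf(inc_tokens + cor_tokens)
    cor_vecs = [tfidf(t, idf) for t in cor_tokens]
    matches = []
    for tokens in inc_tokens:
        q = tfidf(tokens, idf)
        scores = [cosine(q, c) for c in cor_vecs]
        # Index of the most similar correct address (first one on ties)
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


if __name__ == "__main__":
    # Strings taken from the question; only one correct address is complete there
    incomplete = [
        "541 Suite 204, Redwood City, CA 94063",
        "98 Connecticut Ave Nw, Chagrin Falls",
    ]
    correct = ["541 Jefferson Avenue, Suite 204, Redwood City, CA 94063"]
    for text, (idx, score) in zip(incomplete, best_matches(incomplete, correct)):
        print(f"{text!r} -> correct[{idx}] (cosine {score:.3f})")
```

In Spark itself, the featurization step would typically use the `pyspark.ml.feature` transformers (`Tokenizer`, then `HashingTF` or `CountVectorizer`, then `IDF`) instead of hand-rolled dictionaries; the cosine-matching step stays conceptually the same.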
Question: I'm new to Apache Spark and want to find similar texts within a collection of texts. Here is what I have tried so far.

I have 2 RDDs. The 1st RDD contains the incomplete addresses, as follows:

[0,541 Suite 204, Redwood City, CA 94063]
[1,6649 N Blue Gum St, New Orleans,LA, 70116]
[2,#69, Los Angeles, Los Angeles, CA, 90034]
[3,98 Connecticut Ave Nw, Chagrin Falls]
[4,56 E Morehead Webb, TX, 78045]

The 2nd RDD contains the correct addresses, as follows:

[0,541 Jefferson Avenue, Suite 204, Redwood City, CA 94063]
[1,6649 N Blue