Apache Spark has a TF-IDF implementation available: https://spark.apache.org/docs/latest/ml-features.html#tf-idf
When you run the example, it adds the "rawFeatures" and "