Question
I am new to Mahout. I need to convert a text file to vectors so that I can run classification on them at a later stage.
Could anybody shed some light on the questions below?
- How do I convert a text file to a vector in Mahout? Each line of the file has the format "username|comment about item|rating".
- The data will be a few TBs. Which classification algorithm implementation can I use on the vectors I am going to create?
Thanks, Arun
Answer 1:
You can check these two examples, which also go some way toward showing and explaining how to use the SequenceFile API: here and here.
You should also definitely read this intro to text analysis.
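To make the idea of "text to vector" concrete, here is a minimal sketch in plain Java. It does not use Mahout's API at all. It only illustrates one common technique, feature hashing of term counts, applied to the "username|comment about item|rating" format from the question. The class name `HashedVectorSketch`, the `Review` type, the tokenizer, and the vector size are all assumptions made for illustration, and at TB scale this logic would have to run inside a distributed job rather than a single JVM.

```java
import java.util.Arrays;
import java.util.Locale;

public class HashedVectorSketch {

    /** Hypothetical holder for one "username|comment about item|rating" line. */
    public static final class Review {
        public final String username;
        public final String comment;
        public final double rating;

        Review(String username, String comment, double rating) {
            this.username = username;
            this.comment = comment;
            this.rating = rating;
        }
    }

    /**
     * Splits on the first and last '|' so that a comment which itself
     * contains '|' is still kept in one piece.
     */
    public static Review parseLine(String line) {
        int first = line.indexOf('|');
        int last = line.lastIndexOf('|');
        if (first < 0 || last <= first) {
            throw new IllegalArgumentException("Expected 3 pipe-separated fields: " + line);
        }
        String username = line.substring(0, first).trim();
        String comment = line.substring(first + 1, last).trim();
        double rating = Double.parseDouble(line.substring(last + 1).trim());
        return new Review(username, comment, rating);
    }

    /**
     * Builds a fixed-size term-count vector with the hashing trick:
     * each token is mapped to the slot hash(token) mod numFeatures.
     * Different tokens may collide in the same slot.
     */
    public static double[] vectorize(String text, int numFeatures) {
        double[] vector = new double[numFeatures];
        // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            int index = Math.floorMod(token.hashCode(), numFeatures);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        Review review = parseLine("arun|great phone, great battery|4.5");
        double[] features = vectorize(review.comment, 16);
        System.out.println(review.username + " rating=" + review.rating
                + " features=" + Arrays.toString(features));
    }
}
```

The rating (or a label derived from it) would serve as the classification target, and the hashed vector as the feature input. A fixed vector size keeps memory use bounded no matter how large the vocabulary grows, at the cost of occasional hash collisions.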
Source: https://stackoverflow.com/questions/11932668/vectorization-in-apache-mahout