Dear Stack Overflow community:
Given some text, I want to get the top 50 most frequent words in it and create a tag cloud from them, and thus show the gist of wha
I have come up with a stopgap solution (I'm calling each Solr document a "post" for example's sake):
Solr has a terms component (TermsComponent), whose purpose seems to be to expose the indexed terms of any given field. It is mainly used to implement auto-complete and other features that operate at the term level. By default its results are sorted by frequency, so the terms that occur in the most documents for that field come up first.
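To make the terms-component idea concrete, here is a minimal sketch of how such a request could be built and its response read. The host, the core name `posts`, and the helper names are hypothetical. It also assumes a `/terms` request handler is available on the core and that the JSON response uses Solr's default flat `[term, count, term, count, ...]` layout, so check both against your own setup.

```python
import json
from urllib.parse import urlencode


def build_terms_url(base_url, field, limit=50):
    """Build a terms-component URL for the top `limit` terms of `field`.

    `base_url` is a hypothetical core URL such as
    http://localhost:8983/solr/posts (an assumption, not from the question).
    """
    params = {
        "terms": "true",        # enable the terms component
        "terms.fl": field,      # field whose indexed terms we want
        "terms.limit": limit,   # how many terms to return
        "terms.sort": "count",  # most frequent first (the default)
        "wt": "json",
    }
    return f"{base_url}/terms?{urlencode(params)}"


def parse_terms(response_text, field):
    """Convert the assumed flat [term, count, ...] list into (term, count) pairs."""
    flat = json.loads(response_text)["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))
```

The resulting `(term, count)` pairs could then be fed straight into whatever renders the tag cloud, using the count as the tag weight.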
What I have done is create a dynamic field called content_
and index each post-set in its own field, based on category. This means there will be hundreds of instances of the dynamic field, each containing one post-set, and I can use the terms component on any one of those fields to get the top terms for that post-set.
As a picture:
content_postSetOne: contains the indexed version of one set of posts
content_postSetTwo: contains the indexed version of another set of posts
content_postSetThree: contains the indexed version of a third set of posts
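The layout pictured above could be produced on the indexing side roughly like this. It is only a sketch: the helper names and the `id` field are hypothetical, and it assumes the schema declares a dynamic field pattern that matches names starting with `content_`.

```python
def field_for(post_set):
    """Name of the dynamic field holding one post-set (assumed naming scheme)."""
    return f"content_{post_set}"


def to_solr_docs(posts):
    """Build one Solr document per post.

    `posts` is an iterable of (post_id, post_set, text) tuples. Each post's
    text goes into the dynamic field of the post-set it belongs to.
    """
    return [
        {"id": post_id, field_for(post_set): text}
        for post_id, post_set, text in posts
    ]
```

Each distinct post-set name becomes its own concrete field in the index, which is why the number of fields grows with the number of post-sets.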
This solution more or less works for me, and you could just as easily create a field per post if needed. I'm also interested in the implications of using dynamic fields like this: will it be a problem?
How this differs from the Paige and jPountz answer: