Building a tag cloud with Solr

滥情空心 2021-02-06 10:47

Dear Stack Overflow community,

Given some text, I wish to get the top 50 most frequent words in it and create a tag cloud out of them, and thus show the gist of wha

3 answers
  •  情歌与酒
    2021-02-06 10:56

    I have come up with a stopgap solution. (I'm calling each Solr document a "post" for the sake of example.)

    Solr has a terms component, whose purpose seems to be to expose all the indexed terms of any given field. It is mainly used to implement auto-complete and other features that operate at the term level. By default its output is sorted by frequency: the more frequently occurring terms in the field come up first.
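A minimal sketch of asking the terms component for the top terms of one field. It assumes a `/terms` request handler is configured and that the JSON response uses the flat `[term, count, term, count, ...]` layout; the base URL, the core name `posts`, and the helper names are hypothetical. Solr's reference guide describes the returned count as the number of documents matching each term, so it is worth checking what the numbers mean on your own index.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the core name "posts" and the /terms handler are assumptions.
SOLR_TERMS_URL = "http://localhost:8983/solr/posts/terms"


def build_terms_url(field, limit=50, base_url=SOLR_TERMS_URL):
    """Build a terms-component request for the top `limit` terms of `field`."""
    params = {
        "terms": "true",
        "terms.fl": field,      # field whose indexed terms we want
        "terms.limit": limit,   # e.g. 50 for a top-50 tag cloud
        "terms.sort": "count",  # most frequent first (the default ordering)
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def parse_terms(response, field):
    """Return [(term, count), ...] from an already-decoded JSON terms response."""
    raw = response["terms"][field]
    if isinstance(raw, dict):
        # Layout produced when the response is requested as a map (json.nl=map).
        pairs = list(raw.items())
    else:
        # Flat layout: [term, count, term, count, ...]
        pairs = list(zip(raw[0::2], raw[1::2]))
    # Sort by count, highest first; ties keep their original order.
    return sorted(pairs, key=lambda tc: tc[1], reverse=True)
```

The URL can be fetched with any HTTP client, and the decoded JSON passed to `parse_terms` to get the (term, count) pairs that drive the tag cloud.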

    What I have done is create a dynamic field called content_ and index each post-set in its own field, based on category. This means there will be hundreds of instances of the dynamic field, each containing one post-set, and I can use the terms component on any one of those fields to get the top terms for that post-set.

    As a picture:

    content_postSetOne : contains indexed version of a set of posts
    content_postSetTwo : contains indexed version of another set of posts
    content_postSetThree : contains indexed version of a third set of posts
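One way such fields might be populated at indexing time, sketched under assumptions: each post is a dict with the hypothetical keys `id`, `category`, and `text`, and the schema defines a dynamic field matching `content_*`. The functions only build the documents to send to Solr; they do not talk to a server.

```python
import re


def field_for_category(category, prefix="content_"):
    """Derive a dynamic field name such as content_postSetOne from a category."""
    # Replace anything other than letters, digits and underscores,
    # so the result is a conservative field name.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", category)
    return prefix + safe


def to_solr_docs(posts):
    """Build one Solr document per post, with the post text stored in the
    dynamic field that belongs to the post's category."""
    docs = []
    for post in posts:
        docs.append({
            "id": post["id"],
            field_for_category(post["category"]): post["text"],
        })
    return docs
```

With this layout every category ends up with its own field, so a terms request against `content_postSetOne` only sees text from that post-set.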
    

    This solution is more or less working for me, and you can just as easily create a field per post if needed. I'm also interested in the implications of using dynamic fields like this: will it be a problem?

    How this differs from the answers by Paige and jPountz:

    1. The term frequency is the number of times a word occurs in one document or in a set of documents, not the number of documents containing the term.
    2. I can get the top-occurring terms from within one document and, if needed, from a set of documents.
    3. I did not use faceting because it primarily gives the frequency as the number of documents containing a word, not the number of times the word occurs regardless of which document it is in.
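The distinction drawn in the list above, between counting occurrences and counting documents, can be illustrated in plain Python without Solr. This is only a toy sketch: the whitespace tokenizer and the sample posts are made up for illustration.

```python
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def total_term_frequency(docs):
    """Count every occurrence of each word across all documents."""
    counts = Counter()
    for text in docs:
        counts.update(tokenize(text))
    return counts


def document_frequency(docs):
    """Count in how many documents each word appears at least once."""
    counts = Counter()
    for text in docs:
        counts.update(set(tokenize(text)))
    return counts


posts = ["solr solr solr solr cloud", "cloud tag", "tag cloud"]
tf = total_term_frequency(posts)
df = document_frequency(posts)
print(tf.most_common(1))  # [('solr', 4)]
print(df.most_common(1))  # [('cloud', 3)]
```

Here "solr" dominates by occurrence count even though it appears in a single post, while "cloud" wins by document count; a tag cloud built from each measure would look different.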
