Keyword analysis in PHP

走了就别回头了 2021-01-30 03:47

For a web application I'm building, I need to analyze a website, retrieve and rank its most important keywords, and display them.

Getting all words, their density and

5 answers
  •  滥情空心
    2021-01-30 04:13

    One thing missing from your algorithm is document-oriented analysis (unless you omitted it intentionally).

    Every site is built on a set of documents. Counting word frequencies across every document tells you how widely each word is spread. Words that occur in most documents are effectively stop words. Words specific to a limited number of documents can define a cluster of documents on a particular topic. The number of documents belonging to a topic can raise the overall importance of that topic's words, or at least provide an additional factor for your formulas.
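
The document-coverage idea can be sketched roughly as follows. This is a minimal illustration in Python rather than PHP, and the same counting can be done with PHP arrays. The function names, the sample pages, and the `stop_ratio` threshold of 0.8 are all assumptions made for the example, not part of the original suggestion.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def document_frequencies(documents):
    """Count, for each word, the number of documents that contain it."""
    df = Counter()
    for text in documents:
        df.update(set(tokenize(text)))  # set(): count each word once per document
    return df

def split_by_coverage(documents, stop_ratio=0.8):
    """Separate words found in most documents from words found in only a few.

    stop_ratio is a hypothetical threshold: a word present in at least this
    share of documents is treated as a stop word.
    """
    df = document_frequencies(documents)
    total = len(documents)
    stop_words = {w for w, n in df.items() if n / total >= stop_ratio}
    topical = {w: n for w, n in df.items() if w not in stop_words}
    return stop_words, topical

# Hypothetical sample pages standing in for a crawled site.
pages = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
    "the market for dog food",
]
stop_words, topical = split_by_coverage(pages)
```

Here `stop_words` collects the words that appear on nearly every page, while `topical` maps each remaining word to the number of pages containing it, which could serve as the extra weighting factor.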

    Perhaps you could benefit from a preconfigured classifier that contains categories/topics and keywords for each of them (building it can be partially automated by indexing existing public category hierarchies, up to and including Wikipedia, but that is not a trivial task in itself). Then you can bring categories into the analysis.
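
As a rough sketch of what a category lookup might look like, assume a tiny hand-made map from category names to keywords. The `CATEGORIES` table and `score_categories` function below are hypothetical; a real classifier would need a far larger, carefully built vocabulary.

```python
from collections import Counter

# Hypothetical hand-made category map; a real one might be derived from a
# public category hierarchy, which is a non-trivial task in itself.
CATEGORIES = {
    "pets": {"cat", "dog", "food"},
    "finance": {"stock", "market", "price"},
}

def score_categories(words, categories=CATEGORIES):
    """Count how many of the given words fall under each category."""
    scores = Counter()
    for word in words:
        for name, keywords in categories.items():
            if word in keywords:
                scores[name] += 1
    return scores

# Hypothetical page text, already lowercased and stripped of punctuation.
page_words = "the stock market fell and the dog food price rose".split()
scores = score_categories(page_words)
```

The per-category counts in `scores` could then feed into the keyword ranking, for example by boosting words that belong to the page's dominant category.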

    Also, you can improve the statistics with sentence-level analysis. That is, by tracking how often words occur together in the same sentence or phrase, you can discover clichés and duplicates and eliminate them from the statistics. But I'm afraid this is not easily implemented in pure PHP.
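
A minimal sketch of the sentence-level idea, written in Python for brevity: it splits text into sentences naively on `.`, `!` and `?`, counts word pairs that share a sentence, and flags sentences that repeat verbatim. The function names and the sample text are assumptions for illustration, and real sentence splitting is considerably harder than this.

```python
import re
from collections import Counter
from itertools import combinations

def split_sentences(text):
    """Naively split on ., ! and ? and drop empty fragments."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip().lower() for p in parts if p.strip()]

def cooccurrence_counts(text):
    """Count how often each unordered pair of words shares a sentence."""
    pairs = Counter()
    for sentence in split_sentences(text):
        # Sort the unique words so each pair has one canonical ordering.
        words = sorted(set(re.findall(r"[a-z']+", sentence)))
        pairs.update(combinations(words, 2))
    return pairs

def repeated_sentences(text):
    """Return sentences that appear more than once, with their counts."""
    counts = Counter(split_sentences(text))
    return {s: n for s, n in counts.items() if n > 1}

# Hypothetical page text containing a repeated boilerplate phrase.
text = "Click here to read more. Keyword density matters. Click here to read more."
pairs = cooccurrence_counts(text)
dupes = repeated_sentences(text)
```

Sentences reported in `dupes` are candidates for removal before counting keywords, and unusually frequent pairs in `pairs` hint at stock phrases.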
