Check out the Natural Language Toolkit (NLTK). It's a very useful Python library if you're doing any text processing.
Then look at this paper by H. P. Luhn (1958). It describes a naive but effective method of generating summaries of text.
Use the nltk.probability.FreqDist object to track how often words appear in the text, then score each sentence according to how many of the most frequent words appear in it. Select the sentences with the best scores and voila, you have a summary of the document. Something along the lines of the sketch below.
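Here's a rough sketch of that idea. The stopword filtering, the "top 50 frequent words" cutoff, and the three-sentence summary length are just illustrative choices on my part, not anything prescribed by Luhn or NLTK (you'll also need to have run `nltk.download('punkt')` and `nltk.download('stopwords')` once beforehand):

```python
from nltk.probability import FreqDist
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize(text, num_sentences=3, num_keywords=50):
    # Count word frequencies, ignoring stopwords and punctuation.
    stop = set(stopwords.words('english'))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]
    freq = FreqDist(words)
    keywords = {w for w, _ in freq.most_common(num_keywords)}

    # Score each sentence by how many of the frequent words it contains.
    scored = []
    for i, sent in enumerate(sent_tokenize(text)):
        score = sum(1 for w in word_tokenize(sent.lower()) if w in keywords)
        scored.append((score, i, sent))

    # Keep the best-scoring sentences, restored to document order.
    best = sorted(sorted(scored, reverse=True)[:num_sentences],
                  key=lambda t: t[1])
    return ' '.join(sent for _, _, sent in best)
```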
I suspect NLTK has some way of loading documents from the web and getting all of the HTML tags out of the way. I haven't done that kind of thing myself, but if you look up the corpus readers you might find something helpful.
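For what it's worth, NLTK's old HTML-cleaning helper was dropped in later releases, so one common approach (not NLTK-specific) is to fetch the page yourself and strip the markup with BeautifulSoup before feeding the text to the summarizer. The URL below is just a placeholder:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch_plain_text(url):
    html = urlopen(url).read()
    # get_text() drops the tags and keeps the readable content.
    return BeautifulSoup(html, 'html.parser').get_text(separator=' ')

# text = fetch_plain_text('http://example.com/article.html')
# print(summarize(text))
```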