Instapaper-like algorithm

北恋 2021-01-29 17:40

Does anyone know of an algorithm that extracts the main content from a webpage, like Instapaper does?

4 answers
  •  盖世英雄少女心
    2021-01-29 18:15

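    If you just want all of the text and none of the formatting, this does the trick in Python 3 with BeautifulSoup 4 (installed with pip install beautifulsoup4):

    >>> from bs4 import BeautifulSoup
    >>> from urllib.request import urlopen
    >>> soup = BeautifulSoup(urlopen("http://www.python.org/").read(), "html.parser")
    >>> for tag in soup(["script", "style"]):  # script/style bodies are not readable text
    ...     tag.decompose()
    ...
    >>> contents = ''.join(soup.find_all(string=True))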
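A standard-library-only sketch of the same idea (Python 3, no BeautifulSoup, no network access), plus a crude guess at the "main" text, which is closer to what Instapaper-style tools aim for. The heuristic is an assumption, loosely in the spirit of Arc90's Readability and not Instapaper's own method: every piece of text inside a <p> is credited to the element that directly contains that <p>, and the element with the most credited characters wins. ContentParser, extract_text and extract_main_content are hypothetical names, and the sample page is made up.

```python
from html.parser import HTMLParser

# Assumption: text inside these elements is never readable content.
SKIP_TAGS = {"script", "style", "noscript", "template"}
# Void elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentParser(HTMLParser):
    """Hypothetical helper: collects text and scores the containers of <p> tags."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open elements as (tag, unique id) pairs
        self.next_id = 0
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.fragments = []   # every kept text fragment, in document order
        self.scores = {}      # container id -> characters of <p> text inside it
        self.blocks = {}      # container id -> {p id: [text fragments]}

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag not in VOID_TAGS:
            self.stack.append((tag, self.next_id))
            self.next_id += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth or not text:
            return
        self.fragments.append(text)
        # Credit the text to the element that directly contains the nearest <p>.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == "p":
                container_id, p_id = self.stack[i - 1][1], self.stack[i][1]
                self.scores[container_id] = self.scores.get(container_id, 0) + len(text)
                self.blocks.setdefault(container_id, {}).setdefault(p_id, []).append(text)
                break


def parse(html):
    parser = ContentParser()
    parser.feed(html)
    parser.close()
    return parser


def extract_text(html):
    """All text except scripts and styles, fragments joined by single spaces."""
    return " ".join(parse(html).fragments)


def extract_main_content(html):
    """Crude guess at the article: the container holding the most <p> text."""
    parser = parse(html)
    if not parser.scores:
        return ""
    best = max(parser.scores, key=parser.scores.get)
    # One output line per <p>, in document order.
    return "\n".join(" ".join(frags) for frags in parser.blocks[best].values())


page = """
<html><head><title>Demo</title><style>p { color: red }</style></head>
<body>
  <div id="nav"><p>Home</p><p>About</p></div>
  <div id="article">
    <p>Readers want the <b>article</b> text.</p>
    <p>Menus and scripts are noise.</p>
  </div>
  <script>var x = 1;</script>
</body></html>
"""
print(extract_main_content(page))
# Readers want the article text.
# Menus and scripts are noise.
```

On this sample the navigation links, the title, the CSS and the script are all dropped from the main content. Real pages are messier (unclosed <p> tags, articles written as bare <div> text, comment threads full of paragraphs), and Readability-style extractors also weigh signals such as link density and class/id names, so treat this as a starting point.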
