Using Python and BeautifulSoup (webpage source saved to a local file)

无人及你 2020-12-03 06:14

I am using Python 2.7 + BeautifulSoup 4.3.2.

I am trying to use Python and BeautifulSoup to pick up information on a webpage. Because the webpage is in the company w

3 Answers
  • 2020-12-03 07:07

    The most direct way to parse a local file with BeautifulSoup is to pass it an open file handle, as described in the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup

    from bs4 import BeautifulSoup
    
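    # Pass the open file handle straight to BeautifulSoup; the with block closes it afterwards.
    with open("C:\\example.html") as page:
        soup = BeautifulSoup(page, "html.parser")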
    
    for city in soup.find_all('span', {'class' : 'city-sh'}):
        print(city)
    
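To try the file-handle approach without access to the original page, here is a minimal sketch. The sample markup, the temporary file, and the helper name `city_names` are all assumptions made for illustration; only the `span` tag and the `city-sh` class come from the thread.

```python
import io
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the saved page source.
SAMPLE_HTML = u"""<html><body>
<span class="city-sh">Cape Town</span>
<span class="other">ignored</span>
<span class="city-sh">Durban</span>
</body></html>"""


def city_names(path):
    """Return the text of every <span class="city-sh"> in the file at path."""
    # io.open with an explicit encoding behaves the same on Python 2.7 and 3.
    with io.open(path, encoding="utf-8") as page:
        soup = BeautifulSoup(page, "html.parser")
    return [span.get_text(strip=True)
            for span in soup.find_all("span", class_="city-sh")]


# Write the sample to a temporary file, then parse it from disk.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "example.html")
with io.open(sample_path, "w", encoding="utf-8") as out:
    out.write(SAMPLE_HTML)

names = city_names(sample_path)
print(names)
```

Calling `get_text(strip=True)` returns only the text inside each tag, which is usually what you want once the matching elements have been found.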
  • 2020-12-03 07:07

    With Chandan's help, the problem has been solved. All credit goes to him. :)

    urllib2 is not needed here, because the page is read from a local file rather than fetched over the network.

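    from bs4 import BeautifulSoup
    # import urllib2  # not needed for a local file
    
    # Raw string so the backslash in the Windows path is not read as an escape.
    path = r"C:\example.html"
    with open(path) as page:
        soup = BeautifulSoup(page.read(), "html.parser")
    
    # Collect every <span class="city-sh"> element.
    cities = soup.find_all('span', {'class' : 'city-sh'})
    
    for city in cities:
        print(city)  # the parenthesized form works on Python 2.7 and 3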
    
  • 2020-12-03 07:16

    You can also try the lxml parser. Here is an example for your HTML data.

    import lxml.html
    
    with open('example.html') as f:
        data = f.read()
    root = lxml.html.fromstring(data)
    
    # iter("td") walks the tree and yields only <td> elements
    # (getiterator() is deprecated in lxml).
    for ele in root.iter("td"):
        print(ele.text_content())
    

    Output: port_new_cape 452 South May 09, 1997 Jan 23, 2009 12:05 pm
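The lxml approach can also be tried on an inline string. This is only a sketch: the table markup below is made up (the real page's HTML is not shown in the thread), reusing three cell values from the output above.

```python
import lxml.html

# Hypothetical table row; the real page's markup is not shown in the thread.
SAMPLE_HTML = """<table><tr>
<td>port_new_cape</td><td>452</td><td>South</td>
</tr></table>"""

root = lxml.html.fromstring(SAMPLE_HTML)

# iter("td") yields only <td> elements, in document order.
cells = [td.text_content().strip() for td in root.iter("td")]
print(" ".join(cells))
```

Collecting the cell texts into a list first makes it easy to join them on one line or to process each value separately.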
