Python - Getting all links from a div having a class

萌比男神i 2021-02-03 11:09

I am using BeautifulSoup to get all the links to mobile phones from this URL: http://www.gsmarena.com/samsung-phones-f-9-0-p2.php

My code is as follows:



        
4 Answers
  • 2021-02-03 11:37

    Your code outputs only one link per div, but the page has multiple links: each one sits inside its own li, and every ul contains several li elements. You'll need to loop through all of the li elements.
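A minimal sketch of that nested loop, assuming Python 3 with the bs4 package installed. The HTML string is a hypothetical stand-in for the real page (one div with class "makers" wrapping a ul of li items), and the file names in it are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: one div.makers
# holding a ul with several li items, each wrapping one link.
html = """
<div class="makers">
  <ul>
    <li><a href="phone_a.php">Phone A</a></li>
    <li><a href="phone_b.php">Phone B</a></li>
    <li><a href="phone_c.php">Phone C</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for div in soup.find_all("div", class_="makers"):
    # Visit every li inside the div instead of stopping at the first link.
    for li in div.find_all("li"):
        a = li.find("a", href=True)  # first link inside this li, if any
        if a is not None:
            hrefs.append(a["href"])

print(hrefs)
```

Collecting the href values in a list first keeps the extraction separate from whatever you do with the links afterwards.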

  • 2021-02-03 11:37

    Taken from http://www.crummy.com/software/BeautifulSoup/download/2.x/documentation.html:

    For instance, if you wanted to get only "a" Tags that had non-empty "href" attributes, you would call soup.fetch('a', {'href':re.compile('.+')}). If you wanted to get all tags that had an "width" attribute of 100, you would call soup.fetch(attrs={'width':100}).

    Try this (after import re): data = soup.findAll('div', attrs={'class': re.compile('.+')})

    This should fetch all the divs that have a non-empty class attribute.
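As a rough illustration of the regex filter, here is a sketch assuming Python 3 and the bs4 package, where findAll is spelled find_all. The markup and class names are invented for the example. Passing class_=True is another way to ask for any tag that has a class attribute.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two divs carry a class, one does not.
html = """
<div class="makers">first</div>
<div>second</div>
<div class="st-text">third</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Regex filter: keep divs whose class attribute matches ".+".
with_class = soup.find_all("div", attrs={"class": re.compile(".+")})

# Equivalent shortcut: True matches any tag that has the attribute at all.
with_class_shortcut = soup.find_all("div", class_=True)

print([d.get_text() for d in with_class])
```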

  • 2021-02-03 11:46

    There are only three <div> elements on that page with a class of 'makers', and your code prints only the first link from each one, so you get three links in all.

    This is likely closer to what you desire:

    import urllib2
    from BeautifulSoup import BeautifulSoup
    
    url = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"
    text = urllib2.urlopen(url).read()
    soup = BeautifulSoup(text)
    
    data = soup.findAll('div',attrs={'class':'makers'})
    for div in data:
        links = div.findAll('a')
        for a in links:
            print "http://www.gsmarena.com/" + a['href']
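The string concatenation above works when every href is a bare relative file name. As a hedged alternative for Python 3, urljoin from the standard library resolves each href against the page URL and also copes with root-relative and already absolute links. The href values below are hypothetical examples, not taken from the site.

```python
from urllib.parse import urljoin

base = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"

# Hypothetical href values of the three common shapes:
# relative file name, root-relative path, already absolute URL.
hrefs = ["samsung_galaxy-1.php", "/makers.php3", "http://example.com/x"]

# urljoin resolves each href the way a browser would for this page.
absolute = [urljoin(base, h) for h in hrefs]

for link in absolute:
    print(link)
```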
    
  • 2021-02-03 11:48

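    If you have Python 3, you can use Simon's answer with the following changes (this assumes the bs4 package is installed). Note that print is a function in Python 3, so the last line becomes print("http://www.gsmarena.com/" + a['href']).

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    
    # url is the same variable as in that answer
    text = urlopen(url).read()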
    