Python - Getting all links from a div having a class


I am using BeautifulSoup to get all the mobile phone links from this URL: http://www.gsmarena.com/samsung-phones-f-9-0-p2.php

My code is as follows:


4 answers

轻奢々

2021-02-03 11:37

Your code outputs only one link per div, but that page clearly has several links in each div: every link sits inside its own li, and each ul holds multiple li elements. You'll need to loop through all the li elements.
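A minimal sketch of that loop, assuming Python 3 with the bs4 package. SAMPLE_HTML and links_per_li are hypothetical names, and the fragment only stands in for the real page, whose markup may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may differ.
SAMPLE_HTML = """
<div class="makers">
  <ul>
    <li><a href="phone_a.php">Phone A</a></li>
    <li><a href="phone_b.php">Phone B</a></li>
  </ul>
</div>
"""

def links_per_li(html):
    """Collect the href of the first link inside every <li> of each div.makers."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for div in soup.find_all("div", attrs={"class": "makers"}):
        for li in div.find_all("li"):
            a = li.find("a", href=True)  # skip <li> items without a link
            if a is not None:
                hrefs.append(a["href"])
    return hrefs

print(links_per_li(SAMPLE_HTML))
```

The inner loop is the part the single-link version lacks: it visits every li rather than stopping at the first link in the div.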

暖寄归人

2021-02-03 11:37

Taken from http://www.crummy.com/software/BeautifulSoup/download/2.x/documentation.html:

For instance, if you wanted to get only "a" Tags that had non-empty "href" attributes, you would call soup.fetch('a', {'href':re.compile('.+')}). If you wanted to get all tags that had an "width" attribute of 100, you would call soup.fetch(attrs={'width':100}).

Try this (it needs import re): data = soup.findAll('div', attrs={'class': re.compile('.+')})

This should fetch every div whose class attribute is present and non-empty.
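For illustration, here is a sketch of the same regex filter using the newer bs4 package on Python 3. That is an assumption, since the quoted documentation describes the 2.x API. The SAMPLE_HTML fragment is made up:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two divs with a class, one without.
SAMPLE_HTML = """
<div class="makers">one</div>
<div>two</div>
<div class="footer">three</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Keep only divs whose class attribute is present and non-empty.
divs_with_class = soup.find_all("div", attrs={"class": re.compile(".+")})
texts = [div.get_text() for div in divs_with_class]
print(texts)
```

Note that this filter selects divs, not links, so the links inside each matching div still have to be collected separately.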


迷失自我

2021-02-03 11:46

There are only three <div> elements on that page with a class of 'makers'. Your code prints the first link from each div, so you get three links in all.

This is likely closer to what you desire:

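```
# Python 2 with BeautifulSoup 3 (the old BeautifulSoup module)
import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"
text = urllib2.urlopen(url).read()
soup = BeautifulSoup(text)

# Visit every div with class "makers", then every link inside it
data = soup.findAll('div', attrs={'class': 'makers'})
for div in data:
    links = div.findAll('a')
    for a in links:
        # Assumes each href is relative to the site root
        print "http://www.gsmarena.com/" + a['href']
```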
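The string concatenation above assumes every href is relative. As a hedged alternative, the standard library's urljoin also copes with hrefs that start with a slash or are already absolute. This sketch uses Python 3; BASE_URL and the href values are made up for illustration:

```python
from urllib.parse import urljoin

BASE_URL = "http://www.gsmarena.com/"

# Made-up href values for illustration only.
hrefs = ["phone_a.php", "/phone_b.php", "http://example.com/other.php"]

# urljoin resolves each href against the base, leaving absolute URLs untouched.
absolute = [urljoin(BASE_URL, href) for href in hrefs]
for link in absolute:
    print(link)
```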


自闭症患者

2021-02-03 11:48
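If you have Python 3, you can use Simon's answer with the following changes. The print statement must also become a call to the print() function.
```
from urllib.request import urlopen  # replaces urllib2 in Python 3
from bs4 import BeautifulSoup       # BeautifulSoup 4 package

text = urlopen(url).read()  # url as defined in that answer
```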