I am using BeautifulSoup to get all the links to mobile phones from this URL: http://www.gsmarena.com/samsung-phones-f-9-0-p2.php
My code is as follows:
Because you're only outputting one link per div, whereas it's clear from that site that there are multiple links, each inside its own li, and multiple lis per ul. You'll need to loop through all the lis.
Taken from http://www.crummy.com/software/BeautifulSoup/download/2.x/documentation.html:
For instance, if you wanted to get only "a" Tags that had non-empty "href" attributes, you would call soup.fetch('a', {'href':re.compile('.+')}). If you wanted to get all tags that had a "width" attribute of 100, you would call soup.fetch(attrs={'width':100}).
Try this:
data = soup.findAll('div', attrs={'class': re.compile('.+')})
This should fetch all the divs whose class attribute is present and not empty.
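For what it's worth, fetch comes from the 2.x documentation quoted above. Here is a rough sketch of the same two filters using the newer bs4 package, assuming it is installed. The HTML snippet is made up for illustration and is not the real page.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only; not taken from the site.
html = """
<div class="makers"><a href="phone1.php">One</a><a name="top">Anchor</a></div>
<div><a href="phone2.php">Two</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# <a> tags whose href matches '.+', i.e. is present and non-empty.
links = soup.find_all("a", href=re.compile(".+"))

# <div> tags that carry a non-empty class attribute.
divs = soup.find_all("div", attrs={"class": re.compile(".+")})

print([a["href"] for a in links])
print([d["class"] for d in divs])
```

Note that bs4 treats class as a multi-valued attribute, so d["class"] comes back as a list of class names rather than a single string.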
There are only three <div> elements on that page with a class of 'makers'. Your code prints only the first link from each div, so three links in all.
This is likely closer to what you desire:
import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"
text = urllib2.urlopen(url).read()
soup = BeautifulSoup(text)
data = soup.findAll('div',attrs={'class':'makers'})
for div in data:
    links = div.findAll('a')
    for a in links:
        print "http://www.gsmarena.com/" + a['href']
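As a side note, gluing the host name onto each href by hand only works while every href is a bare relative path. A possible alternative is urljoin from the standard library, sketched below in Python 3 (on Python 2 the same function lives in the urlparse module). The helper name absolute and the sample hrefs are hypothetical, chosen only to show the behaviour.

```python
from urllib.parse import urljoin

# Page the links were scraped from (the URL given in the question).
BASE = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"

def absolute(href, base=BASE):
    """Resolve an href against the page URL (hypothetical helper)."""
    return urljoin(base, href)

# Hypothetical hrefs, for illustration only.
for href in ["some_phone-1.php", "/some_phone-2.php", "http://example.com/x"]:
    print(absolute(href))
```

A relative path replaces the last path segment of the page URL, a path starting with a slash is resolved against the host, and an href that is already absolute is returned unchanged.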
If you have Python 3, you can use Simon's answer with the following changes (print also has to be called as a function, i.e. print(...)):
from urllib.request import urlopen
from bs4 import BeautifulSoup
text = urlopen(url).read()
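Put together, a Python 3 version might look roughly like the sketch below. It assumes the bs4 package is installed and that the page still wraps its phone links in div elements with class 'makers', as described above. The function names fetch and phone_links are my own, not part of any library.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

URL = "http://www.gsmarena.com/samsung-phones-f-9-0-p2.php"

def fetch(url):
    """Download the page and return its raw bytes."""
    return urlopen(url).read()

def phone_links(html):
    """Return a full URL for every <a> inside each <div class="makers">."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for div in soup.find_all("div", attrs={"class": "makers"}):
        # href=True skips any <a> tags that have no href attribute.
        for a in div.find_all("a", href=True):
            found.append("http://www.gsmarena.com/" + a["href"])
    return found
```

To print the links from the live page, loop over phone_links(fetch(URL)) and call print on each item. Keeping the download separate from the parsing also makes it easy to try phone_links on a saved copy of the page.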