I wanted to write a piece of code like the following:
from bs4 import BeautifulSoup
import urllib2
url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url).read()
In urllib3 there's no .urlopen; instead, try this:
import requests
html = requests.get(url).text  # .text is the decoded body of the response
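To then get a soup object like in the original snippet, you can hand that HTML straight to BeautifulSoup. A minimal sketch, using the URL from the question and the stdlib html.parser:
from bs4 import BeautifulSoup
import requests

url = 'http://www.thefamouspeople.com/singers.php'
html = requests.get(url).text  # decoded response body as a str
soup = BeautifulSoup(html, 'html.parser')  # stdlib parser, no extra install needed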
The urllib3 library has nice documentation here.
In order to get your desired result, do the following:
import urllib3
from bs4 import BeautifulSoup
url = 'http://www.thefamouspeople.com/singers.php'
http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')
The "decode utf-8" part is optional. It worked without it when i tried, but i posted the option anyway.
Source: User Guide
With gazpacho you can fetch and parse the page in a single step, getting back a queryable soup object:
from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)
And then run find queries on top of it:
soup.find("div")
urllib3 is a different library from urllib and urllib2. It has many features that the urllib modules in the standard library lack, if you need them, such as connection re-use. The documentation is here: https://urllib3.readthedocs.org/
If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')
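Since connection re-use is one of the main extras mentioned above, here is a minimal sketch of what it buys you: a single PoolManager serves repeated requests and keeps the underlying connection alive instead of opening a new one each time.
import urllib3

http = urllib3.PoolManager()  # one pool, shared by all requests below

url = 'http://www.thefamouspeople.com/singers.php'
# Repeated requests to the same host re-use the pooled connection
for _ in range(3):
    response = http.request('GET', url)
    print(response.status)  # 200 on success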
You do not have to install urllib3. You can choose any HTTP-request-making library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests because of its rich feature set and convenient API. You can install requests by entering pip install requests in the command line. Here is a basic example:
from bs4 import BeautifulSoup
import requests
url = "url"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
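From there the usual BeautifulSoup calls apply. For example, here is a generic sketch that lists every link target on the page, without assuming anything about its markup:
from bs4 import BeautifulSoup
import requests

url = "http://www.thefamouspeople.com/singers.php"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Print the target of every hyperlink on the page
for a in soup.find_all("a"):
    print(a.get("href"))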