When I fetch a page using urllib2, I don't get the full page.
Here is the code in Python:
import urllib2
import urllib
import socket
from bs
You might have to call read() several times: keep calling it until it returns an empty string, which indicates EOF:
import urllib2

def get_page(url):
    """Load a webpage into a string."""
    src = ''
    req = urllib2.Request(url)
    try:
        response = urllib2.urlopen(req)
        # read() may return only part of the body, so keep reading
        # 1024-byte chunks until it returns '' (EOF)
        chunk = True
        while chunk:
            chunk = response.read(1024)
            src += chunk
        response.close()
    except IOError:
        print "can't open", url
    return src
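For Python 3, where urllib2 was merged into urllib.request, the same read-until-EOF loop might look like the sketch below. Note that read() returns bytes there, so EOF is b"" rather than ''. The helper name read_all, the 1024-byte chunk size and the 10-second timeout are illustrative assumptions, not part of the answer above; the last lines exercise the loop offline by letting an io.BytesIO stand in for the HTTP response.

```python
import io
import urllib.request


def read_all(stream, chunk_size=1024):
    """Read a binary file-like object until read() returns b'' (EOF)."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)


def get_page(url, timeout=10):
    """Fetch url and return the raw body as bytes, read in chunks."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return read_all(response)


# Offline check: a BytesIO stands in for the HTTP response object.
sample = io.BytesIO(b"<html>" + b"x" * 5000 + b"</html>")
body = read_all(sample)
print(len(body))  # 5013
```

Collecting chunks in a list and joining once avoids repeatedly copying an ever-growing string, which is what src += chunk does.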
I had the same problem. I thought it was urllib, but it was bs4.
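Instead of using
BeautifulSoup(src)
or
soup = bs4.BeautifulSoup(html, 'html.parser')
try using
soup = bs4.BeautifulSoup(html, 'html5lib')
Each parser handles malformed markup differently, and in my case html.parser was losing part of the page. html5lib parses the document the way a web browser does. It is a separate package, installed with pip install html5lib.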
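A small sketch of that advice, assuming bs4 is installed: it prefers html5lib and falls back to the standard-library html.parser when html5lib is missing, relying on BeautifulSoup raising bs4.FeatureNotFound for a parser it cannot find. The function name make_soup and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse html with html5lib when available, else with html.parser."""
    try:
        # html5lib builds the tree the way a browser would
        return BeautifulSoup(html, "html5lib")
    except FeatureNotFound:
        # html5lib is a separate package: pip install html5lib
        return BeautifulSoup(html, "html.parser")


# Unclosed <p> tags are common in real-world pages.
soup = make_soup("<html><body><p>one<p>two</body></html>")
print([p.get_text() for p in soup.find_all("p")])
```

The printed list can differ between the two parsers, because they nest the unclosed <p> tags differently. Naming the parser explicitly also silences the warning bs4 emits when BeautifulSoup(src) is called without one.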