Python: not getting the full response

礼貌的吻别 2020-12-21 08:26

When I try to fetch a page with urllib2, I don't get the full page.

Here is the code in Python:

import urllib2
import urllib
import socket
from bs         


        
2 Answers
  • 2020-12-21 08:32

    You might have to call read() repeatedly until it returns an empty string, which indicates EOF:

    import urllib2

    def get_page(url):
        """Load a webpage into a string."""
        src = ''

        req = urllib2.Request(url)

        try:
            response = urllib2.urlopen(req)
            # Keep reading until read() returns '' (EOF).
            chunk = True
            while chunk:
                chunk = response.read(1024)
                src += chunk
            response.close()
        except IOError:
            # Return whatever was read before the failure.
            print "can't open", url
            return src

        return src
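
The same read-until-EOF loop can be tried without a network connection. This is only a minimal sketch in Python 3, where urllib2 no longer exists and its urlopen lives in urllib.request. The names `read_all` and `fake_response` are hypothetical and not from the answer above; `io.BytesIO` stands in for the response object, on the assumption that the real response exposes the same `read(size)` method.

```python
import io


def read_all(stream, chunk_size=1024):
    """Read a file-like object to EOF in fixed-size chunks."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means EOF
            break
        chunks.append(chunk)
    # Joining once at the end avoids repeated string concatenation.
    return b"".join(chunks)


# Stand-in for an HTTP response body: 5000 bytes, larger than one chunk.
fake_response = io.BytesIO(b"x" * 5000)
page = read_all(fake_response)
print(len(page))  # prints 5000
```

If the length printed for a real response matches the size of the page, the download is probably complete and the truncation is likely happening later, for example during parsing.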
    
  • 2020-12-21 08:34

    I had the same problem. I thought it was urllib, but it was bs4.

    Instead of using

    BeautifulSoup(src)
    

    or

    soup = bs4.BeautifulSoup(html, 'html.parser')
    

    try using

    soup = bs4.BeautifulSoup(html, 'html5lib')
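
One way to check whether the parser is the cause is to parse the same markup with each installed parser and compare how many tags each one finds. The following is only a sketch in Python 3: `count_tags_per_parser` is a hypothetical helper, not part of bs4, and it assumes that bs4 is installed (it returns an empty dict otherwise). Parsers that are not installed, such as lxml or html5lib, are skipped.

```python
try:
    import bs4  # third-party package: beautifulsoup4
except ImportError:
    bs4 = None  # the helper below then returns an empty dict


def count_tags_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each available parser and count the tags."""
    counts = {}
    if bs4 is None:
        return counts
    for name in parsers:
        try:
            soup = bs4.BeautifulSoup(html, name)
        except bs4.FeatureNotFound:
            continue  # this parser is not installed
        # find_all(True) matches every tag in the parsed tree.
        counts[name] = len(soup.find_all(True))
    return counts


# Deliberately sloppy markup: unclosed <p> tags and no <html> or <body>.
print(count_tags_per_parser("<p>first<p>second"))
```

A large gap between the counts for the real page suggests that one parser is dropping or restructuring part of the document rather than the download being incomplete.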
    