urllib

soup.findAll returning empty list

梦想的初衷 submitted on 2021-01-29 19:10:15

Question: I am trying to scrape with BeautifulSoup and am getting an empty list when I call findAll.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url='https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView?catalogId=10123&langId=44&storeId=10151&krypto=70KutR16JmLgr7Ka%2F385RFXrzDpOkSqx%2FRC3DnlU09%2BYcw0pR5cfIfC0kOlQywiD%2BTEe7ppq8ENXglbpqA8sDUtif1h3ZjrEoQkV29%2B90iqljHi2gm2T%2BDZHH2%2FCNeKB%2BkVglbz%2BNx1bKsSfE5L6SVtckHxg%2FM%2F
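An empty result from findAll usually means either the selector doesn't match what was actually fetched, or the fetched HTML differs from what a browser shows (JavaScript-rendered content, or a blocked request). A minimal sketch on static markup (the tag and class names here are invented for illustration) shows how a non-matching selector silently returns an empty list:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <ul>
    <li class="product">Apples</li>
    <li class="product">Pears</li>
  </ul>
</body></html>
"""

page_soup = BeautifulSoup(html, "html.parser")

# A selector that matches the fetched markup returns results...
products = page_soup.find_all("li", {"class": "product"})
print(len(products))  # 2

# ...while one that doesn't exist in the HTML silently returns [].
# Comparing urlopen(...).read() against the browser's "view source"
# is the quickest way to tell a selector bug from missing content.
missing = page_soup.find_all("div", {"class": "product-grid"})
print(missing)  # []
```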

How to send POST request with no data

北慕城南 submitted on 2021-01-29 18:01:21

Question: Is it possible, using urllib or urllib2, to send a POST request with no data? It sounds odd, but the API I am using sends blank data in the POST request. I've tried the following, but it seems to be issuing a GET request because there is no POST data.

url = 'https://site.com/registerclaim?cid=' + int(cid)
values = {}
headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36', 'X-CRFS-Token' : csrfToken, 'X
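The trick is to pass an empty bytes body rather than omitting `data`: with `data=None` urllib issues a GET, but any bytes body, even empty, switches the method to POST. A sketch (the `cid` value and token header are placeholders standing in for the question's real values):

```python
from urllib.request import Request, urlopen

cid = 123  # placeholder -- the real id comes from the caller
url = "https://site.com/registerclaim?cid=" + str(cid)

headers = {
    "User-Agent": "Mozilla/5.0",
    # "X-CRFS-Token": csrf_token,  # supply the real token here
}

# data=b"" (empty bytes) instead of data=None: the presence of a body,
# even a zero-length one, makes urllib send POST with Content-Length: 0.
req = Request(url, data=b"", headers=headers)
print(req.get_method())  # POST

# response = urlopen(req)  # uncomment to actually send the request
```

Note also that the original `'...cid=' + int(cid)` would raise a TypeError; the id must be converted to `str` before concatenation.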

How to cancel or pause a urllib request in python

南楼画角 submitted on 2021-01-29 12:58:57

Question: I have a program that requests a file from the web so the user can download it. I am using urllib.request and tkinter. The problem is that when the user hits the 'Download' button there is no way to pause or cancel until the file finishes downloading, and the program freezes too. I really want to add pause and cancel buttons, but I do not know how, and I want to eliminate the freezing. Should I use another library like 'requests'? Or should I try threading? Can

Creating URLs in a loop

﹥>﹥吖頭↗ submitted on 2021-01-29 05:22:58

Question: I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but is not saving them in a list. Ultimately I want to download multiple files using urlretrieve.

for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a
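The loop only prints each URL; nothing accumulates them. Appending to a list fixes that, and a `{:02d}` format spec zero-pads the second number, which removes the need for the separate `-0{}` / `-{}` branches. A sketch (the real host is redacted in the question, so `example.com` stands in for it):

```python
# Hypothetical base URL -- the question hides the real host.
BASE = "https://example.com/P200{}-{:02d}.xls"

urls = []                                       # print() alone saves nothing;
for i, j in zip(range(0, 17), range(1, 18)):    # append each URL instead
    urls.append(BASE.format(i, j))

print(urls[0])    # https://example.com/P2000-01.xls
print(len(urls))  # 17

# The saved list then feeds urlretrieve:
# import urllib.request
# for url in urls:
#     urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
```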

URL component % and \x

孤者浪人 submitted on 2021-01-28 11:52:08

Question: I have a question.

st = "b%C3%BCrokommunikation"
urllib2.unquote(st)
OUTPUT: 'b\xc3\xbcrokommunikation'

But if I print it:

print urllib2.unquote(st)
OUTPUT: bürokommunikation

Why the difference? I have to write bürokommunikation instead of 'b\xc3\xbcrokommunikation' into a file. My problem is: I have lots of data with such values extracted from URLs, and I have to store them as e.g. bürokommunikation in a text file.

Answer 1: When you print the string, your terminal emulator recognizes the unicode
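The two outputs are the same data: `'b\xc3\xbc...'` is just the `repr()` of the raw UTF-8 bytes (the question uses Python 2's urllib2), while `print` sends the bytes to the terminal, which renders them as `bürokommunikation`. In Python 3 the ambiguity disappears, since `urllib.parse.unquote` returns a `str`; writing it with an explicit encoding stores the readable form:

```python
from urllib.parse import unquote

st = "b%C3%BCrokommunikation"
decoded = unquote(st)   # Python 3: percent-escapes decoded as UTF-8 -> str
print(decoded)          # bürokommunikation

# Writing with an explicit encoding stores the readable text, not a repr.
with open("out.txt", "w", encoding="utf-8") as f:
    f.write(decoded + "\n")
```

Under Python 2 the equivalent is to write the UTF-8 byte string directly to a file opened in binary mode; no conversion is needed, because the bytes already are the encoded text.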

tarfile can't open tgz

十年热恋 submitted on 2021-01-28 09:28:57

Question: I am trying to download a tgz file from this website: https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07 Here is my script:

import os
from six.moves import urllib
import tarfile
spam_path=os.path.join('ML', 'spam')
root_download='https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07'
spam_url=root_download+'255 MB Corpus (trec07p.tgz)'
if not os.path.isdir(spam_path):
    os.makedirs(spam_path)
path=os.path.join(spam_path, 'trec07p.tgz')
if not os.path.isfile('trec07p.tgz'):
    urllib.request

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

隐身守侯 submitted on 2021-01-28 07:48:06

Question: I am trying to get a JSON response from the link passed as a parameter to the urllib request, but it gives me an error saying the URL can't contain control characters. How can I solve this?

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project
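The URL as pasted contains literal spaces (visible before `=procurement_notice` and after `Sectors_fq=`), likely from a line-wrapped copy-paste; whitespace and newlines are exactly the "control characters" http.client rejects. Stripping all whitespace, and percent-encoding anything else unsafe, fixes the request. A sketch on a shortened stand-in URL:

```python
import urllib.parse

# A wrapped copy-paste leaves literal spaces/newlines inside the URL --
# this shortened stand-in reproduces the problem with one stray space.
raw = ("https://devbusiness.un.org/solr-sitesearch-output/10//0/"
       "ds_field_last_updated/desc?bundle_fq =procurement_notice")

cleaned = "".join(raw.split())                       # drop all whitespace
safe_url = urllib.parse.quote(cleaned, safe=":/?&=%")  # encode anything else unsafe
print(safe_url)
```

With the cleaned URL, `urllib.request.urlopen(safe_url)` followed by `json.loads(...)` should return the JSON response as intended.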

Python urllib.request.urlopen: AttributeError: 'bytes' object has no attribute 'data'

大憨熊 submitted on 2021-01-27 22:13:58

Question: I am using Python 3 and trying to connect to dstk. I am getting an error from the urllib package. I researched a lot on SO and could not find anything similar to this problem.

api_url = self.api_base+'/street2coordinates'
api_body = json.dumps(addresses)
#api_url=api_url.encode("utf-8")
#api_body=api_body.encode("utf-8")
print(type(api_url))
response_string = six.moves.urllib.request.urlopen(api_url, api_body).read()
response = json.loads(response_string)

If I do not encode the api_url and api
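In Python 3, `urlopen` wants the URL as a `str` (or a `Request` object); only the request *body* must be bytes. Encoding `api_url` along with the body is what produces the `'bytes' object has no attribute 'data'` error. A sketch (the endpoint path and sample payload are stand-ins for the question's values):

```python
import json
from urllib.request import Request, urlopen

addresses = ["160 Main St, Anytown"]                 # sample payload
api_url = "http://example.com/street2coordinates"    # stand-in for api_base + path

# Keep the URL as str; encode only the body. A bytes URL is what
# triggers "'bytes' object has no attribute 'data'".
api_body = json.dumps(addresses).encode("utf-8")

req = Request(api_url, data=api_body,
              headers={"Content-Type": "application/json"})
print(req.get_method())  # POST -- a body always switches urlopen to POST

# response = json.loads(urlopen(req).read().decode("utf-8"))
```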

Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden

情到浓时终转凉″ submitted on 2021-01-27 21:39:22

Question: I am trying to download a GIF file with urllib, but it is throwing this error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

This does not happen when I download from other blog sites. This is my code:

import requests
import urllib.request
url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
source_code = requests.get(url_1,headers = {'User-Agent': 'Mozilla/5.0'})
path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'
full_name = path +
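The 403 almost certainly comes from a later `urllib.request` call that, unlike the `requests.get` above, sends urllib's default `Python-urllib/3.x` User-Agent, which this server rejects. Either reuse the already-successful `requests` response (`source_code.content` holds the GIF bytes) or give urllib a browser-like header via a `Request` object. A sketch of the urllib route:

```python
import urllib.request

url = "https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif"

# urlretrieve/urlopen with a bare URL sends "Python-urllib/3.x" as the
# User-Agent; some servers answer that with 403. A Request with a
# browser-like header usually clears it.
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent"))  # Mozilla/5.0

# with urllib.request.urlopen(req) as resp, open("nike.gif", "wb") as f:
#     f.write(resp.read())
```

Since `requests.get(url_1, headers=...)` already succeeded, the simplest fix is arguably to write `source_code.content` to the target file and drop urllib entirely.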

Python http.client.RemoteDisconnected

僤鯓⒐⒋嵵緔 submitted on 2021-01-27 13:22:22

Question: I am trying to run several ids through a web service using Python, and I am getting the 'http.client.RemoteDisconnected: Remote end closed connection without response' error. I don't want to just try/except this error; I want to investigate why I am getting this response. I have been able to get 400- and 500-level errors

raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 413:

using the method I am including below, and I understood those errors (so I don't believe the
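`RemoteDisconnected` carries no status code by design: the server (or a proxy in between) closed the socket before sending any response line, so there is nothing to inspect on the Python side. Given the HTTP 413 (payload too large) seen earlier, request size or rate is a plausible culprit, which can be tested by batching fewer ids per call. For resilience while investigating, a retry wrapper is a common pattern (a sketch; `fn` stands for whatever callable performs one request):

```python
import http.client
import time

def call_with_retry(fn, attempts=3, delay=1.0):
    """Retry `fn` when the server drops the connection without a status
    line (http.client.RemoteDisconnected). Re-raises after `attempts`."""
    for attempt in range(attempts):
        try:
            return fn()
        except http.client.RemoteDisconnected:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))   # simple linear backoff
```

If retries succeed where single calls fail, the server is throttling or shedding load; if they never succeed past a certain payload size, the 413 limit is being hit before a response can be written.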