urllib

soup.findAll returning empty list

梦想的初衷 submitted on 2021-01-29 19:10:15

Question: I am trying to scrape with BeautifulSoup and am getting an empty list when I call findAll.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url='https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView?catalogId=10123&langId=44&storeId=10151&krypto=70KutR16JmLgr7Ka%2F385RFXrzDpOkSqx%2FRC3DnlU09%2BYcw0pR5cfIfC0kOlQywiD%2BTEe7ppq8ENXglbpqA8sDUtif1h3ZjrEoQkV29%2B90iqljHi2gm2T%2BDZHH2%2FCNeKB%2BkVglbz%2BNx1bKsSfE5L6SVtckHxg%2FM%2F
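An empty result from findAll usually means either the selector doesn't match what was actually fetched, or the fetched HTML differs from what a browser shows (JavaScript-rendered content, or a blocked request). A minimal sketch on static markup (the tag and class names here are invented for illustration) shows how a non-matching selector silently returns an empty list:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <ul>
    <li class="product">Apples</li>
    <li class="product">Pears</li>
  </ul>
</body></html>
"""

page_soup = BeautifulSoup(html, "html.parser")

# A selector that matches the fetched markup returns results...
products = page_soup.find_all("li", {"class": "product"})
print(len(products))  # 2

# ...while one that doesn't exist in the HTML silently returns [].
# Comparing urlopen(...).read() against the browser's "view source"
# is the quickest way to tell a selector bug from missing content.
missing = page_soup.find_all("div", {"class": "product-grid"})
print(missing)  # []
```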

How to send POST request with no data

北慕城南 submitted on 2021-01-29 18:01:21

Question: Is it possible, using urllib or urllib2, to send a POST request with no data? It sounds odd, but the API I am using sends blank data in the POST request. I've tried the following, but it seems to be issuing a GET request because there is no POST data.

url = 'https://site.com/registerclaim?cid=' + int(cid)
values = {}
headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36', 'X-CRFS-Token' : csrfToken, 'X
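The trick is to pass an empty bytes body rather than omitting `data`: with `data=None` urllib issues a GET, but any bytes body, even empty, switches the method to POST. A sketch (the `cid` value and token header are placeholders standing in for the question's real values):

```python
from urllib.request import Request, urlopen

cid = 123  # placeholder -- the real id comes from the caller
url = "https://site.com/registerclaim?cid=" + str(cid)

headers = {
    "User-Agent": "Mozilla/5.0",
    # "X-CRFS-Token": csrf_token,  # supply the real token here
}

# data=b"" (empty bytes) instead of data=None: the presence of a body,
# even a zero-length one, makes urllib send POST with Content-Length: 0.
req = Request(url, data=b"", headers=headers)
print(req.get_method())  # POST

# response = urlopen(req)  # uncomment to actually send the request
```

Note also that the original `'...cid=' + int(cid)` would raise a TypeError; the id must be converted to `str` before concatenation.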

How to cancel or pause a urllib request in python

南楼画角 submitted on 2021-01-29 12:58:57

Question: I have a program that requests a file from the web so the user can download it. I am using urllib.request and tkinter. The problem is that when the user hits the 'Download' button there is no way to pause or cancel until the file finishes downloading, and the program freezes too. I really want to add pause and cancel buttons, but I do not know how, and I want to eliminate the freezing. Should I use another library like 'requests'? Or should I try threading? Can

Creating URLs in a loop

﹥>﹥吖頭↗ submitted on 2021-01-29 05:22:58

Question: I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but is not saving them in a list. Ultimately I want to download multiple files using urlretrieve.

for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a
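The loop only prints each URL; nothing accumulates them. Appending to a list fixes that, and a `{:02d}` format spec zero-pads the second number, which removes the need for the separate `-0{}` / `-{}` branches. A sketch (the real host is redacted in the question, so `example.com` stands in for it):

```python
# Hypothetical base URL -- the question hides the real host.
BASE = "https://example.com/P200{}-{:02d}.xls"

urls = []                                       # print() alone saves nothing;
for i, j in zip(range(0, 17), range(1, 18)):    # append each URL instead
    urls.append(BASE.format(i, j))

print(urls[0])    # https://example.com/P2000-01.xls
print(len(urls))  # 17

# The saved list then feeds urlretrieve:
# import urllib.request
# for url in urls:
#     urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
```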

URL component % and \x

孤者浪人 submitted on 2021-01-28 11:52:08

Question: I have a question.

st = "b%C3%BCrokommunikation"
urllib2.unquote(st)
OUTPUT: 'b\xc3\xbcrokommunikation'

But if I print it:

print urllib2.unquote(st)
OUTPUT: bürokommunikation

Why the difference? I have to write bürokommunikation instead of 'b\xc3\xbcrokommunikation' into a file. My problem is: I have lots of data with such values extracted from URLs, and I have to store them as e.g. bürokommunikation in a text file.

Answer 1: When you print the string, your terminal emulator recognizes the unicode
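The two outputs are the same data: `'b\xc3\xbc...'` is just the `repr()` of the raw UTF-8 bytes (the question uses Python 2's urllib2), while `print` sends the bytes to the terminal, which renders them as `bürokommunikation`. In Python 3 the ambiguity disappears, since `urllib.parse.unquote` returns a `str`; writing it with an explicit encoding stores the readable form:

```python
from urllib.parse import unquote

st = "b%C3%BCrokommunikation"
decoded = unquote(st)   # Python 3: percent-escapes decoded as UTF-8 -> str
print(decoded)          # bürokommunikation

# Writing with an explicit encoding stores the readable text, not a repr.
with open("out.txt", "w", encoding="utf-8") as f:
    f.write(decoded + "\n")
```

Under Python 2 the equivalent is to write the UTF-8 byte string directly to a file opened in binary mode; no conversion is needed, because the bytes already are the encoded text.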

tarfile can't open tgz

十年热恋 submitted on 2021-01-28 09:28:57

Question: I am trying to download a tgz file from this website: https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07 Here is my script:

import os
from six.moves import urllib
import tarfile
spam_path=os.path.join('ML', 'spam')
root_download='https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07'
spam_url=root_download+'255 MB Corpus (trec07p.tgz)'
if not os.path.isdir(spam_path):
    os.makedirs(spam_path)
path=os.path.join(spam_path, 'trec07p.tgz')
if not os.path.isfile('trec07p.tgz'):
    urllib.request

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

隐身守侯 submitted on 2021-01-28 07:48:06

Question: I am trying to get a JSON response from the link passed as a parameter to the urllib request, but it gives me an error saying the URL can't contain control characters. How can I solve this?

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project
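The URL as pasted contains literal spaces (visible before `=procurement_notice` and after `Sectors_fq=`), likely from a line-wrapped copy-paste; whitespace and newlines are exactly the "control characters" http.client rejects. Stripping all whitespace, and percent-encoding anything else unsafe, fixes the request. A sketch on a shortened stand-in URL:

```python
import urllib.parse

# A wrapped copy-paste leaves literal spaces/newlines inside the URL --
# this shortened stand-in reproduces the problem with one stray space.
raw = ("https://devbusiness.un.org/solr-sitesearch-output/10//0/"
       "ds_field_last_updated/desc?bundle_fq =procurement_notice")

cleaned = "".join(raw.split())                       # drop all whitespace
safe_url = urllib.parse.quote(cleaned, safe=":/?&=%")  # encode anything else unsafe
print(safe_url)
```

With the cleaned URL, `urllib.request.urlopen(safe_url)` followed by `json.loads(...)` should return the JSON response as intended.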

Python urllib.request.urlopen: AttributeError: 'bytes' object has no attribute 'data'

大憨熊 submitted on 2021-01-27 22:13:58

Question: I am using Python 3 and trying to connect to dstk. I am getting an error from the urllib package. I researched a lot on SO and could not find anything similar to this problem.

api_url = self.api_base+'/street2coordinates'
api_body = json.dumps(addresses)
#api_url=api_url.encode("utf-8")
#api_body=api_body.encode("utf-8")
print(type(api_url))
response_string = six.moves.urllib.request.urlopen(api_url, api_body).read()
response = json.loads(response_string)

If I do not encode the api_url and api
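In Python 3, `urlopen` wants the URL as a `str` (or a `Request` object); only the request *body* must be bytes. Encoding `api_url` along with the body is what produces the `'bytes' object has no attribute 'data'` error. A sketch (the endpoint path and sample payload are stand-ins for the question's values):

```python
import json
from urllib.request import Request, urlopen

addresses = ["160 Main St, Anytown"]                 # sample payload
api_url = "http://example.com/street2coordinates"    # stand-in for api_base + path

# Keep the URL as str; encode only the body. A bytes URL is what
# triggers "'bytes' object has no attribute 'data'".
api_body = json.dumps(addresses).encode("utf-8")

req = Request(api_url, data=api_body,
              headers={"Content-Type": "application/json"})
print(req.get_method())  # POST -- a body always switches urlopen to POST

# response = json.loads(urlopen(req).read().decode("utf-8"))
```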

Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden

情到浓时终转凉″ submitted on 2021-01-27 21:39:22

Question: I am trying to download a GIF file with urllib, but it is throwing this error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

This does not happen when I download from other blog sites. This is my code:

import requests
import urllib.request
url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
source_code = requests.get(url_1,headers = {'User-Agent': 'Mozilla/5.0'})
path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'
full_name = path +
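The 403 almost certainly comes from a later `urllib.request` call that, unlike the `requests.get` above, sends urllib's default `Python-urllib/3.x` User-Agent, which this server rejects. Either reuse the already-successful `requests` response (`source_code.content` holds the GIF bytes) or give urllib a browser-like header via a `Request` object. A sketch of the urllib route:

```python
import urllib.request

url = "https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif"

# urlretrieve/urlopen with a bare URL sends "Python-urllib/3.x" as the
# User-Agent; some servers answer that with 403. A Request with a
# browser-like header usually clears it.
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent"))  # Mozilla/5.0

# with urllib.request.urlopen(req) as resp, open("nike.gif", "wb") as f:
#     f.write(resp.read())
```

Since `requests.get(url_1, headers=...)` already succeeded, the simplest fix is arguably to write `source_code.content` to the target file and drop urllib entirely.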

Python http.client.RemoteDisconnected

僤鯓⒐⒋嵵緔 submitted on 2021-01-27 13:22:22

Question: I am trying to run several ids through a web service using Python, and I am getting the 'http.client.RemoteDisconnected: Remote end closed connection without response' error. I don't want to just try/except this error; I want to investigate why I am getting this response. I have been able to get 400- and 500-level errors

raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 413:

using the method I am including below, and I understood those errors (so I don't believe the
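`RemoteDisconnected` carries no status code by design: the server (or a proxy in between) closed the socket before sending any response line, so there is nothing to inspect on the Python side. Given the HTTP 413 (payload too large) seen earlier, request size or rate is a plausible culprit, which can be tested by batching fewer ids per call. For resilience while investigating, a retry wrapper is a common pattern (a sketch; `fn` stands for whatever callable performs one request):

```python
import http.client
import time

def call_with_retry(fn, attempts=3, delay=1.0):
    """Retry `fn` when the server drops the connection without a status
    line (http.client.RemoteDisconnected). Re-raises after `attempts`."""
    for attempt in range(attempts):
        try:
            return fn()
        except http.client.RemoteDisconnected:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))   # simple linear backoff
```

If retries succeed where single calls fail, the server is throttling or shedding load; if they never succeed past a certain payload size, the 413 limit is being hit before a response can be written.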