urllib

How to send cookies inside a POST request

蓝咒 submitted on 2019-12-13 04:31:56

Question: Trying to send a POST request carrying the cookies obtained from a GET request on my PC:

    #!/usr/bin/python
    import re       # regex
    import urllib
    import urllib2

    x = urllib2.urlopen("http://www.example.com")  # GET request
    cookies = x.headers['set-cookie']  # get the cookies from the GET request

    url = 'http://example'  # to see the cookie values, type any password
    values = {"username": "admin",
              "passwd": password,
              "lang": "",
              "option": "com_login",
              "task": "login",
              "return": "aW5kZXgucGhw"}
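The same idea in Python 3's urllib, as a hedged sketch: the cookie string and form fields below are stand-ins for the values captured from the GET request. Only the name=value part of Set-Cookie goes back to the server, and passing `data=` (as bytes) is what turns the request into a POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Stand-in for x.headers['set-cookie'] from the GET request
set_cookie = "session=abc123; Path=/"
cookie = set_cookie.split(";")[0]  # keep only the name=value pair

values = {"username": "admin", "passwd": "secret", "option": "com_login"}
data = urlencode(values).encode("utf-8")  # POST bodies must be bytes in Python 3

req = Request("http://example.com/index.php", data=data,
              headers={"Cookie": cookie})
# Supplying data= makes urlopen issue a POST instead of a GET
print(req.get_method())          # "POST"
print(req.get_header("Cookie"))  # "session=abc123"
# urllib.request.urlopen(req) would then send it
```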

Using regex to get multiple data points on a single line by scraping stocks from Yahoo [closed]

你。 submitted on 2019-12-13 04:23:35

Question: (Closed as off-topic; not accepting answers.)

    import urllib
    import re

    stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']
    for i in range(len(stocks_symbols)):
        htmlfile = urllib.urlopen("https://finance.yahoo.com/q?s=" + stocks_symbols[i])
        htmltext = htmlfile.read()
        regex = '<span id="yfs_l84_' + stocks_symbols[i] + '">(.+?)</span>'
        pattern = re
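A runnable sketch of that scraping loop. The HTML snippet is a stand-in for what `urlopen(...).read()` would return, since the old `yfs_l84_*` spans no longer exist on Yahoo's pages; note that `read()` takes no argument, and iterating the symbols directly is more idiomatic than indexing with `range(len(...))`:

```python
import re

stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

# Stand-in for htmlfile.read() -- in the real script this would be the
# body returned by urllib for https://finance.yahoo.com/q?s=<symbol>
htmltext = ('<span id="yfs_l84_aapl">171.34</span>'
            '<span id="yfs_l84_spy">448.92</span>')

prices = {}
for symbol in stocks_symbols:  # iterate directly, no range(len(...))
    match = re.search(r'<span id="yfs_l84_%s">(.+?)</span>' % symbol, htmltext)
    if match:
        prices[symbol] = match.group(1)

print(prices)  # {'aapl': '171.34', 'spy': '448.92'}
```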

Is there a way to run CPython on a different thread without risking a crash?

风格不统一 submitted on 2019-12-13 03:37:55

Question: I have a program that runs lots of urllib requests in an infinite loop, which makes my program really slow, so I tried running them as threads. Urllib uses CPython deep down in the socket module, so the threads that are being created just add up and do nothing, because Python's GIL prevents two CPython commands from being executed in different threads at the same time. I am running Windows XP with Python 2.5, so I can't use the multiprocessing module. I tried looking at the subprocess module
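For what it's worth, the premise is slightly off: urllib's socket calls release the GIL while they block on the network, so threads genuinely overlap for I/O-bound work like this. A minimal Python 3 sketch with `concurrent.futures` (the `fetch` function is a stub standing in for `urllib.request.urlopen(url).read()`):

```python
from concurrent.futures import ThreadPoolExecutor

# Stub standing in for a real urllib fetch. Blocking socket I/O releases
# the GIL, so a pool of threads overlaps the waiting, not the CPU work.
def fetch(url):
    return "response from " + url

urls = ["http://example.com/%d" % i for i in range(5)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() dispatches fetch() across the worker threads, preserving order
    results = list(pool.map(fetch, urls))

print(results[0])  # response from http://example.com/0
```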

urllib error on Google App Engine & Python: [Errno 11003] getaddrinfo failed

本秂侑毒 submitted on 2019-12-13 03:22:15

Question: Thanks for your help in advance! I want to get the contents of a website, so I use urllib.urlopen(url) with url = 'http://localhost:8080' (a Tomcat page). If I use the Google App Engine Launcher and run the application, browsing http://localhost:8082 works well. But if I specify the address and port for the application:

    python "D:\Program Files\Google\google_appengine\dev_appserver.py" -p 8082 -a 10.96.72.213 D:\pagedemon\videoareademo

something goes wrong:

    Traceback (most recent call last):
      File "D:
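getaddrinfo failures surface as a `URLError` whose `.reason` is a `socket.gaierror`, which lets you tell a name-resolution problem apart from an ordinary connection failure. A sketch (the `.invalid` hostname is a stand-in that is guaranteed not to resolve, per RFC 2606; `fetch` is a hypothetical helper name):

```python
import socket
from urllib.request import urlopen
from urllib.error import URLError

def fetch(url):
    """Return a short status string instead of letting the traceback escape."""
    try:
        body = urlopen(url, timeout=5).read()
        return "ok: %d bytes" % len(body)
    except URLError as e:
        # [Errno 11003] getaddrinfo errors arrive as a socket.gaierror
        # wrapped in URLError.reason
        if isinstance(e.reason, socket.gaierror):
            return "getaddrinfo failed: %s" % e.reason
        return "connection failed: %s" % e.reason

# .invalid is reserved and never resolves, so this exercises the DNS branch
result = fetch("http://hostname.invalid/")
print(result)
```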

Urllib Unicode Error, no unicode involved

≯℡__Kan透↙ submitted on 2019-12-13 02:00:22

Question: EDIT: I've heavily edited the content of this post since the original to specify my problem: I am writing a program to download webcomics, and I'm getting this weird error when downloading a page of the comic. The code I am running essentially boils down to the following line, followed by the error. I do not know what is causing this error, and it is confusing me greatly.

    >>> urllib.request.urlopen("http://abominable.cc/post/47699281401")
    Traceback (most recent call last):
      File "<stdin>", line
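With errors like this the Unicode usually sneaks in via a redirect whose Location header contains non-ASCII characters, even though the URL you typed is plain ASCII. Percent-encoding the URL before requesting it is the usual workaround; a sketch below (`iri_to_uri` is a hypothetical helper name, and it assumes the hostname itself is ASCII):

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(url):
    """Percent-encode non-ASCII (and space) characters in the path and
    query so urlopen only ever sees ASCII. Assumes an ASCII hostname."""
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")       # keep existing %XX escapes
    query = quote(parts.query, safe="=&%")    # keep key=value structure
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

print(iri_to_uri("http://example.com/post/naïve page"))
# non-ASCII and spaces come back as %XX escapes
```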

Scrapy: Error 10054 after retrying image download

蹲街弑〆低调 submitted on 2019-12-13 01:54:01

Question: I'm running a Scrapy spider in Python to scrape images from a website. One of the images fails to download (even if I try to download it regularly through the site), which is an internal error for the site. This is fine; I don't care about trying to get that image. I just want to skip over the image when it fails and move on to the other images, but I keep getting a 10054 error.

    Traceback (most recent call last):
      File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 588, in
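One common way to skip failures follows Scrapy's `ImagesPipeline` convention: `item_completed(results, item, info)` receives a list of `(success, value)` two-tuples, so filtering on the success flag drops the broken image while the rest proceed. The filtering itself is plain Python; the result tuples below are hand-built stand-ins for what Scrapy would pass in:

```python
# Sketch of the filtering step inside an ImagesPipeline subclass's
# item_completed(): keep only images that actually downloaded.
def keep_successful_images(results):
    """results is a list of (success, info_or_failure) tuples."""
    return [info["path"] for ok, info in results if ok]

# Hand-built stand-ins for Scrapy's results argument
results = [
    (True, {"url": "http://site/a.jpg", "path": "full/a.jpg"}),
    (False, Exception("10054: connection reset by peer")),  # the failed image
    (True, {"url": "http://site/b.jpg", "path": "full/b.jpg"}),
]
print(keep_successful_images(results))  # ['full/a.jpg', 'full/b.jpg']
```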

Cannot read urllib error message once it is read()

对着背影说爱祢 submitted on 2019-12-13 01:28:00

Question: My problem is with error handling of the Python urllib error object. I am unable to read the error message while still keeping it intact in the error object for it to be consumed later.

    response = urllib.request.urlopen(request)  # request that will raise an error
    response.read()
    response.read()  # is empty now
    # Also tried seek(0); that does not work either.

So this is how I intend to use it, but when the exception bubbles up, the second .read() is empty:

    try:
        response = urllib.request
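The error body is a file-like object, so `read()` drains it exactly once and `seek()` is unsupported on the socket stream. A sketch of the usual workaround: read once, cache the bytes on the exception, and let later consumers use the cache (`cached_body` is a made-up attribute name; the stub stands in for a real `urlopen` call):

```python
import io
from urllib.error import HTTPError

# HTTPError bodies are file-like: read() drains the stream, so read it
# once and stash the bytes for anyone who needs them later.
def handle(request_fn):
    try:
        return request_fn()
    except HTTPError as e:
        e.cached_body = e.read()  # first and only read of the stream
        raise

# Stub standing in for urllib.request.urlopen raising an HTTPError
def failing_request():
    raise HTTPError("http://example.com", 404, "Not Found", {},
                    io.BytesIO(b"error details"))

try:
    handle(failing_request)
except HTTPError as e:
    print(e.cached_body)  # b'error details'
    print(e.cached_body)  # the cached copy survives repeated access
```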

Why do I get messy characters while opening a URL using urllib2?

被刻印的时光 ゝ submitted on 2019-12-13 00:28:06

Question: Here's my code; you can also test it out. I always get messed-up characters instead of the page source.

    Header = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"}
    Req = urllib2.Request("http://rlslog.net", None, Header)
    Response = urllib2.urlopen(Req)
    Html = Response.read()
    print Html[:1000]

Normally Html should be the page source, but it ended up being tons of messed-up characters. Anybody know why? BTW: I'm
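Symptoms like this usually mean the body came back gzip-compressed (some servers compress regardless of the Accept-Encoding header) and urllib2 does not decompress for you. A sketch of detecting and undoing it; the sample bytes are compressed locally to stand in for `Response.read()`:

```python
import gzip
import io

def maybe_decompress(raw, content_encoding):
    """Decompress the body if the server declared Content-Encoding: gzip;
    in a real run content_encoding comes from Response.info()."""
    if content_encoding == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw

# Locally gzipped bytes standing in for a compressed Response.read()
compressed = gzip.compress(b"<html>page source</html>")
print(maybe_decompress(compressed, "gzip"))  # b'<html>page source</html>'
print(maybe_decompress(b"plain", None))      # b'plain'
```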

Python 3.2: urllib, SSL and TOR through socket: error with fileno function

不问归期 submitted on 2019-12-12 21:26:11

Question: I get an error when trying to connect over HTTPS through SocksiPy with the code below. I followed the example here: using tor as a SOCKS5 proxy with python urllib2 or mechanize. Or this one: Python urllib over TOR? Edit: this code actually works over HTTP, but not over HTTPS. I have imported socks from the SocksiPy Python module. Here is the code:

    import socks
    import socket

    # This function does no DNS resolution;
    # it needs to use the real IP address to connect instead of www.google.fr
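The usual SocksiPy recipe replaces `socket.socket` globally so every connection urllib opens goes through the proxy; a fileno error on the HTTPS path typically means the replacement class does not fully emulate a real socket (the ssl module calls `fileno()` on it). The stand-in subclass below shows the shape of the monkey-patch without needing socks installed; with SocksiPy it would be `socket.socket = socks.socksocket` after `socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)`:

```python
import socket

# Stand-in for socks.socksocket: any replacement must subclass (or fully
# emulate) socket.socket so that ssl can call fileno() on it.
class ProxySocket(socket.socket):
    pass

original_socket = socket.socket
socket.socket = ProxySocket       # the monkey-patch urllib would pick up

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
is_proxy = isinstance(s, ProxySocket)   # urllib now builds ProxySockets
fd_ok = s.fileno() >= 0                 # a real file descriptor, as ssl needs
print(is_proxy, fd_ok)  # True True
s.close()
socket.socket = original_socket   # restore the real class when done
```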

POST request via urllib/urllib2?

為{幸葍}努か submitted on 2019-12-12 19:11:39

Question: Before you say anything, I've looked around SO and the solutions didn't work. I need to make a POST request to a login script in Python. The URL looks like http://example.com/index.php?act=login, and it accepts a username and password via POST. Could anyone help me with this? I've tried:

    import urllib, urllib2, cookielib

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders.append(('User-agent', 'Mozilla/4.0'))
    opener.addheaders.append( (
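A Python 3 sketch of the same flow (urllib2/cookielib became urllib.request/http.cookiejar; the form field names are stand-ins, since the question doesn't show them). Passing `data=` is what makes the request a POST, while the `act=login` part stays in the URL; the cookie-aware opener keeps any Set-Cookie the login script returns:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

cj = CookieJar()
opener = build_opener(HTTPCookieProcessor(cj))  # remembers session cookies
opener.addheaders = [("User-agent", "Mozilla/4.0")]

# Hypothetical field names; data= (bytes) is what turns this into a POST
data = urlencode({"username": "admin", "password": "secret"}).encode("utf-8")
req = Request("http://example.com/index.php?act=login", data=data)

print(req.get_method())  # "POST"
# opener.open(req) would send it and store any Set-Cookie in cj
```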