urllib

Download file from Blob URL with Python

陌路散爱 submitted on 2019-12-01 01:28:22
I wish to have my Python script download the Master data (Download, XLSX) Excel file from this Frankfurt stock exchange webpage. When I try to retrieve it with urllib or wget, it turns out that the URL leads to a Blob and the downloaded file is only 289 bytes and unreadable. http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx I'm entirely unfamiliar with Blobs and have these questions: Can the file "behind the Blob" be successfully retrieved using Python? If so, is it necessary to uncover the "true" URL behind the Blob, if there is such a thing?
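A common cause of a tiny, unreadable download like this is the server returning an error stub to clients without browser-like headers; whether that is the case here is an assumption. A minimal sketch with requests, sending a User-Agent and saving the response bytes as binary:

import requests

url = ("http://www.xetra.com/blob/1193366/"
       "b2f210876702b8e08e40b8ecb769a02e/data/"
       "All-tradable-ETFs-ETCs-and-ETNs.xlsx")

# Send a browser-like User-Agent; some servers hand scripts a stub
# page instead of the real file (an assumption about this server).
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
response.raise_for_status()

# The payload is a binary .xlsx file, so write raw bytes.
with open("All-tradable-ETFs-ETCs-and-ETNs.xlsx", "wb") as f:
    f.write(response.content)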

Again urllib.error.HTTPError: HTTP Error 400: Bad Request

雨燕双飞 submitted on 2019-12-01 01:06:21
Question: Hi! I tried to open a web page that opens normally in a browser, but Python just throws an error and refuses to work.
import urllib.request, urllib.error
f = urllib.request.urlopen('http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire')
And another way:
import urllib.request, urllib.error
opener = urllib.request.build_opener()
f = opener.open('http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire')
Both options give the same type of error:
Traceback (most recent call last): File "
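A frequent reason for an HTTP 400 from a page that works in the browser is that the site rejects the default Python-urllib User-Agent; treating that as the cause here is an assumption. A minimal sketch that sends browser-like headers:

import urllib.request

url = 'http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire'

# Present a browser-like User-Agent instead of the default
# "Python-urllib/3.x" that some servers refuse outright.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as f:
    html = f.read().decode('utf-8', errors='replace')

print(html[:200])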

What is the difference between <class 'str'> and <type 'str'>

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-01 00:48:06
Question: I am new to Python. I'm confused by <class 'str'>. I got a str by using:
response = urllib.request.urlopen(req).read().decode()
The type of response is <class 'str'>, not <type 'str'>. When I try to iterate over this str in a for loop:
for ID in response:
response is read NOT line by line, BUT character by character. I intend to put every line of response into its own element of a list. For now I have to write the response to a file and use open() to get a string of <type 'str'> that I can use
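The difference is only in how the interpreter prints the type: Python 3 reports built-in types as <class 'str'>, while Python 2 printed <type 'str'>; both are the ordinary string type. Iterating a string always yields characters, so splitting into lines needs splitlines(). A short sketch (the URL is a placeholder):

import urllib.request

# type(response) prints as <class 'str'> in Python 3; the older
# <type 'str'> form is simply Python 2's way of printing the same thing.
req = urllib.request.Request('http://example.com')  # placeholder URL
response = urllib.request.urlopen(req).read().decode()

# Iterating a str yields characters; splitlines() yields a list of lines.
for line in response.splitlines():
    print(line)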

Javascript unescape() vs. Python urllib.unquote()

假如想象 submitted on 2019-11-30 22:28:07
From reading various posts, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(), however when I test both I get different results:
In the browser console:
unescape('%u003c%u0062%u0072%u003e');
output: <br>
In the Python interpreter:
import urllib
urllib.unquote('%u003c%u0062%u0072%u003e')
output: %u003c%u0062%u0072%u003e
I would expect Python to also return <br>. Any ideas as to what I'm missing here? Thanks!
%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Py 3) / urllib.unquote() (Py 2). It was only ever part of ECMAScript
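If the %uXXXX escapes must be handled anyway, one option is to convert them by hand with a regular expression; a minimal sketch:

import re

def unescape_js(s):
    # Decode the non-standard %uXXXX escapes understood by JavaScript's
    # unescape(); urllib deliberately leaves them untouched.
    return re.sub(r'%u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  s)

print(unescape_js('%u003c%u0062%u0072%u003e'))  # prints <br>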

Trying to post multipart form data in python, won't post

雨燕双飞 submitted on 2019-11-30 22:05:28
I'm fairly new to Python, so I apologize in advance if this is something simple I'm missing. I'm trying to post data to a multipart form in Python. The script runs, but it won't post. I'm not sure what I'm doing wrong.
import urllib, urllib2
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers

def toqueXF():
    register_openers()
    url = "http://localhost/trunk/admin/new.php"
    values = {'form': open('/test.pdf'), 'bandingxml': open('/banding.xml'), 'desc': 'description'}
    data, headers = multipart_encode(values)
    request = urllib2.Request(url, data, headers)
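One likely explanation (an assumption, since the excerpt is cut off) is that the Request object is built but never passed to urllib2.urlopen(request), so nothing is actually sent. A minimal modern sketch of the same upload using requests, which builds the multipart body itself; the field names and URL are taken from the question:

import requests

url = "http://localhost/trunk/admin/new.php"

# Open file uploads in binary mode; requests encodes the multipart body.
files = {
    'form': open('/test.pdf', 'rb'),
    'bandingxml': open('/banding.xml', 'rb'),
}
data = {'desc': 'description'}

response = requests.post(url, files=files, data=data)
print(response.status_code, response.text[:200])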

temporarily retrieve an image using the requests library

好久不见. submitted on 2019-11-30 22:02:56
I'm writing a web scraper that needs to scrape only the thumbnail of an image from the URL. This is my function, using the urllib library.
def create_thumb(self):
    if self.url and not self.thumbnail:
        image = urllib.request.urlretrieve(self.url)
        # Create the thumbnail of dimension size
        size = 350, 350
        t_img = Imagelib.open(image[0])
        t_img.thumbnail(size)
        # Get the directory name where the temp image was stored by urlretrieve
        dir_name = os.path.dirname(image[0])
        # Get the image name from the url
        img_name = os.path.basename(self.url)
        # Save the thumbnail in the same temp directory where
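With requests, the image can be held in memory and never written to a temporary file; a minimal sketch, assuming "Imagelib" in the question is PIL/Pillow:

import requests
from io import BytesIO
from PIL import Image  # assumes the question's Imagelib is Pillow

def create_thumb_from_url(url, size=(350, 350)):
    # Fetch the image bytes into memory instead of letting
    # urlretrieve write a temporary file to disk.
    response = requests.get(url)
    response.raise_for_status()

    img = Image.open(BytesIO(response.content))
    img.thumbnail(size)
    return img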

urllib downloading contents of an online directory

天大地大妈咪最大 submitted on 2019-11-30 20:26:07
I'm trying to make a program that will open a directory listing, use regular expressions to get the names of the PowerPoint files, and then create files locally and copy their content. When I run it, it appears to work, but when I actually try to open the downloaded files, they keep saying the version is wrong.
from urllib.request import urlopen
import re

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')
pattern = re.compile('ch[0-9]*.ppt')  # the pattern actually creates duplicates in the list
filelist = pattern.findall(string)
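The usual cause of "wrong version" errors in this situation is writing decoded text instead of the file's raw bytes; whether that is what the missing part of the script does is an assumption. A minimal sketch that deduplicates the matches and downloads each file as binary:

from urllib.request import urlopen, urlretrieve
import re

base = 'http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/'
html = urlopen(base).read().decode('utf-8')

# Escape the dot and deduplicate; findall() returns one match per
# occurrence of each name in the page, hence the duplicates.
names = sorted(set(re.findall(r'ch[0-9]+\.ppt', html)))

for name in names:
    # urlretrieve streams the response bytes straight to disk, so the
    # .ppt files are not corrupted by text decoding.
    urlretrieve(base + name, name)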

How to download a webpage that requires a username and password?

我与影子孤独终老i submitted on 2019-11-30 19:47:16
Question: For example, I want to download this page after entering a username and password: http://forum.ubuntu-it.org/ I have tried with wget but it doesn't work. Is there a solution with Python? You can test with this username and password: username: johnconnor password: hellohello
Answer 1: Like @robert says, use mechanize. To get you started:
from mechanize import Browser
b = Browser()
b.open("http://forum.ubuntu-it.org/index.php")
b.select_form(nr=0)
b["user"] = "johnconnor"
b["passwrd"] = "hellohello"
b
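An alternative without mechanize is a requests session that posts the login form directly. The field names "user" and "passwrd" come from the answer above; the login action URL is an assumption and should be read from the actual login form:

import requests

LOGIN_URL = "http://forum.ubuntu-it.org/index.php?action=login2"  # assumed action URL

with requests.Session() as s:
    # The session stores the cookies set by the login response,
    # so subsequent requests are made as the logged-in user.
    s.post(LOGIN_URL, data={"user": "johnconnor", "passwrd": "hellohello"})
    page = s.get("http://forum.ubuntu-it.org/")
    print(page.status_code)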

urlopen Returning Redirect Error for Valid Links

不问归期 submitted on 2019-11-30 19:06:52
Question: I'm building a broken-link checker in Python, and it's becoming a chore to build the logic for correctly identifying links that do not resolve when visited with a browser. I've found a set of links for which I can consistently reproduce a redirect error with my scraper, but which resolve perfectly when visited in a browser. I was hoping I could find some insight here.
import urllib
import urllib.request
import html.parser
import requests
from requests.exceptions import HTTPError
from socket
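One common cause (an assumption here, since the failing links are not shown) is that the server sends script-like clients into a redirect loop or to an error page when the request lacks browser headers. A minimal sketch of a link check with requests, browser-like headers, and redirects followed:

import requests

def check_link(url):
    # Browser-like headers avoid the redirect loops some servers use
    # against requests that carry no User-Agent.
    headers = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}
    try:
        r = requests.get(url, headers=headers, allow_redirects=True, timeout=10)
        return r.status_code, r.url
    except requests.RequestException as exc:
        return None, str(exc)

print(check_link("http://example.com"))  # placeholder URL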

urlencode and urldecode in Python

被刻印的时光 ゝ submitted on 2019-11-30 18:46:15
It is quite common for a URL or its parameters to contain Chinese characters, but when such a URL is itself passed as a parameter (most commonly a callback), the Chinese characters and even '/' have to be encoded. In short, any characters that a URL cannot carry as-is need to be converted, as follows:
1. urlencode
The urllib library has an urlencode function that converts key-value pairs into the format we want, returning a string like a=1&b=2. For example:
import urllib.parse
values = {}
values['username'] = '02蔡彩虹'
values['password'] = 'ddddd?'
url = "http://www.baidu.com"
data = urllib.parse.urlencode(values)
print(data)
The result is as follows:
What if you only want to urlencode a single string? urllib provides another function: quote()
import urllib.parse
s = '长春'
s = urllib.parse.quote(s)
print(s)
The output is:
2. urldecode
Once the urlencoded string has been passed along and received, it has to be decoded, i.e. urldecoded. urllib provides the unquote() function for this; there is no urldecode()!
s = '%E5%B9%BF
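A short round trip tying the three calls together; the printed values follow from percent-encoding the UTF-8 bytes of the Chinese characters:

import urllib.parse

# Encode a mapping into a query string, quote a single string,
# and decode again with unquote(); there is no urldecode() function.
values = {'username': '02蔡彩虹', 'password': 'ddddd?'}
print(urllib.parse.urlencode(values))   # username=02%E8%94%A1%E5%BD%A9%E8%99%B9&password=ddddd%3F

encoded = urllib.parse.quote('长春')
print(encoded)                          # %E9%95%BF%E6%98%A5
print(urllib.parse.unquote(encoded))    # 长春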