urllib

Download file from Blob URL with Python

陌路散爱 submitted on 2019-12-01 01:28:22
I wish to have my Python script download the Master data (Download, XLSX) Excel file from this Frankfurt stock exchange webpage. When I try to retrieve it with urllib or wget, it turns out that the URL leads to a Blob and the downloaded file is only 289 bytes and unreadable. http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx I'm entirely unfamiliar with Blobs and have these questions: Can the file "behind the Blob" be successfully retrieved using Python? If so, is it necessary to uncover the "true" URL behind the Blob, if there is such a thing?
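A common cause of a tiny, unreadable download like this is the server returning an error stub to clients without browser-like headers; whether that is the case here is an assumption. A minimal sketch with requests, sending a User-Agent and saving the response bytes as binary:

import requests

url = ("http://www.xetra.com/blob/1193366/"
       "b2f210876702b8e08e40b8ecb769a02e/data/"
       "All-tradable-ETFs-ETCs-and-ETNs.xlsx")

# Send a browser-like User-Agent; some servers hand scripts a stub
# page instead of the real file (an assumption about this server).
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
response.raise_for_status()

# The payload is a binary .xlsx file, so write raw bytes.
with open("All-tradable-ETFs-ETCs-and-ETNs.xlsx", "wb") as f:
    f.write(response.content)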

Again urllib.error.HTTPError: HTTP Error 400: Bad Request

雨燕双飞 submitted on 2019-12-01 01:06:21
Question: Hi! I tried to open a web page that opens normally in a browser, but Python just throws an error and refuses to work.
import urllib.request, urllib.error
f = urllib.request.urlopen('http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire')
And another way:
import urllib.request, urllib.error
opener = urllib.request.build_opener()
f = opener.open('http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire')
Both options give the same type of error:
Traceback (most recent call last): File "
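A frequent reason for an HTTP 400 from a page that works in the browser is that the site rejects the default Python-urllib User-Agent; treating that as the cause here is an assumption. A minimal sketch that sends browser-like headers:

import urllib.request

url = 'http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire'

# Present a browser-like User-Agent instead of the default
# "Python-urllib/3.x" that some servers refuse outright.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as f:
    html = f.read().decode('utf-8', errors='replace')

print(html[:200])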

What is the difference between <class 'str'> and <type 'str'>

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-01 00:48:06
Question: I am new to Python. I'm confused by <class 'str'>. I got a str by using:
response = urllib.request.urlopen(req).read().decode()
The type of response is <class 'str'>, not <type 'str'>. When I try to iterate over this str in a for loop:
for ID in response:
response is read NOT line by line, BUT character by character. I intend to put every line of response into its own element of a list. For now I have to write the response to a file and use open() to get a string of <type 'str'> that I can use
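The difference is only in how the interpreter prints the type: Python 3 reports built-in types as <class 'str'>, while Python 2 printed <type 'str'>; both are the ordinary string type. Iterating a string always yields characters, so splitting into lines needs splitlines(). A short sketch (the URL is a placeholder):

import urllib.request

# type(response) prints as <class 'str'> in Python 3; the older
# <type 'str'> form is simply Python 2's way of printing the same thing.
req = urllib.request.Request('http://example.com')  # placeholder URL
response = urllib.request.urlopen(req).read().decode()

# Iterating a str yields characters; splitlines() yields a list of lines.
for line in response.splitlines():
    print(line)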

Javascript unescape() vs. Python urllib.unquote()

假如想象 submitted on 2019-11-30 22:28:07
From reading various posts, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(), however when I test both I get different results:
In the browser console:
unescape('%u003c%u0062%u0072%u003e');
output: <br>
In the Python interpreter:
import urllib
urllib.unquote('%u003c%u0062%u0072%u003e')
output: %u003c%u0062%u0072%u003e
I would expect Python to also return <br>. Any ideas as to what I'm missing here? Thanks!
%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Py 3) / urllib.unquote() (Py 2). It was only ever part of ECMAScript
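If the %uXXXX escapes must be handled anyway, one option is to convert them by hand with a regular expression; a minimal sketch:

import re

def unescape_js(s):
    # Decode the non-standard %uXXXX escapes understood by JavaScript's
    # unescape(); urllib deliberately leaves them untouched.
    return re.sub(r'%u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  s)

print(unescape_js('%u003c%u0062%u0072%u003e'))  # prints <br>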

Trying to post multipart form data in python, won't post

雨燕双飞 submitted on 2019-11-30 22:05:28
I'm fairly new to Python, so I apologize in advance if this is something simple I'm missing. I'm trying to post data to a multipart form in Python. The script runs, but it won't post. I'm not sure what I'm doing wrong.
import urllib, urllib2
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers

def toqueXF():
    register_openers()
    url = "http://localhost/trunk/admin/new.php"
    values = {'form': open('/test.pdf'), 'bandingxml': open('/banding.xml'), 'desc': 'description'}
    data, headers = multipart_encode(values)
    request = urllib2.Request(url, data, headers)
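One likely explanation (an assumption, since the excerpt is cut off) is that the Request object is built but never passed to urllib2.urlopen(request), so nothing is actually sent. A minimal modern sketch of the same upload using requests, which builds the multipart body itself; the field names and URL are taken from the question:

import requests

url = "http://localhost/trunk/admin/new.php"

# Open file uploads in binary mode; requests encodes the multipart body.
files = {
    'form': open('/test.pdf', 'rb'),
    'bandingxml': open('/banding.xml', 'rb'),
}
data = {'desc': 'description'}

response = requests.post(url, files=files, data=data)
print(response.status_code, response.text[:200])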

temporarily retrieve an image using the requests library

好久不见. submitted on 2019-11-30 22:02:56
I'm writing a web scraper that needs to scrape only the thumbnail of an image from the URL. This is my function, using the urllib library.
def create_thumb(self):
    if self.url and not self.thumbnail:
        image = urllib.request.urlretrieve(self.url)
        # Create the thumbnail of dimension size
        size = 350, 350
        t_img = Imagelib.open(image[0])
        t_img.thumbnail(size)
        # Get the directory name where the temp image was stored by urlretrieve
        dir_name = os.path.dirname(image[0])
        # Get the image name from the url
        img_name = os.path.basename(self.url)
        # Save the thumbnail in the same temp directory where
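With requests, the image can be held in memory and never written to a temporary file; a minimal sketch, assuming "Imagelib" in the question is PIL/Pillow:

import requests
from io import BytesIO
from PIL import Image  # assumes the question's Imagelib is Pillow

def create_thumb_from_url(url, size=(350, 350)):
    # Fetch the image bytes into memory instead of letting
    # urlretrieve write a temporary file to disk.
    response = requests.get(url)
    response.raise_for_status()

    img = Image.open(BytesIO(response.content))
    img.thumbnail(size)
    return img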

urllib downloading contents of an online directory

天大地大妈咪最大 submitted on 2019-11-30 20:26:07
I'm trying to make a program that will open a directory listing, use regular expressions to get the names of the PowerPoint files, and then create files locally and copy their content. When I run it, it appears to work, but when I actually try to open the downloaded files, they keep saying the version is wrong.
from urllib.request import urlopen
import re

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')
pattern = re.compile('ch[0-9]*.ppt')  # the pattern actually creates duplicates in the list
filelist = pattern.findall(string)
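The usual cause of "wrong version" errors in this situation is writing decoded text instead of the file's raw bytes; whether that is what the missing part of the script does is an assumption. A minimal sketch that deduplicates the matches and downloads each file as binary:

from urllib.request import urlopen, urlretrieve
import re

base = 'http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/'
html = urlopen(base).read().decode('utf-8')

# Escape the dot and deduplicate; findall() returns one match per
# occurrence of each name in the page, hence the duplicates.
names = sorted(set(re.findall(r'ch[0-9]+\.ppt', html)))

for name in names:
    # urlretrieve streams the response bytes straight to disk, so the
    # .ppt files are not corrupted by text decoding.
    urlretrieve(base + name, name)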

How to download a webpage that requires a username and password?

我与影子孤独终老i submitted on 2019-11-30 19:47:16
Question: For example, I want to download this page after entering a username and password: http://forum.ubuntu-it.org/ I have tried with wget but it doesn't work. Is there a solution with Python? You can test with this username and password: username: johnconnor password: hellohello
Answer 1: Like @robert says, use mechanize. To get you started:
from mechanize import Browser
b = Browser()
b.open("http://forum.ubuntu-it.org/index.php")
b.select_form(nr=0)
b["user"] = "johnconnor"
b["passwrd"] = "hellohello"
b
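An alternative without mechanize is a requests session that posts the login form directly. The field names "user" and "passwrd" come from the answer above; the login action URL is an assumption and should be read from the actual login form:

import requests

LOGIN_URL = "http://forum.ubuntu-it.org/index.php?action=login2"  # assumed action URL

with requests.Session() as s:
    # The session stores the cookies set by the login response,
    # so subsequent requests are made as the logged-in user.
    s.post(LOGIN_URL, data={"user": "johnconnor", "passwrd": "hellohello"})
    page = s.get("http://forum.ubuntu-it.org/")
    print(page.status_code)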

urlopen Returning Redirect Error for Valid Links

不问归期 submitted on 2019-11-30 19:06:52
Question: I'm building a broken-link checker in Python, and it's becoming a chore to build the logic for correctly identifying links that do not resolve when visited with a browser. I've found a set of links for which I can consistently reproduce a redirect error with my scraper, but which resolve perfectly when visited in a browser. I was hoping I could find some insight here.
import urllib
import urllib.request
import html.parser
import requests
from requests.exceptions import HTTPError
from socket
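One common cause (an assumption here, since the failing links are not shown) is that the server sends script-like clients into a redirect loop or to an error page when the request lacks browser headers. A minimal sketch of a link check with requests, browser-like headers, and redirects followed:

import requests

def check_link(url):
    # Browser-like headers avoid the redirect loops some servers use
    # against requests that carry no User-Agent.
    headers = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}
    try:
        r = requests.get(url, headers=headers, allow_redirects=True, timeout=10)
        return r.status_code, r.url
    except requests.RequestException as exc:
        return None, str(exc)

print(check_link("http://example.com"))  # placeholder URL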

urlencode and urldecode in Python

被刻印的时光 ゝ submitted on 2019-11-30 18:46:15
It is quite common for a URL or its parameters to contain Chinese characters, but when such a URL is itself passed as a parameter (most commonly a callback), the Chinese characters and even '/' have to be encoded. In short, any characters that a URL cannot carry as-is need to be converted, as follows:
1. urlencode
The urllib library has an urlencode function that converts key-value pairs into the format we want, returning a string like a=1&b=2. For example:
import urllib.parse
values = {}
values['username'] = '02蔡彩虹'
values['password'] = 'ddddd?'
url = "http://www.baidu.com"
data = urllib.parse.urlencode(values)
print(data)
The result is as follows:
What if you only want to urlencode a single string? urllib provides another function: quote()
import urllib.parse
s = '长春'
s = urllib.parse.quote(s)
print(s)
The output is:
2. urldecode
Once the urlencoded string has been passed along and received, it has to be decoded, i.e. urldecoded. urllib provides the unquote() function for this; there is no urldecode()!
s = '%E5%B9%BF
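A short round trip tying the three calls together; the printed values follow from percent-encoding the UTF-8 bytes of the Chinese characters:

import urllib.parse

# Encode a mapping into a query string, quote a single string,
# and decode again with unquote(); there is no urldecode() function.
values = {'username': '02蔡彩虹', 'password': 'ddddd?'}
print(urllib.parse.urlencode(values))   # username=02%E8%94%A1%E5%BD%A9%E8%99%B9&password=ddddd%3F

encoded = urllib.parse.quote('长春')
print(encoded)                          # %E9%95%BF%E6%98%A5
print(urllib.parse.unquote(encoded))    # 长春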