urllib

Python and urllib

Submitted by 谁都会走 on 2019-12-29 07:39:11

Question: I'm trying to download a zip file ("tl_2008_01001_edges.zip") from an FTP census site using urllib. What form is the zip file in when I get it, and how do I save it? I'm fairly new to Python and don't understand how urllib works. This is my attempt:

    import urllib, sys
    zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")

If I know the list of ftp folders (or counties in this case), can I run through the ftp site list
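A minimal sketch of one way to do this in Python 3, where `urlretrieve` lives in `urllib.request` (the Python-2 snippet above also passes the directory URL rather than the file URL). The path pattern is copied from the question's own URL; treating it as a general template for every county is an assumption:

```python
import urllib.request

BASE = "ftp://ftp2.census.gov/geo/tiger/TIGER2008"

def edges_zip_url(state_dir, county_dir, county_fips):
    # Path layout copied from the question's URL; generalizing it to
    # every county/state directory is an assumption.
    return "%s/%s/%s/tl_2008_%s_edges.zip" % (BASE, state_dir, county_dir, county_fips)

def download_county(state_dir, county_dir, county_fips):
    # urlretrieve streams the remote bytes straight into a local file,
    # so the zip arrives on disk exactly as stored on the server.
    url = edges_zip_url(state_dir, county_dir, county_fips)
    filename = url.rsplit("/", 1)[1]
    urllib.request.urlretrieve(url, filename)  # network transfer happens here
    return filename

print(edges_zip_url("01_ALABAMA", "Autauga_County", "01001"))
# calling download_county("01_ALABAMA", "Autauga_County", "01001") would
# save tl_2008_01001_edges.zip into the current directory
```

Given a list of county directories, looping over them and calling `download_county` for each would walk the whole site.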

How to use urllib in python 3?

Submitted by 霸气de小男生 on 2019-12-28 04:07:09

Question: Here is my problem with urllib in Python 3. I wrote a piece of code that works well in Python 2.7 using urllib2. It goes to a page on the Internet (which requires authorization) and grabs the info from that page. The real problem is that I can't make my code work in Python 3.4, because there is no urllib2 and urllib works differently; even after a few hours of googling and reading I got nothing. So if somebody can help me solve this, I'd really appreciate the help. Here
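A sketch of the Python 3 equivalent, assuming the page uses HTTP basic auth (the question does not say which scheme or site is involved, so the URL and credentials here are placeholders):

```python
import urllib.request

# Hypothetical URL and credentials -- the question doesn't name the site.
URL = "http://example.com/protected/page"

def make_opener(url, user, password):
    # Python 2's urllib2.HTTPBasicAuthHandler and friends moved to
    # urllib.request in Python 3; the pattern is otherwise unchanged.
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(mgr)
    return urllib.request.build_opener(handler)

opener = make_opener(URL, "user", "secret")
# opener.open(URL).read() would fetch the page (needs network access)
print(type(opener).__name__)  # OpenerDirector
```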

Python入妖3: Basic Usage of the Urllib Library

Submitted by 风流意气都作罢 on 2019-12-28 02:39:22

What is Urllib? Urllib is Python's built-in HTTP request library. It includes the following modules:

    urllib.request      the request module
    urllib.error        the exception-handling module
    urllib.parse        the URL-parsing module
    urllib.robotparser  the robots.txt-parsing module

urlopen

The signature of urllib.request.urlopen:

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

The url parameter. Start with a simple example:

    import urllib.request
    # The urllib module provides an interface for reading web-page data;
    # we can read data over www and ftp much as we read a local file.
    # urlopen opens a URL; read() reads the data behind that URL.
    response = urllib.request.urlopen('http://www.baidu.com')
    print(response.read().decode('utf-8'))

urlopen is usually called with three parameters:

    urllib.request.urlopen(url, data, timeout)
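To illustrate urllib.parse, the URL-parsing module listed above, a small self-contained sketch (the example URL is made up and nothing here touches the network):

```python
from urllib.parse import urlparse, urlencode

# urlparse splits a URL into its named components.
parts = urlparse("http://www.baidu.com/s?wd=python#top")
print(parts.scheme)    # http
print(parts.netloc)    # www.baidu.com
print(parts.path)      # /s
print(parts.query)     # wd=python
print(parts.fragment)  # top

# urlencode builds a query string from a dict, e.g. for urlopen's data argument.
print(urlencode({"wd": "python", "page": 1}))  # wd=python&page=1
```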

How to know if urllib.urlretrieve succeeds?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-28 01:51:16

Question: urllib.urlretrieve returns silently even if the file doesn't exist on the remote HTTP server; it just saves an HTML page to the named file. For example:

    urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')

just returns silently, even if abc.jpg doesn't exist on the google.com server; the generated abc.jpg is not a valid JPEG file, it's actually an HTML page. I guess the returned headers (an httplib.HTTPMessage instance) can be used to tell whether the retrieval succeeds or not, but I
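One way to act on that guess: urlretrieve returns a (filename, headers) pair, and inspecting the Content-Type header distinguishes a real image from an HTML error page. Since the headers object derives from email.message.Message, the check can be demonstrated offline with plain Message instances (the helper name is ours):

```python
from email.message import Message

def looks_like_jpeg(headers):
    # Heuristic: many servers answer 200 with an HTML error page for a
    # missing file, so the Content-Type header is the tell-tale.
    return headers.get("Content-Type", "").startswith("image/jpeg")

# Simulated header objects -- no network needed for the demonstration.
ok = Message()
ok["Content-Type"] = "image/jpeg"
bad = Message()
bad["Content-Type"] = "text/html; charset=utf-8"
print(looks_like_jpeg(ok), looks_like_jpeg(bad))  # True False
```

In Python 3, an alternative is urllib.request.urlopen, which raises urllib.error.HTTPError for genuine 404 responses instead of returning silently.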

Which character produces “%3Q” from urllib.quote?

Submitted by 浪尽此生 on 2019-12-25 13:09:19

Question: We're encoding URLs containing user-supplied data using Python's urllib.quote(); one of the URLs is producing a string with the following substring: '%3Q'. I'm trying to figure out what the original user-supplied character is, but am getting stumped: I can't seem to find any resource that mentions a character producing this output from urllib.quote(). Strangely, there are no results on Stack Overflow for this string of characters. It's also important to note that the user-supplied data is
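A short offline check of what quote can and cannot emit (using the Python 3 location urllib.parse.quote; Python 2's urllib.quote behaves the same way here):

```python
import re
from urllib.parse import quote

# quote encodes a byte as '%' plus exactly two *hex* digits, so '%3Q'
# ('Q' is not a hex digit) cannot come out of quote itself -- it usually
# means the input already contained a literal '%' followed by '3Q'.
print(quote("?"))    # %3F
print(quote("%3Q"))  # %253Q  -- a literal '%' is itself encoded as %25

encoded = quote("".join(chr(c) for c in range(128)))
# Every escape in the output is '%' plus two uppercase hex digits.
assert re.fullmatch(r"(?:[A-Za-z0-9_.~/-]|%[0-9A-F]{2})*", encoded)
```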

POST method in Python: errno 104

Submitted by 人走茶凉 on 2019-12-25 11:58:22

Question: I am trying to query a website in Python. I need to use a POST method (according to what is happening in my browser when I monitor it with the developer tools). If I query the website with cURL, it works well:

    curl -i --data "param1=var1&param2=var2" http://www.test.com

I get this header:

    HTTP/1.1 200 OK
    Date: Tue, 26 Sep 2017 08:46:18 GMT
    Server: Apache/1.3.33 (Unix) mod_gzip/1.3.26.1a mod_fastcgi/2.4.2 PHP/4.3.11
    Transfer-Encoding: chunked
    Content-Type: text/html

But when I do it in Python
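A sketch of the urllib equivalent of that cURL call (www.test.com is the placeholder host from the question; the request itself is constructed but not sent here):

```python
import urllib.request

# Equivalent of: curl --data "param1=var1&param2=var2" http://www.test.com
data = b"param1=var1&param2=var2"  # already URL-encoded, like the cURL --data value
req = urllib.request.Request("http://www.test.com", data=data)
req.add_header("Content-Type", "application/x-www-form-urlencoded")

print(req.get_method())  # POST -- a Request carrying a body defaults to POST
# errno 104 ("connection reset by peer") means the server closed the socket;
# mirroring the headers cURL sends (Content-Type, User-Agent) often helps.
# urllib.request.urlopen(req) would perform the request (network required)
```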

How to get round the HTTP Error 403: Forbidden with urllib.request using Python 3

Submitted by 前提是你 on 2019-12-25 09:08:42

Question: Hi, not every time but sometimes when trying to gain access to the LSE code, I am thrown the ever-annoying HTTP Error 403: Forbidden message. Does anyone know how I can overcome this issue using only standard Python modules (so, sadly, no Beautiful Soup)?

    import urllib.request
    url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
    infile = urllib.request.urlopen(url)  # Open the URL
    data = infile.read().decode('ISO-8859-1')  # Read the content as string
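A stdlib-only sketch of the usual workaround: many sites refuse urllib's default "Python-urllib/3.x" User-Agent with a 403, so supplying a different one often helps (the User-Agent string below is an arbitrary example, and the request is built but not sent):

```python
import urllib.request

URL = ("http://www.londonstockexchange.com/exchange/prices-and-markets/"
       "stocks/indices/ftse-indices.html")

# Attach a browser-like User-Agent instead of urllib's default.
req = urllib.request.Request(
    URL,
    headers={"User-Agent": "Mozilla/5.0 (compatible; example-fetcher)"},
)
print(req.get_header("User-agent"))  # urllib stores header names capitalized
# infile = urllib.request.urlopen(req)           # network call, not run here
# data = infile.read().decode("ISO-8859-1")
```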

Can't “import urllib.request, urllib.parse, urllib.error”

Submitted by 百般思念 on 2019-12-25 08:29:49

Question: I am trying to convert my project to Python 3. My server script is server.py:

    #!/usr/bin/env python
    #-*-coding:utf8-*-
    import http.server
    import os, sys

    server = http.server.HTTPServer
    handler = http.server.CGIHTTPRequestHandler
    server_address = ("", 8080)
    #handler.cgi_directories = [""]

    httpd = server(server_address, handler)
    httpd.serve_forever()

But when I try:

    import urllib.request, urllib.parse, urllib.error

I get this in the terminal of python3 ./server.py: import urllib.request, urllib.parse,
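A likely cause, offered as an assumption: the `#!/usr/bin/env python` shebang resolves to Python 2 on many systems, and urllib was split into request/parse/error submodules only in Python 3, so the import fails there. A minimal check under a python3 shebang:

```python
#!/usr/bin/env python3
# Under Python 2, "import urllib.request" raises ImportError because
# the urllib submodules exist only in Python 3.
import sys

print(sys.version_info.major)  # the interpreter the shebang actually picked

import urllib.request, urllib.parse, urllib.error
print(urllib.request.__name__)  # urllib.request
```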

Python Crawler, Part 2: Basic Usage of the Urllib Library

Submitted by 我是研究僧i on 2019-12-25 07:31:43

What is Urllib? Urllib is Python's built-in HTTP request library. It includes the following modules:

    urllib.request      the request module
    urllib.error        the exception-handling module
    urllib.parse        the URL-parsing module
    urllib.robotparser  the robots.txt-parsing module

urlopen

The signature of urllib.request.urlopen:

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

The url parameter. Start with a simple example:

    import urllib.request
    response = urllib.request.urlopen('http://www.baidu.com')
    print(response.read().decode('utf-8'))

urlopen is usually called with three parameters:

    urllib.request.urlopen(url, data, timeout)

response.read() retrieves the page content; without calling read(), something different is returned.

The data parameter. The example above sends a GET request to Baidu and obtains Baidu
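The module list above also names urllib.robotparser. A small sketch of it (the robots.txt lines are fed in directly, so this runs without any network access; the rules shown are made up):

```python
from urllib.robotparser import RobotFileParser

# parse() accepts robots.txt content as a list of lines, which lets a
# crawler test its rules without fetching anything.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("*", "http://example.com/index.html"))  # True
print(rp.can_fetch("*", "http://example.com/private/x"))   # False
```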