urllib

Patentsview API Python 3.4

Submitted by 送分小仙女 on 2019-12-09 22:42:12
Question: I am a beginner in Python, currently working on a small project. I want to build a dynamic patent-search script for patentsview.org. Here is my code:

    import urllib.parse
    import urllib.request

    # Target request:
    # http://www.patentsview.org/api/patents/query?q={"_and":[{"inventor_last_name":author},{"_text_any":{"patent_title":[title]}}]}&o={"matched_subentities_only":"true"}

    author = "Jobs"
    andreq = "_and"
    invln = "inventor_last_name"
    text = "_text_any"
    patent = "patent_title"
    match = "matched_subentities_only"
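A sketch of how these pieces could be assembled in Python 3: build the query as a dict, serialize it with json.dumps, and percent-encode it into the q and o parameters. The title search term is an assumption here, since the snippet above is truncated before it is used.

```python
import json
import urllib.parse
import urllib.request

author = "Jobs"
title = "device"  # assumed search term; not present in the truncated snippet

query = {"_and": [
    {"inventor_last_name": author},
    {"_text_any": {"patent_title": [title]}},
]}
options = {"matched_subentities_only": "true"}

# JSON-encode each value, then percent-encode the whole query string.
params = urllib.parse.urlencode({"q": json.dumps(query),
                                 "o": json.dumps(options)})
url = "http://www.patentsview.org/api/patents/query?" + params
# result = urllib.request.urlopen(url).read()  # network call
print(params.startswith("q=%7B"))  # '{' percent-encodes to %7B
```

Building the dict first and letting urlencode handle the escaping avoids hand-assembling braces and quotes in the URL.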

I/O error(socket error): [Errno 111] Connection refused

Submitted by 一个人想着一个人 on 2019-12-09 14:48:07
Question: I have a program that uses urllib to periodically fetch a URL, and I see intermittent errors like: I/O error(socket error): [Errno 111] Connection refused. It works 90% of the time, but the other 10% it fails. If I retry the fetch immediately after it fails, it succeeds. I'm unable to figure out why this is so. I tried to see if any ports are available, and they are. Any debugging ideas? For additional info, the stack trace is:

    File "/usr/lib/python2.6/urllib.py", line 203, in open
        return
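Since the failure is transient and an immediate retry succeeds, a simple retry wrapper with a short backoff is a common workaround. A sketch in Python 3; the flaky opener below only simulates the refused connections for an offline demonstration.

```python
import time
import urllib.request

def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3, delay=1.0):
    """Retry transient failures such as [Errno 111] with a growing pause."""
    for attempt in range(retries):
        try:
            return opener(url)
        except OSError:  # URLError subclasses OSError in Python 3
            if attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))

# Offline demo: an opener that refuses twice, then succeeds.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError(111, "Connection refused")
    return "ok"

print(fetch_with_retry("http://example.com", opener=flaky, delay=0))  # ok
```

This does not explain the root cause (often server-side backlog exhaustion), but it makes the periodic fetch robust against the 10% failure rate described.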

Get file size from “Content-Length” value from a file in python 3.2

Submitted by 旧时模样 on 2019-12-09 11:06:03
Question: I want to get the Content-Length value from the meta variable; I need the size of the file I want to download. But the last line raises an error: the HTTPMessage object has no attribute getheaders.

    import urllib.request
    import http.client

    # ---- HTTP HANDLING PART ----
    url = "http://client.akamai.com/install/test-objects/10MB.bin"
    file_name = url.split('/')[-1]
    d = urllib.request.urlopen(url)
    f = open(file_name, 'wb')

    # ---- GET FILE SIZE ----
    meta = d.info()
    print("Download Details",
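The getheaders() method existed on the Python 2 headers object; in Python 3, d.info() returns an email.message.Message subclass, which exposes get() instead. A minimal offline sketch of the difference, using a Message built by hand in place of a live response:

```python
from email.message import Message

# Simulate the headers object that urlopen(...).info() returns in Python 3;
# the real HTTPMessage is a subclass of email.message.Message.
meta = Message()
meta["Content-Length"] = "10485760"

# Python 2: meta.getheaders("Content-Length")[0]
# Python 3: meta.get("Content-Length")
size = int(meta.get("Content-Length", 0))
print(size)  # 10485760
```

With a real response, `int(d.info().get("Content-Length", 0))` gives the file size before downloading.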

QPX Express API from Python

Submitted by 纵饮孤独 on 2019-12-09 03:09:27
I am trying to use Google's QPX Express API from Python. I keep running into a pair of issues when sending the request. At first what I tried is this:

    url = "https://www.googleapis.com/qpxExpress/v1/trips/search?key=MY_KEY_HERE"
    values = {"request": {"passengers": {"kind": "qpxexpress#passengerCounts",
                                         "adultCount": 1},
                          "slice": [{"kind": "qpxexpress#sliceInput",
                                     "origin": "RDU",
                                     "destination": location,
                                     "date": dateGo}]}}
    data = json.dumps(values)
    req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
    f = urllib2.urlopen(req)
    response = f.read()
    f.close()
    print(response)

based
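Two likely culprits in Python 3: urllib2 no longer exists (it became urllib.request), and the POST body must be bytes rather than str. A sketch under those assumptions; the API key, destination, and date are placeholders, not values from the question.

```python
import json
import urllib.request

url = "https://www.googleapis.com/qpxExpress/v1/trips/search?key=MY_KEY_HERE"
values = {"request": {"passengers": {"kind": "qpxexpress#passengerCounts",
                                     "adultCount": 1},
                      "slice": [{"kind": "qpxexpress#sliceInput",
                                 "origin": "RDU",
                                 "destination": "BOS",      # placeholder
                                 "date": "2025-01-15"}]}}   # placeholder

data = json.dumps(values).encode("utf-8")  # bytes, not str
req = urllib.request.Request(url, data,
                             {"Content-Type": "application/json"})
# response = urllib.request.urlopen(req).read()  # needs a real API key
print(type(data).__name__)  # bytes
```

Passing a body to Request automatically makes it a POST, which is what this endpoint expects.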

Why urllib.urlopen.read() does not correspond to source code?

Submitted by 烈酒焚心 on 2019-12-09 02:35:02
Question: I'm trying to fetch the following webpage:

    import urllib
    urllib.urlopen("http://www.gallimard-jeunesse.fr/searchjeunesse/advanced/(order)/author?catalog[0]=1&SearchAction=1").read()

The result does not correspond to what I see when inspecting the source code of the webpage using Google Chrome, for example. Could you tell me why this happens and how I could improve my code to overcome the problem? Thank you for your help.

Answer 1: What you are getting from urlopen is the raw webpage, meaning no JavaScript has been executed; a browser's inspector shows the DOM after scripts have run, which is why the two differ.

A multi-part/threaded downloader via python?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-08 13:48:20
Question: I've seen a few threaded downloaders online, and even a few multi-part (HTTP) downloaders. I haven't seen the two combined in one class/function. If any of you have a class/function lying around that I can just drop into any of my applications where I need to grab multiple files, I'd be much obliged. If there is a library/framework (or a program's back-end) that does this, please point me towards it?

Answer 1: Threadpool by Christopher Arndt may be what you're looking for. I've used this "easy
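A minimal sketch of combining the two ideas with only the standard library: split the file into byte ranges, fetch each range in a worker thread via an HTTP Range header, and reassemble in order. This assumes the server supports range requests; the network calls are left in function form so the range-splitting logic can be checked offline.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def split_ranges(total_size, parts):
    """Split [0, total_size) into `parts` contiguous (start, end) byte ranges."""
    chunk = total_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        end = total_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges

def fetch_range(url, start, end):
    """Fetch one byte range of the file (requires server Range support)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def download(url, parts=4):
    """Multi-part, threaded download: N ranges fetched concurrently."""
    with urllib.request.urlopen(url) as head:
        total = int(head.headers.get("Content-Length", 0))
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = pool.map(lambda r: fetch_range(url, *r), split_ranges(total, parts))
    return b"".join(data for _, data in sorted(chunks))

print(split_ranges(10, 3))  # [(0, 2), (3, 5), (6, 9)]
```

Sorting by the start offset before joining guarantees the parts reassemble correctly regardless of thread completion order.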

“urllib.error.HTTPError: HTTP Error 404: Not Found” Python

Submitted by £可爱£侵袭症+ on 2019-12-08 12:26:53
Question: I'm trying to open this webpage with urllib.request.urlopen: https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=// — I can access this webpage with my regular browser, yet urllib.request.urlopen returns HTTP Error 404:

    import urllib.request
    page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
    print(page)

I get the following error:

    Traceback (most recent call last):
      File "/Users/markmouawad
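When a page opens in a browser but urllib gets a 404, one common cause (an assumption here, since the traceback is truncated) is that the server rejects urllib's default "Python-urllib/3.x" User-Agent. A sketch of sending a browser-like agent; the actual fetch is left commented out:

```python
import urllib.request

url = "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"
# Some servers answer urllib's default agent with 404/403; a browser-like
# User-Agent header often avoids this.
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
# page = urllib.request.urlopen(req).read()  # network call
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Note that Request stores header names capitalized, so it is retrieved as "User-agent".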

Scraping a list of urls

Submitted by 邮差的信 on 2019-12-08 09:23:17
Question: I am using Python 3.5 and trying to scrape a list of URLs (from the same website); code as follows:

    import urllib.request
    from bs4 import BeautifulSoup

    url_list = ['URL1', 'URL2', 'URL3']

    def soup():
        for url in url_list:
            sauce = urllib.request.urlopen(url)
            for things in sauce:
                soup_maker = BeautifulSoup(things, 'html.parser')
            return soup_maker

    # Scraping
    def getPropNames():
        for propName in soup.findAll('div', class_="property-cta"):
            for h1 in propName.findAll('h1'):
                print(h1.text)

    def getPrice(
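Two problems stand out: soup() returns from inside the loop after the first URL, and getPropNames() calls findAll on the soup function object itself rather than on a parsed document. One way to restructure this (a sketch; the rest of the asker's code is truncated) is to yield one soup per URL and pass it into each extraction function. The opener is faked here so the demo runs offline:

```python
from bs4 import BeautifulSoup

def make_soups(url_list, opener):
    """Yield one BeautifulSoup per URL instead of returning inside the loop."""
    for url in url_list:
        yield BeautifulSoup(opener(url), "html.parser")

def get_prop_names(soup):
    """Collect the <h1> texts inside each property-cta div."""
    return [h1.text
            for prop in soup.find_all("div", class_="property-cta")
            for h1 in prop.find_all("h1")]

# Offline demo: a fake opener returning canned HTML instead of urlopen.
html = '<div class="property-cta"><h1>Sea View Flat</h1></div>'
for soup in make_soups(["http://example.com"], lambda url: html):
    print(get_prop_names(soup))  # ['Sea View Flat']
```

With a real run, `opener` would be `lambda url: urllib.request.urlopen(url)`.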

Using urllib and minidom to fetch XML data

Submitted by 旧时模样 on 2019-12-08 09:18:16
Question: I'm trying to fetch data from an XML service, this one: http://xmlweather.vedur.is/?op_w=xml&type=forec&lang=is&view=xml&ids=1 — I'm using urllib and minidom and I can't seem to make it work. I've only used minidom with files, not URLs. This is the code I'm trying to use:

    xmlurl = 'http://xmlweather.vedur.is'
    xmlpath = xmlurl + '?op_w=xml&type=forec&lang=is&view=xml&ids=' + str(location)
    xmldoc = minidom.parse(urllib.urlopen(xmlpath))

Can anyone help me?

Answer 1: The following should work (or at least
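minidom.parse accepts any file-like object, so the HTTP response can be fed to it directly. A Python 3 sketch (the question's code is Python 2's urllib.urlopen); the live call is commented out and the same pattern is verified offline with an in-memory file object:

```python
import io
import urllib.request
from xml.dom import minidom

location = 1  # station id taken from the URL in the question
url = ("http://xmlweather.vedur.is/"
       "?op_w=xml&type=forec&lang=is&view=xml&ids=" + str(location))
# The response object is file-like, so it parses directly:
# doc = minidom.parse(urllib.request.urlopen(url))

# Offline check of the same pattern with an in-memory file-like object:
doc = minidom.parse(io.BytesIO(b"<forecasts><station id='1'/></forecasts>"))
print(doc.documentElement.tagName)  # forecasts
```

The root-element tag name is just a stand-in; the real service defines its own document structure.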

html5lib makes BeautifulSoup miss an element

Submitted by 倖福魔咒の on 2019-12-08 08:43:02
Question: Continuing my attempt to pull transcripts from the presidential debates, I've now started using html5lib as a parser with BeautifulSoup. But now, when I run (previously working) code to find the element with the actual transcript, it errors out and claims not to find any such span. Here's the code:

    from bs4 import BeautifulSoup
    import html5lib
    import urllib

    file = urllib.urlopen('http://www.presidency.ucsb.edu/ws/index.php?pid=111395')
    soup = BeautifulSoup(file, "html5lib")
    transcript = soup