urllib

Patentsview API Python 3.4

Submitted by 送分小仙女 on 2019-12-09 22:42:12
Question: I am a beginner in Python, currently working on a small project. I want to build a dynamic patent-search script for patentsview.org. Here is my code:

    import urllib.parse
    import urllib.request

    # Target request:
    # http://www.patentsview.org/api/patents/query?q={"_and":[{"inventor_last_name":author},{"_text_any":{"patent_title":[title]}}]}&o={"matched_subentities_only":"true"}

    author = "Jobs"
    andreq = "_and"
    invln = "inventor_last_name"
    text = "_text_any"
    patent = "patent_title"
    match = "matched_subentities_only"
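A sketch of how these pieces could be assembled in Python 3: build the query as a dict, serialize it with json.dumps, and percent-encode it into the q and o parameters. The title search term is an assumption here, since the snippet above is truncated before it is used.

```python
import json
import urllib.parse
import urllib.request

author = "Jobs"
title = "device"  # assumed search term; not present in the truncated snippet

query = {"_and": [
    {"inventor_last_name": author},
    {"_text_any": {"patent_title": [title]}},
]}
options = {"matched_subentities_only": "true"}

# JSON-encode each value, then percent-encode the whole query string.
params = urllib.parse.urlencode({"q": json.dumps(query),
                                 "o": json.dumps(options)})
url = "http://www.patentsview.org/api/patents/query?" + params
# result = urllib.request.urlopen(url).read()  # network call
print(params.startswith("q=%7B"))  # '{' percent-encodes to %7B
```

Building the dict first and letting urlencode handle the escaping avoids hand-assembling braces and quotes in the URL.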

I/O error(socket error): [Errno 111] Connection refused

Submitted by 一个人想着一个人 on 2019-12-09 14:48:07
Question: I have a program that uses urllib to periodically fetch a URL, and I see intermittent errors like: I/O error(socket error): [Errno 111] Connection refused. It works 90% of the time, but the other 10% it fails. If I retry the fetch immediately after it fails, it succeeds. I'm unable to figure out why this is so. I tried to see if any ports are available, and they are. Any debugging ideas? For additional info, the stack trace is:

    File "/usr/lib/python2.6/urllib.py", line 203, in open
        return
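Since the failure is transient and an immediate retry succeeds, a simple retry wrapper with a short backoff is a common workaround. A sketch in Python 3; the flaky opener below only simulates the refused connections for an offline demonstration.

```python
import time
import urllib.request

def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3, delay=1.0):
    """Retry transient failures such as [Errno 111] with a growing pause."""
    for attempt in range(retries):
        try:
            return opener(url)
        except OSError:  # URLError subclasses OSError in Python 3
            if attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))

# Offline demo: an opener that refuses twice, then succeeds.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError(111, "Connection refused")
    return "ok"

print(fetch_with_retry("http://example.com", opener=flaky, delay=0))  # ok
```

This does not explain the root cause (often server-side backlog exhaustion), but it makes the periodic fetch robust against the 10% failure rate described.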

Get file size from “Content-Length” value from a file in python 3.2

Submitted by 旧时模样 on 2019-12-09 11:06:03
Question: I want to get the Content-Length value from the meta variable; I need the size of the file I want to download. But the last line raises an error: the HTTPMessage object has no attribute getheaders.

    import urllib.request
    import http.client

    # ---- HTTP HANDLING PART ----
    url = "http://client.akamai.com/install/test-objects/10MB.bin"
    file_name = url.split('/')[-1]
    d = urllib.request.urlopen(url)
    f = open(file_name, 'wb')

    # ---- GET FILE SIZE ----
    meta = d.info()
    print("Download Details",
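The getheaders() method existed on the Python 2 headers object; in Python 3, d.info() returns an email.message.Message subclass, which exposes get() instead. A minimal offline sketch of the difference, using a Message built by hand in place of a live response:

```python
from email.message import Message

# Simulate the headers object that urlopen(...).info() returns in Python 3;
# the real HTTPMessage is a subclass of email.message.Message.
meta = Message()
meta["Content-Length"] = "10485760"

# Python 2: meta.getheaders("Content-Length")[0]
# Python 3: meta.get("Content-Length")
size = int(meta.get("Content-Length", 0))
print(size)  # 10485760
```

With a real response, `int(d.info().get("Content-Length", 0))` gives the file size before downloading.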

QPX Express API from Python

Submitted by 纵饮孤独 on 2019-12-09 03:09:27
I am trying to use Google's QPX Express API from Python. I keep running into a pair of issues when sending the request. At first what I tried is this:

    url = "https://www.googleapis.com/qpxExpress/v1/trips/search?key=MY_KEY_HERE"
    values = {"request": {"passengers": {"kind": "qpxexpress#passengerCounts",
                                         "adultCount": 1},
                          "slice": [{"kind": "qpxexpress#sliceInput",
                                     "origin": "RDU",
                                     "destination": location,
                                     "date": dateGo}]}}
    data = json.dumps(values)
    req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
    f = urllib2.urlopen(req)
    response = f.read()
    f.close()
    print(response)

based
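Two likely culprits in Python 3: urllib2 no longer exists (it became urllib.request), and the POST body must be bytes rather than str. A sketch under those assumptions; the API key, destination, and date are placeholders, not values from the question.

```python
import json
import urllib.request

url = "https://www.googleapis.com/qpxExpress/v1/trips/search?key=MY_KEY_HERE"
values = {"request": {"passengers": {"kind": "qpxexpress#passengerCounts",
                                     "adultCount": 1},
                      "slice": [{"kind": "qpxexpress#sliceInput",
                                 "origin": "RDU",
                                 "destination": "BOS",      # placeholder
                                 "date": "2025-01-15"}]}}   # placeholder

data = json.dumps(values).encode("utf-8")  # bytes, not str
req = urllib.request.Request(url, data,
                             {"Content-Type": "application/json"})
# response = urllib.request.urlopen(req).read()  # needs a real API key
print(type(data).__name__)  # bytes
```

Passing a body to Request automatically makes it a POST, which is what this endpoint expects.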

Why urllib.urlopen.read() does not correspond to source code?

Submitted by 烈酒焚心 on 2019-12-09 02:35:02
Question: I'm trying to fetch the following webpage:

    import urllib
    urllib.urlopen("http://www.gallimard-jeunesse.fr/searchjeunesse/advanced/(order)/author?catalog[0]=1&SearchAction=1").read()

The result does not correspond to what I see when inspecting the source code of the webpage using Google Chrome, for example. Could you tell me why this happens and how I could improve my code to overcome the problem? Thank you for your help.

Answer 1: What you are getting from urlopen is the raw webpage, meaning no JavaScript has been executed; a browser's inspector shows the DOM after scripts have run, which is why the two differ.

A multi-part/threaded downloader via python?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-08 13:48:20
Question: I've seen a few threaded downloaders online, and even a few multi-part (HTTP) downloaders. I haven't seen the two combined in one class/function. If any of you have a class/function lying around that I can just drop into any of my applications where I need to grab multiple files, I'd be much obliged. If there is a library/framework (or a program's back-end) that does this, please point me towards it?

Answer 1: Threadpool by Christopher Arndt may be what you're looking for. I've used this "easy
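A minimal sketch of combining the two ideas with only the standard library: split the file into byte ranges, fetch each range in a worker thread via an HTTP Range header, and reassemble in order. This assumes the server supports range requests; the network calls are left in function form so the range-splitting logic can be checked offline.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def split_ranges(total_size, parts):
    """Split [0, total_size) into `parts` contiguous (start, end) byte ranges."""
    chunk = total_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        end = total_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges

def fetch_range(url, start, end):
    """Fetch one byte range of the file (requires server Range support)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def download(url, parts=4):
    """Multi-part, threaded download: N ranges fetched concurrently."""
    with urllib.request.urlopen(url) as head:
        total = int(head.headers.get("Content-Length", 0))
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = pool.map(lambda r: fetch_range(url, *r), split_ranges(total, parts))
    return b"".join(data for _, data in sorted(chunks))

print(split_ranges(10, 3))  # [(0, 2), (3, 5), (6, 9)]
```

Sorting by the start offset before joining guarantees the parts reassemble correctly regardless of thread completion order.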

“urllib.error.HTTPError: HTTP Error 404: Not Found” Python

Submitted by £可爱£侵袭症+ on 2019-12-08 12:26:53
Question: I'm trying to open this webpage with urllib.request.urlopen: https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=// — I can access this webpage with my regular browser, yet urllib.request.urlopen returns HTTP Error 404:

    import urllib.request
    page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
    print(page)

I get the following error:

    Traceback (most recent call last):
      File "/Users/markmouawad
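When a page opens in a browser but urllib gets a 404, one common cause (an assumption here, since the traceback is truncated) is that the server rejects urllib's default "Python-urllib/3.x" User-Agent. A sketch of sending a browser-like agent; the actual fetch is left commented out:

```python
import urllib.request

url = "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"
# Some servers answer urllib's default agent with 404/403; a browser-like
# User-Agent header often avoids this.
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
# page = urllib.request.urlopen(req).read()  # network call
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Note that Request stores header names capitalized, so it is retrieved as "User-agent".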

Scraping a list of urls

Submitted by 邮差的信 on 2019-12-08 09:23:17
Question: I am using Python 3.5 and trying to scrape a list of URLs (from the same website); code as follows:

    import urllib.request
    from bs4 import BeautifulSoup

    url_list = ['URL1', 'URL2', 'URL3']

    def soup():
        for url in url_list:
            sauce = urllib.request.urlopen(url)
            for things in sauce:
                soup_maker = BeautifulSoup(things, 'html.parser')
            return soup_maker

    # Scraping
    def getPropNames():
        for propName in soup.findAll('div', class_="property-cta"):
            for h1 in propName.findAll('h1'):
                print(h1.text)

    def getPrice(
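Two problems stand out: soup() returns from inside the loop after the first URL, and getPropNames() calls findAll on the soup function object itself rather than on a parsed document. One way to restructure this (a sketch; the rest of the asker's code is truncated) is to yield one soup per URL and pass it into each extraction function. The opener is faked here so the demo runs offline:

```python
from bs4 import BeautifulSoup

def make_soups(url_list, opener):
    """Yield one BeautifulSoup per URL instead of returning inside the loop."""
    for url in url_list:
        yield BeautifulSoup(opener(url), "html.parser")

def get_prop_names(soup):
    """Collect the <h1> texts inside each property-cta div."""
    return [h1.text
            for prop in soup.find_all("div", class_="property-cta")
            for h1 in prop.find_all("h1")]

# Offline demo: a fake opener returning canned HTML instead of urlopen.
html = '<div class="property-cta"><h1>Sea View Flat</h1></div>'
for soup in make_soups(["http://example.com"], lambda url: html):
    print(get_prop_names(soup))  # ['Sea View Flat']
```

With a real run, `opener` would be `lambda url: urllib.request.urlopen(url)`.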

Using urllib and minidom to fetch XML data

Submitted by 旧时模样 on 2019-12-08 09:18:16
Question: I'm trying to fetch data from an XML service, this one: http://xmlweather.vedur.is/?op_w=xml&type=forec&lang=is&view=xml&ids=1 — I'm using urllib and minidom and I can't seem to make it work. I've only used minidom with files, not URLs. This is the code I'm trying to use:

    xmlurl = 'http://xmlweather.vedur.is'
    xmlpath = xmlurl + '?op_w=xml&type=forec&lang=is&view=xml&ids=' + str(location)
    xmldoc = minidom.parse(urllib.urlopen(xmlpath))

Can anyone help me?

Answer 1: The following should work (or at least
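minidom.parse accepts any file-like object, so the HTTP response can be fed to it directly. A Python 3 sketch (the question's code is Python 2's urllib.urlopen); the live call is commented out and the same pattern is verified offline with an in-memory file object:

```python
import io
import urllib.request
from xml.dom import minidom

location = 1  # station id taken from the URL in the question
url = ("http://xmlweather.vedur.is/"
       "?op_w=xml&type=forec&lang=is&view=xml&ids=" + str(location))
# The response object is file-like, so it parses directly:
# doc = minidom.parse(urllib.request.urlopen(url))

# Offline check of the same pattern with an in-memory file-like object:
doc = minidom.parse(io.BytesIO(b"<forecasts><station id='1'/></forecasts>"))
print(doc.documentElement.tagName)  # forecasts
```

The root-element tag name is just a stand-in; the real service defines its own document structure.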

html5lib makes BeautifulSoup miss an element

Submitted by 倖福魔咒の on 2019-12-08 08:43:02
Question: Continuing my attempt to pull transcripts from the presidential debates, I've now started using html5lib as a parser with BeautifulSoup. But now, when I run (previously working) code to find the element with the actual transcript, it errors out and claims not to find any such span. Here's the code:

    from bs4 import BeautifulSoup
    import html5lib
    import urllib

    file = urllib.urlopen('http://www.presidency.ucsb.edu/ws/index.php?pid=111395')
    soup = BeautifulSoup(file, "html5lib")
    transcript = soup