urllib

How to use Python to retrieve xml page that requires http login?

ぐ巨炮叔叔 submitted on 2019-12-08 06:56:55
Question: When I access a page on an IIS server through the browser to retrieve XML, using a query parameter (with the http URL in the example below), I get a pop-up login dialog asking for a username and password (it appears to be a system-standard dialog). Once I submit it, the data arrives as an XML page. How do I handle this with urllib? When I do the following, I never get prompted for a uid/psw; I just get a traceback indicating that the server (correctly) identifies me as not authorized. Using python 2.7 in
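
A minimal sketch of one common way to answer such a login prompt from code, written for Python 3's urllib.request (in Python 2.7 the same classes live in urllib2). It assumes the server uses HTTP Basic authentication; IIS is often configured for NTLM or Windows authentication instead, which this handler does not cover. The helper name, URL, user name and password are placeholders, not taken from the question.

```python
import urllib.request


def build_basic_auth_opener(base_url, username, password):
    """Return (opener, password_manager) that answer HTTP Basic auth challenges."""
    # Passing None as the realm makes these credentials apply to any realm
    # the server announces for URLs under base_url.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler), password_mgr


# Placeholder values (assumptions for illustration only).
opener, password_mgr = build_basic_auth_opener(
    "http://example.com/", "my_user", "my_password"
)

# The actual request is left commented out because it needs a real server:
# with opener.open("http://example.com/report.xml?id=1") as response:
#     xml_text = response.read().decode("utf-8")
```

urllib never shows a dialog; the opener first receives the 401 response and then retries with the stored credentials.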

How do I translate Python urllib.request code to Java code

三世轮回 submitted on 2019-12-08 05:17:30
Question: This is the Python code:

    import urllib.request as urllib2
    import json

    data = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["id", "regex"],
                "Values": [ [ "0", "the regex value" ],]
            },
        },
        "GlobalParameters": {
            "Database query": "select * from expone",
        }
    }

    body = str.encode(json.dumps(data))

    url = 'https://ussouthcentral.services.azureml.net/workspaces/4729545551a741e1a2e606d37' \
          'ae61ce0/services/ac7c34ad134d43ca9fdc65e292ce35d3/execute?api-version=2.0&details=true'
    api_key = '8ku5P6fR3F8ykgMHK5Y8

Why can urlopen download a Google search page but not a Google Scholar search page?

五迷三道 submitted on 2019-12-08 04:52:55
Question: I'm using Python 3.2.3's urllib.request module to download Google search results, but I'm getting an odd error: urlopen works with links to Google search results, but not with Google Scholar. In this example, I'm searching for "JOHN SMITH". This code successfully prints HTML:

    from urllib.request import urlopen, Request
    from urllib.error import URLError

    # Google
    try:
        page_google = '''http://www.google.com/#hl=en&sclient=psy-ab&q=%22JOHN+SMITH%22&oq=%22JOHN+SMITH%22&gs_l=hp.3..0l4.129.2348.0

Downloading pdf files using mechanize and urllib

吃可爱长大的小学妹 submitted on 2019-12-08 02:41:32
Question: I am new to Python, and my current task is to write a web crawler that looks for PDF files on certain web pages and downloads them. Here's my current approach (for just one sample URL):

    import mechanize
    import urllib
    import sys

    mech = mechanize.Browser()
    mech.set_handle_robots(False)
    url = "http://www.xyz.com"
    try:
        mech.open(url, timeout = 30.0)
    except HTTPError, e:
        sys.exit("%d: %s" % (e.code, e.msg))

    links = mech.links()
    for l in links:
        #Some are relative links
        path = str(l.base_url[:-1])+str
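
A hedged sketch of the link-resolution step only, using Python 3's standard library rather than the mechanize and Python 2 code in the question. It shows how urljoin can turn relative links into absolute ones before filtering for PDFs. The function name and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def pdf_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep those whose path ends in .pdf."""
    resolved = []
    for href in hrefs:
        # urljoin leaves absolute URLs alone and resolves relative ones.
        absolute = urljoin(base_url, href)
        # Compare the path only, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            resolved.append(absolute)
    return resolved


# Hypothetical hrefs as they might appear on a page under /docs/.
found = pdf_links(
    "http://www.xyz.com/docs/",
    ["a.pdf", "/files/B.PDF", "index.html", "http://other.example/c.pdf?x=1"],
)

# Each resolved URL could then be downloaded, for example with:
# urllib.request.urlretrieve(found[0], "a.pdf")
```

Checking the file extension is only a heuristic; a server can serve a PDF from a URL that does not end in .pdf.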

How to pass parameter to Url with Python urlopen

浪尽此生 submitted on 2019-12-08 00:08:27
Question: I'm new to Python programming. My problem is that my Python program doesn't seem to pass/encode the parameter properly to the ASP file that I've created. This is my sample code:

    import urllib.request

    url = 'http://www.sample.com/myASP.asp'
    full_url = url + "?data='" + str(sentData).replace("'", '"').replace(" ", "%20").replace('"', "%22") + "'"
    print (full_url)
    response = urllib.request.urlopen(full_url)
    print(response)

The output gives me something like: http://www.sample.com
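
A minimal sketch of the usual alternative to hand-written replace() chains: letting urllib.parse.urlencode build the query string. The sent_data value and the build_url helper are hypothetical; the parameter name data and the URL come from the question.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_url(base_url, params):
    """Append params to base_url as a percent-encoded query string."""
    return base_url + "?" + urlencode(params)


# Hypothetical payload containing a space and both kinds of quote.
sent_data = "it's a \"test\" value"
full_url = build_url("http://www.sample.com/myASP.asp", {"data": sent_data})

# Decoding the query string gives back the original value unchanged.
decoded = parse_qs(urlsplit(full_url).query)["data"][0]

# The request itself is left commented out because it needs a real server:
# import urllib.request
# response = urllib.request.urlopen(full_url)
```

urlencode writes spaces as + rather than %20, which is the standard form encoding.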

using python urllib how to avoid non HTML content

半腔热情 submitted on 2019-12-07 21:59:01
Question: I am using urllib (note: not urllib2) to get the titles of pages from user-supplied URLs. Unfortunately, sometimes the URL does not point to an HTML page but to some huge file or some very long-running process on the remote site. I have checked the Python docs, but urllib is limited; looking at the source, it seems I could change it, but I cannot do so on the server. There is mention of info(), but no example of how to use it. I am using FancyURLopener, which I guess is not available in urllib2, and I don't
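
A hedged sketch of one way to avoid downloading non-HTML content: inspect the Content-Type header before reading the body, and cap how much is read. It uses Python 3's urllib.request, not the Python 2 urllib and FancyURLopener from the question. Both function names and the 64 KiB and 10 second defaults are assumptions.

```python
import urllib.request


def looks_like_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_html_head(url, max_bytes=65536, timeout=10):
    """Return up to max_bytes of the body if the response is HTML, else None."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers is the same object that response.info() returns.
        if not looks_like_html(response.headers.get("Content-Type")):
            return None
        return response.read(max_bytes)


# Example call, commented out because it needs network access:
# snippet = fetch_html_head("http://example.com/")
```

The timeout bounds each blocking socket operation, which helps with slow remote processes, and the byte cap keeps a huge file from being read in full.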

Why does urllib.request.urlopen sometimes does not work, but browsers work?

依然范特西╮ submitted on 2019-12-07 21:32:34
Question: I am trying to download some content using Python's urllib.request. The following command raises an exception:

    import urllib.request
    print(urllib.request.urlopen("https://fpgroup.foreignpolicy.com/foreign-policy-releases-mayjune-spy-issue/").code)

Result:

    ...
    HTTPError: HTTP Error 403: Forbidden

If I use Firefox or links (a command-line browser), I get the content and a status code of 200. If I use lynx, strangely enough, I also get 403. I expect all methods to work the same way, successfully. Why
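
A minimal sketch of sending a browser-like User-Agent header. By default urllib identifies itself as Python-urllib/x.y, and some servers reject that with a 403. Whether that is the cause for the site in the question is an assumption. The helper name, the User-Agent string and the example URL are placeholders.

```python
import urllib.request


def request_with_user_agent(url, user_agent):
    """Build a Request that replaces urllib's default User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL and User-Agent string (assumptions for illustration).
req = request_with_user_agent(
    "https://example.com/", "Mozilla/5.0 (compatible; example-client)"
)

# The request itself is left commented out because it needs network access:
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```

A server may also filter on other headers or on client behaviour, so this change alone is not guaranteed to turn a 403 into a 200.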

Form Submission in Python Without Name Attribute

穿精又带淫゛_ submitted on 2019-12-07 19:57:31
Question: Background: Using urllib and urllib2 in Python, you can do a form submission. You first create a dictionary.

    formdictionary = { 'search' : 'stackoverflow' }

Then you use the urlencode method of urllib to transform this dictionary.

    params = urllib.urlencode(formdictionary)

You can now make a URL request with urllib2, passing the URL as the first parameter and the variable params as the second.

    open = urllib2.urlopen('www.searchpage.com', params)

From my understanding, urlencode

Python: clicking a button [duplicate]

六月ゝ 毕业季﹏ submitted on 2019-12-07 16:48:41
Question: This question already has answers here: Python: Clicking a button with urllib or urllib2 (3 answers). Closed 6 years ago.

I have problems clicking this button, which looks like this in the HTML code:

    <form method="post">
    <br>
    <input type="hidden" value="6" name="deletetree">
    <input type="submit" value="Delete Tree" name="pushed">
    </form>

and the URL that needs to be generated looks like this: http://mysite.com/management.php?Category=2&id_user=19&deteletree=6&pushed=Delete+Tree

Update: I tried
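
A hedged sketch of "clicking" such a button from Python 3. Because the form declares method="post", a browser sends the two input fields in the request body rather than in the query string. The target URL is an assumption: the form has no action attribute, so it would post back to the page it was served from, taken here to be management.php with the Category and id_user parameters. The helper name is hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def build_form_post(url, fields):
    """Build a POST request whose body is the URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


# Field names and values are taken from the <input> elements of the form.
req = build_form_post(
    "http://mysite.com/management.php?Category=2&id_user=19",
    {"deletetree": "6", "pushed": "Delete Tree"},
)

# The request itself is left commented out because it needs a real server:
# with urllib.request.urlopen(req) as response:
#     page = response.read()
```

If the page sits behind a login, the request would also need the session cookies, which this sketch does not handle.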

Print web page source code in python

感情迁移 submitted on 2019-12-07 16:44:14
Question: I want to print a web page's source code, but Python's print command just prints empty space, and I think that's because of the page's large size. Is there any way to print the page source in the shell, or at least to a file? I've tried writing it to a file, but this error occurred:

    UnicodeEncodeError: 'charmap' codec can't encode character '\u06cc' in position 11826: character maps to <undefined>

How can I fix it?

    import urllib.request
    response = urllib.request.urlopen('http://www.farsnews.com')
    html = response
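
A minimal sketch of one way around this kind of UnicodeEncodeError: open the output file with an explicit UTF-8 encoding instead of relying on the platform default (the 'charmap' codec in the traceback suggests a Windows code page). The helper name and sample text are made up, and treating the page as UTF-8 is an assumption.

```python
import os
import tempfile


def save_text(path, text):
    """Write text to path as UTF-8, whatever the platform default encoding is."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Sample text containing the character named in the traceback (U+06CC).
sample = "\u06cc sample"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "page.html")
    save_text(path, sample)
    with open(path, encoding="utf-8") as handle:
        round_trip = handle.read()

# With a real page, the bytes would be decoded first (assuming UTF-8):
# html = response.read().decode("utf-8")
# save_text("page.html", html)
```

Printing to a console can fail in the same way, because the console's own encoding applies there; writing to a UTF-8 file sidesteps that.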