urllib

How to download pdf files from URLs leading to sub-URLs using Python

江枫思渺然 submitted on 2020-08-07 08:13:54
Question: I am trying to download all PDF files from the links at the following URLs: https://www.adb.org/projects/documents/country/ban/year/2020?terms=education https://www.adb.org/projects/documents/country/ban/year/2019?terms=education https://www.adb.org/projects/documents/country/ban/year/2018?terms=education These URLs contain lists of links that lead to sub-pages containing the PDF files. The lists of links on the main URLs are the search results for a country, a year and a term. I have tried
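
The two-level crawl described above (search-result page, then each sub-page, then its PDF links) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not a tested scraper for the ADB site: the browser-like User-Agent is an assumption, and `is_document_page` is a hypothetical predicate you would write after inspecting the listing page's HTML, since the site's link structure isn't shown here.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent; some sites reject urllib's default one.
HEADERS = {"User-Agent": "Mozilla/5.0"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def is_pdf(url):
    # Check the path only, so query strings such as "?download=1" don't matter.
    return urlparse(url).path.lower().endswith(".pdf")


def fetch(url):
    with urlopen(Request(url, headers=HEADERS)) as resp:
        return resp.read()


def pdf_links_two_levels(start_url, is_document_page):
    """Follow links from a listing page to sub-pages and collect PDF links.

    is_document_page is a hypothetical predicate (supplied by you) that picks
    out which links on the listing page are worth following.
    """
    listing = fetch(start_url).decode("utf-8", errors="replace")
    found = []
    for page_url in extract_links(listing, start_url):
        if is_pdf(page_url):
            found.append(page_url)  # PDF linked directly from the listing
        elif is_document_page(page_url):
            page = fetch(page_url).decode("utf-8", errors="replace")
            found.extend(u for u in extract_links(page, page_url) if is_pdf(u))
    return list(dict.fromkeys(found))  # drop duplicates, keep order


def download(url, folder):
    """Save one file into folder, named after the last path segment."""
    os.makedirs(folder, exist_ok=True)
    name = os.path.basename(urlparse(url).path) or "download.pdf"
    path = os.path.join(folder, name)
    with open(path, "wb") as fh:
        fh.write(fetch(url))
    return path
```

For each of the three search URLs, calling `download(u, "pdfs")` for every `u` returned by `pdf_links_two_levels(search_url, is_document_page)` would then save the files. Parsing is kept separate from fetching so the link logic can be checked offline on a saved HTML string.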

Python - How to convert utf literal such as '\xc3\xb6' to the letter ö

天大地大妈咪最大 submitted on 2020-07-10 09:35:03
Question: I am trying to convert an encoded URL containing German umlauts into a string with those umlauts. Here is an example of an encoded string: 'K%C3%B6nnen'. I would like to convert it to 'Können'. When I use urllib.unquote(a), I get this back: 'K\xc3\xb6nnen'. I found out that \xc3\xb6 is a UTF literal. How can I convert this to an ö? I find that if I use the print function it converts it correctly, but I cannot figure out how to get a function to return this value. Any ideas? Answer 1: With decode("utf-8")
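
In Python 3, `urllib.unquote` lives at `urllib.parse.unquote`, which decodes the percent-escapes as UTF-8 by default and returns `'Können'` directly. The `'K\xc3\xb6nnen'` form in the question is the repr of a byte string (Python 2's `str`); those bytes are UTF-8, and decoding them, as Answer 1 suggests, yields the text. A short sketch of both routes:

```python
from urllib.parse import unquote, unquote_to_bytes

encoded = "K%C3%B6nnen"

# Python 3: unquote() turns %XX escapes into text, decoding them as UTF-8.
text = unquote(encoded)

# If you start from the raw bytes instead (what Python 2's urllib.unquote
# returned), decode them explicitly as UTF-8.
raw = unquote_to_bytes(encoded)  # b'K\xc3\xb6nnen'
same = raw.decode("utf-8")
```

The reason `print` "fixed" it is that printing shows the decoded characters, while the interactive prompt shows the repr with escape sequences.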

How to simulate a button click in a request?

僤鯓⒐⒋嵵緔 submitted on 2020-06-27 04:41:48
Question: Please do not close this question; it is not a duplicate. I need to click the button using Python requests, not Selenium as here. I am trying to scrape the Reverso Context translation examples page, and I have a problem: I can get only 20 examples, and then I need to click the "Display more examples" button repeatedly, for as long as it is on the page, to get the full list of results. This is easy in a web browser, but how can I do it with the Python Requests library? I looked at the button's
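
Requests cannot click anything; in a browser, a "load more" button usually just sends another HTTP request (often an XHR returning JSON or an HTML fragment). The usual approach is to open the browser's developer tools (Network tab), click the button once, and replay that request with `requests`, changing whatever offset or page parameter it carries. The loop is sketched below under those assumptions: `fetch_page` is a hypothetical stand-in for the replayed request (for example a function wrapping `session.get(...)` or `session.post(...)` with parameters copied from the Network tab), and the page size of 20 is taken from the question's observation.

```python
def collect_all(fetch_page, page_size=20, max_requests=500):
    """Repeat a paginated request until a short (or empty) page comes back.

    fetch_page(offset) is a hypothetical stand-in for the request that the
    "Display more examples" button sends; it must return a list of results.
    """
    results = []
    offset = 0
    for _ in range(max_requests):  # hard cap so a bug can't loop forever
        batch = fetch_page(offset)
        results.extend(batch)
        if len(batch) < page_size:  # fewer than a full page: nothing left
            break
        offset += len(batch)
    return results
```

Stopping on a short page mirrors the browser behaviour of the button disappearing once no more examples remain.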

urllib.error.URLError: <urlopen error unknown url type: https>

你。 submitted on 2020-06-13 00:08:14
Question: Hello, I am trying to learn web scraping. I installed Anaconda3 on Windows 10 (conda 4.5.12, Python 3.7.1) and wrote the following script, which produces the error in the title: import bs4 from bs4 import BeautifulSoup as soup from urllib.request import urlopen as request with request('https://google.com') as response: page_html = response.read() page_soup = soup(page_html, "html.parser") print(page_soup) The error comes from this line: with request('https://google.com') as response: ...
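
urllib raises "unknown url type: https" when it has no HTTPS handler, which happens when Python's `ssl` module fails to import. With Anaconda on Windows, a frequently reported cause is that the environment's OpenSSL DLLs are not on PATH, for example when `python.exe` is started outside an activated environment. The check below is a diagnostic sketch, not a fix: if `ssl_import_error()` returns a message, running the script from an activated Anaconda Prompt is a reasonable first thing to try.

```python
import urllib.request


def https_available():
    """True if urllib in this interpreter can open https:// URLs."""
    # http.client only defines HTTPSConnection when `import ssl` succeeds, and
    # urllib.request only defines HTTPSHandler when HTTPSConnection exists.
    return hasattr(urllib.request, "HTTPSHandler")


def ssl_import_error():
    """Return the ImportError message for the ssl module, or None if it imports."""
    try:
        import ssl  # noqa: F401
    except ImportError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("https supported:", https_available())
    print("ssl import error:", ssl_import_error())
```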

Image Processing Error, “not enough values to unpack”

*爱你&永不变心* submitted on 2020-06-01 07:40:29
Question: The main problem was described in an earlier post; this is a follow-up with the full code and traceback. I get an error while processing photos with my Telegram bot, specifically when changing the contrast. Full code: import telebot import os import urllib.request from PIL import Image import numpy as np TOKEN = 'here token is, just cut it out' bot = telebot.TeleBot(TOKEN) result_storage_path = 'temp' @bot.message_handler(commands=['start']) def start_message(message):
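
The failing line is not visible here because the code is cut off, so the following is only a guess at one common cause: unpacking an image's shape (or its bands) into three names when the image is grayscale. `np.asarray(img).shape` is `(height, width)` for a PIL mode `"L"` image but `(height, width, channels)` for `"RGB"`, and `img.split()` likewise returns a single band for `"L"`. Calling `img.convert("RGB")` first, or handling both shapes, avoids the error. The hypothetical helper `split_shape` sketches the second option on plain shape tuples.

```python
def split_shape(shape):
    """Return (height, width, channels) for a 2-D or 3-D image array shape."""
    if len(shape) == 2:  # grayscale, e.g. PIL mode "L"
        height, width = shape
        return height, width, 1
    if len(shape) == 3:  # colour, e.g. PIL mode "RGB" or "RGBA"
        height, width, channels = shape
        return height, width, channels
    raise ValueError(f"unexpected image shape {shape!r}")
```

Used as `h, w, c = split_shape(np.asarray(img).shape)`, this always yields three values, so the unpacking itself can no longer fail for grayscale photos.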

Counting words inside a webpage

冷暖自知 submitted on 2020-05-28 04:54:22
Question: I need to count the words on a webpage using Python 3. Which module should I use? urllib? Here is my code: def web(): f = urllib.request.urlopen("https://americancivilwar.com/north/lincoln.html") lu = f.read() print(lu) Answer 1: With the self-explanatory code below you can get a good starting point for counting words within a web page: import requests from bs4 import BeautifulSoup from collections import Counter from string import punctuation # We get the url r = requests.get("https://en
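
Answer 1 uses requests, BeautifulSoup and `collections.Counter`. As an alternative that stays with the urllib module the question asks about, the sketch below swaps in the standard library's `html.parser` to pull out visible text (skipping `<script>` and `<style>`), then counts lowercase words with a simple regex. The word definition (letters with optional internal apostrophes) is an assumption; adjust it to your needs.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects text content, ignoring what is inside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def count_words(html):
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.parts)
    # A "word" here is letters, optionally joined by apostrophes (e.g. "that's").
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))


def count_words_at(url):
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return count_words(html)
```

For the page in the question, `count_words_at("https://americancivilwar.com/north/lincoln.html").most_common(10)` would list the ten most frequent words.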

AttributeError: 'NoneType' object has no attribute 'parent'

无人久伴 submitted on 2020-05-27 11:56:47
Question: from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages/page3.html") soup = BeautifulSoup(html.read()) print(soup.find("img", {"src": "../img/gifts/img1.jpg"}).parent.previous_sibling.get_text()) The code above works fine, but the code below does not: it raises the AttributeError from the title. Can anyone tell me why? from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages
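
BeautifulSoup's `find()` returns `None` when nothing matches, and `None` has no `.parent`; so the second snippet's `find()` call (cut off here) most likely matched no element. One way to fail gracefully is to stop at the first missing step of an attribute chain. The helper `follow` below is a hypothetical illustration, not part of BeautifulSoup.

```python
def follow(node, *attrs):
    """Walk node.attr1.attr2... and return None as soon as any step is None."""
    for name in attrs:
        if node is None:
            return None
        node = getattr(node, name, None)
    return node


# Usage with BeautifulSoup (soup built as in the question):
#   img = soup.find("img", {"src": "../img/gifts/img1.jpg"})
#   prev = follow(img, "parent", "previous_sibling")
#   if prev is None:
#       print("no match; check the tag name, attributes and page URL")
#   else:
#       print(prev.get_text())
```

Checking for `None` right after `find()` points you to the selector as the thing to debug, rather than the attribute access that happens to crash later.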

urllib.urlretrieve with custom header

可紊 submitted on 2020-05-25 04:29:34
Question: I am trying to retrieve a file using urlretrieve while adding a custom header. While reading the source code of urllib.request, I realized that urlopen can take a Request object as its argument instead of just a string, which lets me set the header I want. But if I try the same with urlretrieve, I get TypeError: expected string or bytes-like object, as mentioned in this other post. What I ended up doing was writing my own urlretrieve, removing the line that throws the error (that line is
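
Since `urlopen` accepts a `Request` with headers, a replacement for `urlretrieve` can be built from it without patching the library: open the request and stream the response into a file. A minimal sketch; the function name `urlretrieve_with_headers` is hypothetical.

```python
import shutil
from urllib.request import Request, urlopen


def urlretrieve_with_headers(url, filename, headers=None):
    """Download url to filename, sending the given extra request headers."""
    request = Request(url, headers=headers or {})
    with urlopen(request) as response, open(filename, "wb") as out:
        # Stream in chunks instead of reading the whole body into memory.
        shutil.copyfileobj(response, out)
    return filename
```

If you would rather keep calling `urlretrieve` itself, another option is a global opener: `opener = urllib.request.build_opener()`, set `opener.addheaders = [("User-Agent", "...")]`, then `urllib.request.install_opener(opener)`. After that, `urlretrieve` sends those headers on every request, which is convenient but affects the whole process.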