How to read HTML from a URL in Python 3

后端未结

滥情空心

I looked at previous similar questions and got only more confused.

In Python 3.4, I want to read an HTML page as a string, given the URL.

In perl I do this w

6 Answers

我寻月下人不归

2020-12-08 02:23
```
import requests

response = requests.get("http://yahoo.com")
htmltext = response.text  # the body, already decoded to a str
print(htmltext)
```
This works much like `urllib.urlopen`, except that `.text` already gives you the body decoded to a string.

广开言路

2020-12-08 02:24
Try the `requests` module; it's much simpler.
```
# Install first with: pip install requests

import requests

url = 'https://www.google.com/'
r = requests.get(url)
print(r.text)  # a bare r.text only shows output in an interactive session
```
More info: http://docs.python-requests.org/en/master/

深忆病人

2020-12-08 02:28

For Python 2:

```
import urllib
some_url = 'https://docs.python.org/2/library/urllib.html'
filehandle = urllib.urlopen(some_url)
print filehandle.read()
```

花落未央

2020-12-08 02:33
Note that in Python 3, `read()` returns the HTML as a `bytes` object, not as a string, so you need to convert it with `decode()`:
```
import urllib.request

fp = urllib.request.urlopen("http://www.python.org")
mybytes = fp.read()

mystr = mybytes.decode("utf8")
fp.close()

print(mystr)
```
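The snippet above hard-codes UTF-8. Below is a minimal sketch that instead uses the charset the server declares in its `Content-Type` header and falls back to UTF-8 when none is given. The `read_html` name and the UTF-8 fallback are assumptions for illustration, and a page that declares its encoding only in a `<meta>` tag is not handled.

```python
import urllib.request


def read_html(url, default_charset="utf-8"):
    """Fetch url and return its body as a str.

    Hypothetical helper: the name and the UTF-8 fallback are assumptions.
    Only the charset in the Content-Type header is consulted; an encoding
    declared solely in a <meta> tag inside the page is ignored.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # e.g. "text/html; charset=ISO-8859-1" -> "iso-8859-1"; None if absent
        charset = response.headers.get_content_charset() or default_charset
    try:
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The header named a codec Python does not know; fall back.
        return raw.decode(default_charset, errors="replace")
```

For example, `read_html("http://www.python.org")` returns the page as a `str`.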

故里飘歌

2020-12-08 02:35
Reading an HTML page with urllib is fairly simple. Since you want it as a single string, I'll show you how, step by step.

Import urllib.request:
```
#!/usr/bin/python3.5

import urllib.request
```
Prepare the request:
```
request = urllib.request.Request('http://www.w3schools.com')
```
Always wrap the request in a `try`/`except`, because many things can go wrong (an unreachable host, a refused connection, an HTTP error such as 404). `urlopen()` sends the request and returns the response.
```
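import urllib.error

try:
    response = urllib.request.urlopen(request)
except urllib.error.URLError as e:  # HTTPError is a subclass, so it is caught too
    print("something went wrong:", e)
    raise SystemExit(1)  # stop here; the steps below need a response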
```
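If you would rather package this step as a reusable function, here is a minimal sketch that catches only the exceptions a failed request is expected to raise; a catch-all `except:` also swallows typos (`NameError`) and Ctrl-C (`KeyboardInterrupt`). The `fetch_bytes` name, the `None`-on-failure convention and the 10-second timeout are illustrative assumptions. `HTTPError` must be listed before `URLError` because it is a subclass of it.

```python
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the raw body of url, or None if the request fails.

    Hypothetical helper: the name, the None-on-failure convention and
    the default timeout are illustrative choices.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # The server replied, but with an error status such as 404 or 500.
        print("HTTP error {}: {}".format(e.code, e.reason))
    except urllib.error.URLError as e:
        # No usable reply: DNS failure, refused connection, unknown scheme...
        print("could not open {}: {}".format(url, e.reason))
    except OSError as e:
        # e.g. a timeout or connection reset while reading the body,
        # which is not always wrapped in URLError.
        print("error while reading {}: {}".format(url, e))
    return None
```

The caller then checks for `None` instead of wrapping every call in its own `try`.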
`type()` tells us what type a variable is. Here, `response` is an `http.client.HTTPResponse` object.
```
print(type(response))
```
The response object's `read()` method returns the HTML as `bytes`, which we store in a variable. Again, `type()` confirms this.
```
htmlBytes = response.read()

print(type(htmlBytes))
```
Now call `decode()` on the bytes to get a single string (this assumes the page is UTF-8 encoded).
```
htmlStr = htmlBytes.decode("utf8")

print(type(htmlStr))
```
If you want to split this string into separate lines, use `split()`. In that form you can easily iterate over the lines to print the entire page or do any other processing.
```
htmlSplit = htmlStr.split('\n')

print(type(htmlSplit))

for line in htmlSplit:
    print(line)
```
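One caveat with `split('\n')`: if a page uses `\r\n` line endings, every line keeps a trailing `\r`. `str.splitlines()` treats `\n`, `\r\n` and `\r` all as line breaks. A small comparison on a made-up string (no network needed):

```python
# A made-up HTML snippet with Windows-style (\r\n) line endings.
html_str = "<html>\r\n<body>hello</body>\r\n</html>"

by_newline = html_str.split("\n")  # keeps the '\r' at the end of lines
by_lines = html_str.splitlines()   # treats '\r\n' as a single line break

print(by_newline)  # ['<html>\r', '<body>hello</body>\r', '</html>']
print(by_lines)    # ['<html>', '<body>hello</body>', '</html>']
```

Another difference: a string ending in a newline gives a trailing empty string with `split('\n')` but not with `splitlines()`.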
Hopefully this gives a more detailed answer. The Python documentation and tutorials are great; I'd use them as a reference, since they answer most questions you might have.

暖寄归人

2020-12-08 02:38

`urllib.request.urlopen(url).read()` returns the raw HTML page as `bytes`; call `.decode()` on the result (e.g. `.decode("utf-8")`) to get a string.
