urllib.request: POST data should be bytes, an iterable of bytes, or a file object

浪子不回头ぞ posted on 2020-01-03 02:47:28

Question


I need to access an HTML website and search that website for images. It might not be that pretty, but I am able to access the website; I just need some guidance on the best way to search for the img tags.

I tried to treat it like a file but I am getting an error saying I need to convert the data to bytes.

Let me know what you think.

    from urllib import request
    import re

    website = request.urlopen('https://www.google.com', "rb")
    html = website.read()
    hand = html.decode("UTF-8")
    for line in hand:
        line = line.rstrip()
        if re.search('^img', line):
            print(line)

TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str

I expected to get a list of imgs


Answer 1:


"It might not be that pretty, but I am able to access the website..."

Actually, given that the error is coming from calling the function that accesses the website, you are not able to access the website.

You need to have a look at the function signature of urllib.request.urlopen().

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

In this line of your code:

    website = request.urlopen('https://www.google.com', "rb")

... the string 'rb' is interpreted as the data parameter, that is, the body to be sent with your request. You passed two positional arguments, 'rb' is the second of them, and data is the second positional parameter in the function signature.
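One way to check this binding without touching the network is to ask Python how the two positional arguments map onto the parameters. This is only a sketch using the standard inspect module; the variable name bound is illustrative and not part of the original code.

```python
import inspect
from urllib import request

# Bind the call's arguments to urlopen's parameters without executing the call.
bound = inspect.signature(request.urlopen).bind('https://www.google.com', 'rb')

# The second positional argument lands in `data`, i.e. the request body.
print(bound.arguments['data'])
```

The string that was meant as a file mode shows up under data, which is why the error message talks about POST data.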

This is what data is allowed to be:

The supported object types include bytes, file-like objects, and iterables.

The string 'rb' is a str, which urlopen() explicitly rejects as data, hence the TypeError.
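For comparison, here is a rough sketch of what a valid data argument could look like if you really did want to send a request body. The URL and the form field are made up for illustration. Nothing is sent here, because building a Request object does not open a connection.

```python
from urllib import parse, request

# Hypothetical form field and URL, for illustration only.
fields = {'q': 'img'}
body = parse.urlencode(fields).encode('ascii')  # str -> bytes

# Supplying data switches the request method from GET to POST.
post_req = request.Request('https://www.example.com/search', data=body)
get_req = request.Request('https://www.example.com/search')

print(post_req.get_method(), get_req.get_method())
```

For simply downloading a page, as in the question, no data argument is needed at all.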

But the real issue here is that you are guessing how to use the function. The built-in open() function and urllib.request.urlopen() operate very differently, so you need to read the documentation to know how to use each of them properly.
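As a rough sketch of how the question's goal could be approached once urlopen() is called with only the URL: the response body is bytes, which can be decoded and handed to the standard html.parser module. Note that iterating over a str, as the loop in the question does, yields single characters rather than lines, so the '^img' pattern would never match. The names ImgCollector, find_img_sources and fetch_html are invented for this example, and UTF-8 is an assumption about the page encoding.

```python
from html.parser import HTMLParser
from urllib import request


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src is not None:
                self.sources.append(src)


def find_img_sources(html_text):
    """Return the src values of all img tags in html_text, in document order."""
    parser = ImgCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


def fetch_html(url):
    """Download a page; only the URL is passed, there is no mode argument."""
    with request.urlopen(url) as response:
        # Assumes UTF-8; undecodable bytes are replaced rather than raising.
        return response.read().decode('utf-8', errors='replace')
```

With these helpers, find_img_sources(fetch_html('https://www.google.com')) would give a list of image sources for the page in the question.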

Also, I'd like to suggest that unless you absolutely have to use urllib, use the requests library instead.




Answer 2:


The signature of the urlopen function is:

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

In your code, urlopen('https://www.google.com', "rb") passes the string "rb" as the data argument. It is not a mode argument, as it would be in the built-in open() function.



Source: https://stackoverflow.com/questions/55369233/urllib-request-post-data-should-be-bytes-an-iterable-of-bytes-or-a-file-objec
