urllib.request: POST data should be bytes, an iterable of bytes, or a file object

Question

I need to fetch an HTML page and search it for images. It might not be that pretty, but I am able to access the website; I just need some guidance on the best way to search for the img tags.

I tried to treat it like a file but I am getting an error saying I need to convert the data to bytes.

Let me know what you think.

    from urllib import request
    import re

    website = request.urlopen('https://www.google.com', "rb")
    html = website.read()
    hand = html.decode("UTF-8")
    for line in hand:
        line = line.rstrip()
        if re.search('^img', line):
            print(line)

TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str

I expected to get a list of img tags.

Answer 1:

"It might not be that pretty, but I am able to access the website..."

Actually, given that the error is coming from calling the function that accesses the website, you are not able to access the website.

You need to have a look at the function signature of urllib.request.urlopen().

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

In this line of your code:

    website = request.urlopen('https://www.google.com', "rb")

... the string "rb" is interpreted as the data parameter, that is, the body to send with your request. You passed two positional arguments, "rb" is the second of them, and data is the second positional parameter in the function signature.

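This is what the documentation says data is allowed to be:

"The supported object types include bytes, file-like objects, and iterables."

A str such as "rb" is not accepted as any of those, which is exactly what the TypeError reports: "It cannot be of type str".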
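To make the effect of that second positional argument visible without touching the network, here is a minimal sketch using urllib.request.Request, which takes url and data in the same first two positions as urlopen(). The "q=images" body is only an illustrative value, not something from the question.

```python
from urllib.request import Request

URL = "https://www.google.com"

# No data argument: this describes a plain GET request.
get_req = Request(URL)

# The second positional argument is `data`, so "rb" becomes the request
# body and the request silently turns into a POST.
accidental_post = Request(URL, "rb")

# A deliberate POST: the body must be bytes, so encode the str first.
# ("q=images" is a made-up payload for illustration.)
proper_post = Request(URL, data="q=images".encode("utf-8"))

for req in (get_req, accidental_post, proper_post):
    print(req.get_method(), repr(req.data))
```

Building a Request does not validate the body. The str is only rejected later, when urlopen() prepares the HTTP request, which is where the TypeError in the question comes from.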

But the real issue here is that you are guessing how to use the function. The built-in open() function and urllib.request.urlopen() work very differently, so you need to read the documentation to know how to use each of them properly.
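For the original goal of listing images, one possible approach is sketched below, assuming the standard-library html.parser is acceptable. ImgCollector, find_img_sources, fetch_html and the sample markup are hypothetical names and data introduced for illustration. Note that urlopen() receives only the URL, and that the decoded page is one string, so it is handed to a parser rather than looped over character by character.

```python
from html.parser import HTMLParser
from urllib import request


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def find_img_sources(html):
    """Return the src values of all <img> tags in an HTML string."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


def fetch_html(url):
    """Fetch a page with a plain GET (no second argument) and decode it."""
    with request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Demonstrated on a literal sample so the sketch runs without network access.
sample = '<html><body><img src="/logo.png" alt="logo"><p>text</p><img src="a.gif"/></body></html>'
print(find_img_sources(sample))
```

With a live page, the call would look like find_img_sources(fetch_html('https://www.google.com')).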

Also, I'd like to suggest that unless you absolutely have to use urllib, use the requests library instead.

Answer 2:

The signature of the urlopen function is:

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

In your code, urlopen('https://www.google.com', "rb") passes the string "rb" as the data argument. It is not a mode argument like the one taken by the built-in open() function.
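A quick way to check this yourself is to inspect the parameter names and to try passing the value by keyword. This is a small sketch: the keyword name mode is deliberately wrong, to show that urlopen() has no such parameter. The call fails while Python binds the arguments, so no request is sent.

```python
import inspect
from urllib.request import urlopen

# The first two parameters are url and data; there is no "mode" parameter.
param_names = list(inspect.signature(urlopen).parameters)
print(param_names)

try:
    # Fails during argument binding, before any network access happens.
    urlopen("https://www.google.com", mode="rb")
except TypeError as exc:
    message = str(exc)
    print(message)
```

Passing optional arguments by keyword, for example data=..., turns this kind of mix-up into an immediate and more descriptive error.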

Source: https://stackoverflow.com/questions/55369233/urllib-request-post-data-should-be-bytes-an-iterable-of-bytes-or-a-file-objec

Tags

python

urllib