Question
I'm writing a simple webserver in python that allows a user to upload a file using multipart/form-data. As far as I can tell, multipart MIME data is supposed to be line based. For instance, the boundary has to be at the beginning of a line.
I can't figure out how binary data is handled in this regard. My client (Firefox) isn't encoding it into 7-bit ASCII or anything; it's just sending raw binary data. Does it split the data into lines at arbitrary locations? Is there a maximum line length specified for multipart data? I've tried looking through the RFC for multipart/form-data, but didn't find anything.
Answer 1:
After digging through the RFCs, I think I finally got it all straight in my head. The body parts (i.e., the body content of an individual part in a multipart/* message) only need to be line-based in the sense that the boundary at the end of the part begins with a CR+LF. Otherwise, the data need not be line-based: if the content happens to contain line breaks, there is no maximum distance between them, and they don't need to be escaped in any way (unless, perhaps, the Content-Transfer-Encoding is quoted-printable). The 7bit, 8bit, and binary values for Content-Transfer-Encoding don't actually indicate that any encoding has been applied to the data (so no encoding needs to be undone); they just indicate what kind of data you can expect to see in the body part.
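To make the point concrete, here is a minimal Python sketch (not from the original answer) that wraps arbitrary binary content in a one-part body by hand and recovers it by searching for the CR+LF + "--" + boundary delimiter rather than by reading lines. The boundary value and the `build_body`/`extract_payload` helpers are hypothetical; a real parser would also handle multiple parts and transport padding after the boundary.

```python
# Hypothetical boundary; real clients generate a random one per request.
BOUNDARY = b"----hypotheticalBoundary1234"


def build_body(payload: bytes) -> bytes:
    """Wrap raw bytes in a one-part multipart/form-data body."""
    return (
        b"--" + BOUNDARY + b"\r\n"
        b'Content-Disposition: form-data; name="file"; filename="blob.bin"\r\n'
        b"Content-Type: application/octet-stream\r\n"
        b"\r\n"
        + payload
        # This CR+LF belongs to the closing delimiter, not to the payload.
        + b"\r\n--" + BOUNDARY + b"--\r\n"
    )


def extract_payload(body: bytes) -> bytes:
    """Return the first part's content by locating the delimiter, not by reading lines."""
    start = body.index(b"\r\n\r\n") + 4  # skip past the blank line ending the part headers
    delimiter = b"\r\n--" + BOUNDARY     # the leading CR+LF is part of the delimiter
    end = body.index(delimiter, start)
    return body[start:end]
```

The payload may contain any bytes, including NULs, bare CRs, or its own trailing CR+LF, as long as it never contains the delimiter sequence itself, which is why senders pick a boundary that does not occur in the content.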
What I was really getting at in my [poorly expressed] question was how to read and buffer the data from the socket so that I could be sure to catch the boundary without needing an arbitrarily large buffer (e.g., if there happened to be no line breaks in the content, a plain readline would end up buffering the entire thing).
What I ended up doing was reading from the socket with a readline that takes a maximum length, so the buffer would never grow beyond that limit but would still stop early whenever a line break was encountered. This ensured that when the boundary came (following a CR+LF), it would be at the beginning of the buffer. I had to do a little extra monkeying around to make sure I didn't include that final CR+LF in the actual body content, because according to the RFC it's required before the boundary and is therefore not part of the content itself.
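A minimal sketch of that bounded-readline approach, assuming the socket has been wrapped with `sock.makefile("rb")`; any binary file object whose `readline(size)` stops after `size` bytes or at a newline will do, including `io.BytesIO`. The names `iter_part_content` and `MAX_LINE` are hypothetical. A trailing CR+LF (or a lone CR cut off by the size limit) is held back until the next read shows whether it starts the delimiter, which is the "extra monkeying around" described above.

```python
import io
from typing import BinaryIO, Iterator

MAX_LINE = 8192  # hypothetical cap on how many bytes a single read may buffer


def iter_part_content(stream: BinaryIO, boundary: bytes,
                      max_line: int = MAX_LINE) -> Iterator[bytes]:
    """Yield one part's content in chunks of roughly max_line bytes or less.

    Assumes `stream` is positioned just after the part's blank header line.
    Stops after consuming the next boundary line; the CR+LF preceding it is
    treated as part of the delimiter and is not yielded.
    """
    dash_boundary = b"--" + boundary
    if max_line < len(dash_boundary):
        raise ValueError("max_line must fit the whole boundary line prefix")
    held = b""  # trailing CR+LF (or lone CR) that may belong to the delimiter
    while True:
        line = stream.readline(max_line)  # stops at b"\n" or after max_line bytes
        if not line:
            raise ValueError("stream ended before the closing boundary")
        if held == b"\r\n" and line.startswith(dash_boundary):
            return  # boundary is at the start of the buffer: drop the held CR+LF
        data = held + line
        if data.endswith(b"\r\n"):
            data, held = data[:-2], b"\r\n"
        elif data.endswith(b"\r"):  # max_line may have split a CR+LF in two
            data, held = data[:-1], b"\r"
        else:
            held = b""
        if data:
            yield data


# Usage with an in-memory stream standing in for sock.makefile("rb"):
stream = io.BytesIO(b"raw\x00bytes\r\n--XyZ--\r\n")
content = b"".join(iter_part_content(stream, b"XyZ"))  # content == b"raw\x00bytes"
```

Memory use stays bounded by `max_line` no matter how long the content runs without a line break, and the caller can write each yielded chunk straight to disk.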
Answer 2:
Try reviewing RFC 2045. Typically, binary content is converted into Base64 by your application and included in the multipart message with "Content-Transfer-Encoding: base64". There are other mechanisms for transferring binary data, but this one is quite common. Each group of three octets is encoded as four ASCII characters, and the result is broken into lines whose maximum length depends on the encoding variant (76 characters for MIME). The receiving application then decodes it back into the original binary content.
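For comparison, a minimal sketch of the Base64 route using Python's standard `base64` module: `base64.encodebytes` breaks its output into lines of at most 76 characters, and the sketch swaps its `\n` line endings for MIME's CR+LF. The helper names are hypothetical. Note that, as the question observes, Firefox sends file uploads in multipart/form-data as raw bytes, and RFC 7578 deprecates Content-Transfer-Encoding for form data, so this only applies to clients that choose to encode.

```python
import base64


def encode_part_body(data: bytes) -> bytes:
    """Base64-encode binary data as CR+LF-terminated lines of at most 76 characters."""
    # encodebytes() already inserts b"\n" after every 76 output characters
    # (plus a trailing one); MIME uses CR+LF line endings, so swap them in.
    return base64.encodebytes(data).replace(b"\n", b"\r\n")


def decode_part_body(encoded: bytes) -> bytes:
    """Recover the original bytes; b64decode() discards CR and LF by default."""
    return base64.b64decode(encoded)
```

Because the encoded form is pure ASCII split into short lines, it survives line-oriented transports, at the cost of about a third more bytes on the wire.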
I am not a Python programmer, but I would be surprised if you really had to code any of this yourself. I suspect there are prebuilt Python library functions that do this for you.
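On the library side: Python's `cgi.FieldStorage` used to handle this, but the `cgi` module was deprecated in Python 3.11 and removed in 3.13. One standard-library option is the `email` package, which can parse a multipart body once a `Content-Type` header is placed in front of it. A minimal sketch; `parse_form_data` is a hypothetical helper, and unlike the streaming approach in Answer 1 it holds the whole request body in memory.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_form_data(content_type: str, body: bytes) -> dict[str, bytes]:
    """Map each form field name to its raw content bytes.

    `content_type` is the request's Content-Type header value,
    e.g. 'multipart/form-data; boundary=...'.
    """
    # The email parser expects a complete MIME message, so prepend a header
    # block carrying the HTTP Content-Type (and its boundary parameter).
    head = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = BytesParser(policy=HTTP).parsebytes(head + body)
    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        # decode=True returns bytes; with no Content-Transfer-Encoding these are
        # the original octets, minus the CR+LF that belongs to the delimiter.
        fields[name] = part.get_payload(decode=True)
    return fields
```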
Source: https://stackoverflow.com/questions/15664712/binary-lines-in-multipart-form-data-file-upload