Parse multipart/form-data using cgi.FieldStorage; None keys

Submitted by 霸气de小男生 on 2020-01-01 09:22:51

Question


The following piece of code should be able to run in Python 2.7 and Python 3.x.

from __future__ import unicode_literals
from __future__ import print_function

import cgi
try:
    from StringIO import StringIO as IO
except ImportError:
    from io import BytesIO as IO

body = """
--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-Type: binary/octet-stream

value1
--spam--
"""

parsed = cgi.FieldStorage(
    IO(body.encode('utf-8')),
    headers={'content-type': 'multipart/form-data; boundary=spam'},
    environ={'REQUEST_METHOD': 'POST'})

print([key for key in parsed])

In Python 2.7 it runs fine and outputs ['param1']. In Python 3.4, however, it outputs [None].

I cannot get FieldStorage to produce a usable result in Python 3. I suspect something changed internally and I'm now using it wrong, but I can't figure out what. Any help is appreciated.


Answer 1:


These changes will make your script work identically in both Python 2.7.x and 3.4.x:

(Abbreviations used below for cgi.FieldStorage(): FS27 for Python 2.7.x, FS34 for Python 3.4.x.)

1 - FS27 tolerates the newline before the first boundary, but FS34 does not, so the fix is to start the body directly with the boundary line (--spam):

body = """--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-type: binary/octet-stream

value1
--spam--
"""

2 - Quoting the FieldStorage constructor docstring in Python 3.4's cgi.py:

Arguments, all optional:

fp : file pointer; default: sys.stdin.buffer (not used when the request method is GET)

        Can be :
        1. a TextIOWrapper object
        2. an object whose read() and readline() methods return bytes

The "Can be:" list is not present in the FS27 docstring; most of the differences between FS27 and FS34 come down to FS27 working with strings and FS34 working with binary streams.

In this context, FS34 can easily produce a parsed object with the wrong structure unless it is told exactly how much to read. Apparently the headers entry 'content-type': 'multipart/form-data; boundary=spam' is not enough; you also have to supply the message length.

You can do this by adding a second entry to headers:

headers={'content-type': 'multipart/form-data; boundary=spam;',
         'content-length': len(body.encode('utf-8'))}

where the content-length value is the length in bytes of the body actually fed to FieldStorage (including the opening and closing boundary lines). For this ASCII-only body, len(body) gives the same number.
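A small sketch of why the length should be taken from the encoded bytes rather than from the text string, assuming Python 3: with a non-ASCII value the two counts diverge. The "café" value is an illustrative assumption, not from the question, and the header values are written as strings, as they would arrive in a real HTTP request.

```python
# Illustrative body with a non-ASCII value (assumption, not from the question).
body = (
    "--spam\r\n"
    'Content-Disposition: form-data; name="param1"; filename=blob\r\n'
    "\r\n"
    "caf\u00e9\r\n"  # 'café': one character 'é', two bytes in UTF-8
    "--spam--\r\n"
)
encoded = body.encode("utf-8")

# Character count and byte count differ by one because of 'é'.
print(len(body), len(encoded))

headers = {
    "content-type": "multipart/form-data; boundary=spam",
    # Content-Length counts bytes of the encoded body, not characters.
    "content-length": str(len(encoded)),
}
print(headers["content-length"])
```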


These modifications, combined, lead to the desired result:

$ python script.py
['param1']
$ python3 script.py
['param1']

As a proof of concept, these are the parsed objects returned by FS27 and FS34:

...
print(parsed)
...

yields:

FieldStorage(None, None, [FieldStorage('param1', 'blob', 'value1')])

for FS27, and

FieldStorage(None, None, [FieldStorage('param1', 'blob', b'value1')])

for FS34.
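The cgi module was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), so here is a hedged cross-check that uses the standard email package instead of cgi.FieldStorage. This is a swapped-in parser, not what either answer used, and parse_form is a hypothetical helper. It prepends the Content-Type header so BytesParser sees a complete MIME message; the email parser treats anything before the first delimiter as preamble, so the question's leading newline does no harm here.

```python
from email import policy
from email.parser import BytesParser


def parse_form(content_type, body):
    """Hypothetical helper: map form field names to raw part payloads (bytes)."""
    # BytesParser needs the top-level Content-Type as a header, so prepend it
    # (plus the blank line that ends the header block) to the request body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    fields = {}
    for part in msg.iter_parts():
        # The field name lives in the Content-Disposition "name" parameter.
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True)
    return fields


# The question's body, including its leading newline (bare LF line endings).
body = b"""
--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-Type: binary/octet-stream

value1
--spam--
"""

print(parse_form("multipart/form-data; boundary=spam", body))  # {'param1': b'value1'}
```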




Answer 2:


In both Python 2.7 and Python 3.5 (though not in Python 3.4, for some reason), the desired output is returned by adding a Content-Length header to the part inside the request body:

body = """
--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-Length: 6
Content-Type: binary/octet-stream

value1
--spam--
"""


Source: https://stackoverflow.com/questions/32889703/parse-multipart-form-data-using-cgi-fieldstorage-none-keys
