Question
I create an HTML form on the server side:
<form action="." method="POST">
  <input type="text" name="foo" value="bar">
  <textarea name="area">long text</textarea>
  <select name="your-choice">
    <option value="a" selected>A</option>
    <option value="b">B</option>
  </select>
</form>
Desired result:
{
    "foo": "bar",
    "area": "long text",
    "your-choice": "a",
}
The method I am looking for (parse_form()) could be used like this:
response = client.get('/foo/')
# response contains <form> ...</form>
data = parse_form(response.content)
data['my-input']='bar'
response = client.post('/foo/', data)
How can I implement parse_form() in Python?
This is not specific to Django. Nevertheless, there was a feature request for it in Django, which was rejected several years ago: https://code.djangoproject.com/ticket/11797
Answer 1:
Why not just this?
def parse_form(content):
    import lxml.html
    tree = lxml.html.fromstring(content)
    return dict(tree.forms[0].fields)
I couldn't guess the reason for using a UserDict.
One small caveat: I noticed that when the form contains a <select> and no option is selected, lxml returns the value of the first option; the BeautifulSoup-based solution I gave in my other answer returns None instead.
Answer 2:
This is not related to Django, just to HTML parsing. The standard tool for that is the BeautifulSoup (bs4) library.
It parses arbitrary HTML and is often used in web scrapers (including my own). The question "Python beautiful soup form input parsing" covers parsing HTML forms, and pretty much everything else you'll need is answered on here somewhere :)
from bs4 import BeautifulSoup

def selected_option(select):
    option = select.find("option", selected=True)
    if option:
        return option['value']

# tag name => how to extract its value
tags = {
    "input": lambda t: t['value'],
    "textarea": lambda t: t.text,
    "select": selected_option
}

def parse_form(html):
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find("form")
    return {
        e['name']: tags[e.name](e)
        for e in form.find_all(tags.keys())
    }
This produces the following output for your input:
{
    "foo": "bar",
    "area": "long text",
    "your-choice": "a"
}
For production use, you will want to add plenty of error checking: no form found, inputs without a name, and so on. How much depends on what exactly is needed.
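As a rough idea of what that error checking could look like, here is a minimal sketch of the same approach that raises when no form is found, skips controls without a name, and tolerates inputs without a value. The choice of ValueError and the fallback to the option text are my own assumptions, not something the question asks for:

```python
from bs4 import BeautifulSoup


def selected_option(select):
    option = select.find("option", selected=True)
    if option is None:
        return None
    # Assumption: fall back to the option text when there is no value
    # attribute, which is what a browser would submit in that case.
    return option.get("value", option.text)


def parse_form(html):
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        raise ValueError("no <form> element found")
    data = {}
    for element in form.find_all(["input", "textarea", "select"]):
        name = element.get("name")
        if not name:
            continue  # controls without a name are never submitted
        if element.name == "input":
            data[name] = element.get("value")  # None if value is missing
        elif element.name == "textarea":
            data[name] = element.text
        else:
            data[name] = selected_option(element)
    return data
```

Checkboxes, radio buttons and file inputs are still treated like plain text inputs here, so this remains a sketch rather than a complete form parser.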
Answer 3:
First of all, consider using response.context instead of response.content. As described in the Django documentation, it gives you the template parameters that were used to render response.content. The form attributes you need (name and value) might be in there if you passed them as parameters to the renderer.
If you must use response.content, then I don't think Django provides a way to parse the HTML response. You can use an HTML parser such as BeautifulSoup, or do it with regular expressions.
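If you would rather not add a dependency, the HTML-parser route can also be sketched with the standard library's html.parser module. This is only a minimal sketch: the FormParser class name is made up for illustration, the UTF-8 decoding of bytes is an assumption about the response, and checkboxes and radio buttons are treated like any other input:

```python
from html.parser import HTMLParser


class FormParser(HTMLParser):
    """Collect name/value pairs from the first <form> in a document."""

    def __init__(self):
        super().__init__()
        self.data = {}
        self.in_form = False
        self.done = False      # True once the first form has been closed
        self.select = None     # name of the <select> currently open
        self.textarea = None   # name of the <textarea> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self.done:
            self.in_form = True
        if not self.in_form:
            return
        name = attrs.get('name')
        if tag == 'input' and name:
            self.data[name] = attrs.get('value')
        elif tag == 'textarea' and name:
            self.textarea = name
            self.data[name] = ''
        elif tag == 'select' and name:
            self.select = name
            self.data[name] = None  # stays None if no option is selected
        elif tag == 'option' and self.select and 'selected' in attrs:
            self.data[self.select] = attrs.get('value')

    def handle_data(self, data):
        # Only text inside a <textarea> contributes to a field value.
        if self.textarea is not None:
            self.data[self.textarea] += data

    def handle_endtag(self, tag):
        if tag == 'textarea':
            self.textarea = None
        elif tag == 'select':
            self.select = None
        elif tag == 'form' and self.in_form:
            self.in_form = False
            self.done = True


def parse_form(content):
    if isinstance(content, bytes):
        content = content.decode('utf-8')  # assumption: UTF-8 response
    parser = FormParser()
    parser.feed(content)
    parser.close()
    return parser.data
```

Like the BeautifulSoup versions, this returns None for a <select> with no selected option, whereas lxml reports the first option.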
Answer 4:
from collections import UserDict

class FormData(UserDict):
    def __init__(self, *args, **kwargs):
        self.frozen = False
        super().__init__(*args, **kwargs)
        self.frozen = True

    def __setitem__(self, key, value):
        if self.frozen and key not in self:
            raise ValueError('Key %s is not in the dict. Available: %s' % (
                key, self.keys()
            ))
        super().__setitem__(key, value)

def parse_form(content):
    """
    Parse the first form in the html in content.
    """
    import lxml.html
    tree = lxml.html.fromstring(content)
    return FormData(tree.forms[0].fields)
Example usage:
from django.urls import reverse

def test_foo_form(user_client):
    url = reverse('foo')
    response = user_client.get(url)
    assert response.status_code == 200
    data = parse_form(response.content)
    response = user_client.post(url, data)
    assert response.status_code == 302
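As far as I can tell from the code, what FormData adds over a plain dict is that it refuses keys that were not in the parsed form, so a typo in a field name fails loudly in the test instead of being posted silently. A minimal sketch of that behaviour, with the class repeated so the snippet runs on its own; the field names are made up and stand in for what parse_form() would return:

```python
from collections import UserDict


class FormData(UserDict):
    """Dict that rejects new keys once it has been initialised."""

    def __init__(self, *args, **kwargs):
        self.frozen = False
        super().__init__(*args, **kwargs)
        self.frozen = True

    def __setitem__(self, key, value):
        if self.frozen and key not in self:
            raise ValueError('Key %s is not in the dict. Available: %s' % (
                key, self.keys()
            ))
        super().__setitem__(key, value)


# Hypothetical fields, standing in for the result of parse_form().
data = FormData({'foo': 'bar', 'area': 'long text'})

data['foo'] = 'changed'       # existing field: allowed

try:
    data['fooo'] = 'typo'     # unknown field: rejected
except ValueError as exc:
    error = str(exc)
```

The object still behaves like a mapping, so it can be passed to client.post() the same way as a dict.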
Answer 5:
Just for fun, I tried to replicate the solution proposed by guettli with BeautifulSoup.
Here's what I came up with:
from bs4 import BeautifulSoup

def parse_form(content):
    data = {}
    html = BeautifulSoup(content, features="lxml")
    form = html.find('form', recursive=True)
    fields = form.find_all(('input', 'select', 'textarea'))
    for field in fields:
        name = field.get('name')
        if name:
            if field.name == 'input':
                value = field.get('value')
            elif field.name == 'select':
                try:
                    value = field.find_all('option', selected=True)[0].get('value')
                except IndexError:
                    # no option is marked as selected
                    value = None
            elif field.name == 'textarea':
                value = field.text
            else:
                # checkbox ? radiobutton ? file ?
                continue
            data[name] = value
    return data
Is this a better result?
Honestly, I don't think so. On the other hand, if you already use BeautifulSoup to parse the response content in other ways, this might be an option.
Source: https://stackoverflow.com/questions/65570418/django-parse-html-containing-form-to-dictionary