Question
Is there any way to define a custom indent width for the .prettify() function? From what I can get from its source -
def prettify(self, encoding=None, formatter="minimal"):
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)
There is no way to specify indent width. I think it's because of this line in the decode_contents() function -
s.append(" " * (indent_level - 1))
Which has a fixed length of 1 space! (WHY!!) I tried specifying indent_level=4, that just results in this -
   <section>
    <article>
     <h1>
     </h1>
     <p>
     </p>
    </article>
   </section>
Which looks just plain stupid. :|
Now, I can hack this away, but I just want to be sure if there is anything I'm missing. Because this should be a basic feature. :-/
If you have a better way of prettifying HTML, let me know.
Answer 1:
I actually dealt with this myself, in the hackiest way possible: by post-processing the result.
import re
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify_2space(s, encoding=None, formatter="minimal"):
    # Double each line's leading whitespace: 1 space per level becomes 2.
    return r.sub(r'\1\1', s.prettify(encoding, formatter))
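The same post-processing idea can be tried on a plain string, without BeautifulSoup at all. This is only a sketch: the name reindent and the sample string are made up for illustration. It also assumes a slightly narrower pattern, r'^( *)', because \s* can match across a blank line and treat the newline as indentation.

```python
import re

# Match only the leading spaces of each line. Unlike r'^(\s*)', this
# cannot run across a blank line and capture the newline itself.
LEADING_SPACES = re.compile(r'^( *)', re.MULTILINE)

def reindent(text, indent_width=2):
    """Scale 1-space-per-level indentation to indent_width spaces per level."""
    # A function replacement repeats the captured spaces indent_width times.
    return LEADING_SPACES.sub(lambda m: m.group(1) * indent_width, text)

# Hypothetical sample in the 1-space-per-level style that prettify() emits.
sample = "<section>\n <article>\n  <h1>\n  </h1>\n </article>\n</section>"
print(reindent(sample, indent_width=4))
```

Because the function only touches leading spaces, text that already starts at column 0 is left alone.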
Actually, I monkeypatched prettify_2space in place of prettify in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:
import re
import bs4
orig_prettify = bs4.BeautifulSoup.prettify
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify(self, encoding=None, formatter="minimal", indent_width=4):
    # Repeat each line's leading whitespace indent_width times.
    return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))
bs4.BeautifulSoup.prettify = prettify
So:
x = '''<section><article><h1></h1><p></p></article></section>'''
soup = bs4.BeautifulSoup(x)
print(soup.prettify(indent_width=3))
… gives:
<html>
   <body>
      <section>
         <article>
            <h1>
            </h1>
            <p>
            </p>
         </article>
      </section>
   </body>
</html>
Obviously if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.
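One way such a generic wrapper might look is sketched below. The name with_indent_width is hypothetical, and the sketch assumes the wrapped method returns a str (that is, encoding is None); bytes output would need a bytes pattern instead.

```python
import functools
import re

_LEADING_SPACES = re.compile(r'^( *)', re.MULTILINE)

def with_indent_width(prettify_method, default_width=1):
    """Return a version of a prettify-style method that accepts indent_width."""
    @functools.wraps(prettify_method)
    def wrapper(self, *args, indent_width=default_width, **kwargs):
        text = prettify_method(self, *args, **kwargs)
        # Assumes str output; each leading space becomes indent_width spaces.
        return _LEADING_SPACES.sub(lambda m: m.group(1) * indent_width, text)
    return wrapper

# Hypothetical usage, assuming bs4 is installed:
#   import bs4
#   bs4.Tag.prettify = with_indent_width(bs4.Tag.prettify)
# Repeat for any other class that defines its own prettify method.
```

With default_width=1 the patched method behaves like the original unless indent_width is passed, so existing callers should not notice the patch.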
Answer 2:
As far as I can tell, this feature is not built in, as there are a handful of solutions out there for this problem.
Assuming you are using BeautifulSoup 4, here are the solutions I came up with.
Hardcode it in. This requires minimal changes and is fine if you don't need the indent to be different in different circumstances:
myTab = 4 # add this
if pretty_print:
    # space = (' ' * (indent_level - 1))
    space = (' ' * (indent_level - myTab))
    #indent_contents = indent_level + 1
    indent_contents = indent_level + myTab
One drawback of this solution is that the text content won't be indented entirely consistently, though the result still looks fine. If you need a more flexible/consistent solution, you can just modify the class.
Find the prettify function and modify it as such (it is located in the Tag class in element.py):
# Add the myTab keyword to the function's parameters (or whatever you want to call it) and set it to your preferred default.
def prettify(self, encoding=None, formatter="minimal", myTab=2):
    Tag.myTab = myTab  # add a reference to it in the Tag class
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)
And then scroll up to the decode method in the Tag class and make the following changes:
if pretty_print:
    #space = (' ' * (indent_level - 1))
    space = (' ' * (indent_level - Tag.myTab))
    #indent_contents = indent_level + 1
    indent_contents = indent_level + Tag.myTab
Then go to the decode_contents method in the Tag class and make these changes:
#s.append(" " * (indent_level - 1))
s.append(" " * (indent_level - Tag.myTab))
Now BeautifulSoup('<root><child><desc>Text</desc></child></root>').prettify(myTab=4) will return:
<root>
    <child>
        <desc>
            Text
        </desc>
    </child>
</root>
Note: there is no need to patch the BeautifulSoup class, as it inherits from the Tag class. Patching the Tag class is sufficient.
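A caveat on the arithmetic in these edits: they add and subtract the tab width from indent_level, and since prettify appears to start at indent_level 1 (the True passed to decode), the first nested level may not line up with the deeper ones. An alternative is to keep indent_level as a plain depth counter and multiply by the width only when emitting spaces. The sketch below shows that arithmetic on a toy tree. The render function and the (tag, children) tuples are invented for illustration and are not BeautifulSoup internals.

```python
def render(node, indent_level=0, indent_width=4):
    """Pretty-print a toy (tag, children) tree; plain strings are text nodes."""
    # Depth stays a simple counter; width is applied only here.
    pad = " " * (indent_level * indent_width)
    if isinstance(node, str):
        return [pad + node]
    tag, children = node
    lines = [pad + "<%s>" % tag]
    for child in children:
        lines.extend(render(child, indent_level + 1, indent_width))
    lines.append(pad + "</%s>" % tag)
    return lines

# Hypothetical stand-in for <root><child><desc>Text</desc></child></root>.
tree = ("root", [("child", [("desc", ["Text"])])])
print("\n".join(render(tree, indent_width=4)))
```

In the library code, the equivalent change would presumably be to leave indent_contents = indent_level + 1 alone and scale (indent_level - 1) by the width wherever spaces are produced.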
Source: https://stackoverflow.com/questions/15509397/custom-indent-width-for-beautifulsoup-prettify