add own text inside nested braces

Question

I have some source text that mixes HTML tags and PHP code:

<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
	<body>
		<h1 <?php echo "class='big'" ?>>foo</h1>
	</body>
</html>

I need to insert my own text (for example, MY_TEXT) right after the opening <h1> tag, to get this result:

<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
	<body>
		<h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
	</body>
</html>

So I need to account for nested angle braces.

A regex causes problems here, because the tags can be nested to any depth. I need another strategy.

My current idea is to use pyparsing, but I haven't managed to get it working; it is too complicated for my current level.

Could anybody provide a solution, please?

Answer 1:

Pyparsing has a helper method called nestedExpr that makes it easy to match strings of nested open/close delimiters. Since you have PHP tags nested within your <h1> tag, I would use a nestedExpr like:

from pyparsing import nestedExpr, originalTextFor

nested_angle_braces = nestedExpr('<', '>')

However, this will match every tag in your input HTML source:

# html holds the sample source from the question
for match in nested_angle_braces.searchString(html):
    print(match)

gives:

[['html']]
[['head']]
[['title']]
[['?php', 'echo', '"title here"', ';', '?']]
[['/title']]
[['head']]
[['body']]
[['h1', ['?php', 'echo', '"class=\'big\'"', '?']]]
[['/h1']]
[['/body']]
[['/html']]

You want to match only those tags whose opening text is 'h1'. We can add a condition to an expression in pyparsing using addCondition:

nested_angle_braces_with_h1 = nested_angle_braces().addCondition(
                                            lambda tokens: tokens[0][0].lower() == 'h1')

Now we will match only the desired tag. Just a few more steps...

First of all, nestedExpr gives back nested lists of matched items. We want the original text that was matched. Pyparsing includes another helper for that, unimaginatively named originalTextFor - we combine this with the previous definition to get:

nested_angle_braces_with_h1 = originalTextFor(
    nested_angle_braces().addCondition(lambda tokens: tokens[0][0].lower() == 'h1')
    )

Lastly, we have to add one more parse-time callback action, to append "MY_TEXT" to the tag:

nested_angle_braces_with_h1.addParseAction(lambda tokens: tokens[0] + 'MY_TEXT')

Now that we are able to match the desired <h1> tag, we can use the transformString method of the expression to do the search-and-replace work for us:

print(nested_angle_braces_with_h1.transformString(html))

With your original sample saved as a variable named html, we get:

<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
        <body>
                <h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
        </body>
</html>
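
The steps above can be assembled into a single script. This is a minimal sketch, assuming pyparsing is installed and keeping the camelCase helper names used in this answer (recent pyparsing releases also provide snake_case equivalents). The sample is indented with spaces rather than tabs, and the variable name result is only for illustration.

```python
from pyparsing import nestedExpr, originalTextFor

# Sample input from the question (indented with spaces here).
html = """<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
    <body>
        <h1 <?php echo "class='big'" ?>>foo</h1>
    </body>
</html>"""

# Match any <...> group, allowing nested <...> inside it.
nested_angle_braces = nestedExpr('<', '>')

# Keep only groups whose first word is 'h1', and return the original
# matched text instead of the nested token lists.
nested_angle_braces_with_h1 = originalTextFor(
    nested_angle_braces().addCondition(
        lambda tokens: tokens[0][0].lower() == 'h1')
)

# Append the custom text right after the matched opening tag.
nested_angle_braces_with_h1.addParseAction(
    lambda tokens: tokens[0] + 'MY_TEXT')

result = nested_angle_braces_with_h1.transformString(html)
print(result)
```

Because the condition is attached to a copy of nested_angle_braces (the trailing parentheses make the copy), the original expression can still be reused to match other tags.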

Note: this will add "MY_TEXT" after every <h1> tag. If you want this to be applied only after <h1> tags containing PHP, then write the appropriate condition and add it to nested_angle_braces_with_h1.

Source: https://stackoverflow.com/questions/40066293/add-own-text-inside-nested-braces

Tags

python

pyparsing

python-textprocessing