Can I use re.sub (or regexobject.sub) to replace text in a subgroup?

Posted by 三世轮回 on 2019-12-11 14:12:51

Question


I need to parse a configuration file which looks like this (simplified):

<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>

My goal is to be able to change parameters specific to a particular link, but I'm having trouble getting substitution to work correctly. I have a regex that can isolate a parameter value on a specific link, where the value is contained in capture group 1:

link_id = r'id="1"'
parameter = 'mode'
link_regex = r'<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>([\w\W]*)</%s>[\w\W]*</link>' \
% (link_id, parameter, parameter)

Thus,

print(re.search(link_regex, f_read).group(1))

prints ipsec

The examples in the regex howto all seem to assume that one wants to use the capture group in the replacement, but what I need to do is replace the capture group itself (e.g. change the Link1 mode from ipsec to udp).


Answer 1:


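I'm not sure I'd do it that way, but the quickest fix is to shift the captures so that they cover the text on either side of the value instead of the value itself:

(<link [\w\W]+ %s>[\w\W]*<%s>)[\w\W]*(</%s>[\w\W]*</link>)

and replace the match with group 1 + mode + group 2.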
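The shifted-capture idea can be sketched as a small runnable function. This is only an illustration under some assumptions: the function name set_link_parameter and the sample string are made up for the example, the quantifiers are made lazy (*?) and the tag match is restricted with [^>]* because a greedy [\w\W]* can run past the intended link, and a function is used as the replacement so that backslashes in the new value are not interpreted as group references.

```python
import re

config = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""

def set_link_parameter(text, link_id, parameter, new_value):
    """Hypothetical helper: replace one parameter value inside one <link>."""
    # Group 1: from the opening <link ...> tag through <parameter>.
    # Group 2: from </parameter> through the closing </link>.
    # The old value sits between the two groups and is dropped.
    pattern = re.compile(
        r'(<link [^>]*\bid="%s"[^>]*>[\w\W]*?<%s>)[\w\W]*?(</%s>[\w\W]*?</link>)'
        % (re.escape(link_id), re.escape(parameter), re.escape(parameter))
    )
    # A function replacement inserts new_value literally.
    return pattern.sub(
        lambda m: m.group(1) + new_value + m.group(2), text, count=1
    )

updated = set_link_parameter(config, "1", "mode", "udp")
print(updated)
```

Note that this still treats the XML as plain text: if the targeted link had no such parameter, the lazy match could continue into the next link, which is one reason the parser-based answers are more robust.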




Answer 2:


I have to give you the obligatory: "don't use regular expressions to do this."

Look at how easy this is with BeautifulSoup, for example:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> html = """
... <config>
... <links>
... <link name="Link1" id="1">
...  <encapsulation>
...   <mode>ipsec</mode>
...  </encapsulation>
... </link>
... <link name="Link2" id="2">
...  <encapsulation>
...   <mode>udp</mode>
...  </encapsulation>
... </link>
... </links>
... </config>
... """
>>> soup = BeautifulStoneSoup(html)
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>ipsec</mode>
</encapsulation>
</link>
>>> soup.find('link', id=1).mode.contents[0].replaceWith('whatever')
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>whatever</mode>
</encapsulation>
</link>

Looking at your regular expression, I can't really tell if this is exactly what you wanted to do, but whatever it is, using a library like BeautifulSoup is much better than trying to patch a regular expression together. I highly recommend going this route if possible.




Answer 3:


This looks like valid XML. In that case you don't need BeautifulSoup, and definitely not a regex: just load the XML with any good XML library, edit it, and print it out. Here is an approach using ElementTree:

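import xml.etree.ElementTree as ET

s = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""
# Parse the XML text into an element tree rooted at <config>
configElement = ET.fromstring(s)

# Matches links/link/encapsulation/mode, with wildcards for the three levels above <mode>
for modeElement in configElement.findall("*/*/*/mode"):
    modeElement.text = "udp"

# encoding="unicode" makes tostring() return str rather than bytes (Python 3)
print(ET.tostring(configElement, encoding="unicode"))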

This changes every mode element to udp. The output is shown below (attribute order may differ depending on the Python version):

<config>
<links>
<link id="1" name="Link1">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
<link id="2" name="Link2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
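Since the question is about changing one particular link rather than all of them, the same ElementTree approach can be narrowed with an attribute predicate. The following is a minimal sketch, assuming Python 3 and the element layout from the sample config; the function name set_link_mode is made up for the example.

```python
import xml.etree.ElementTree as ET

config = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""

def set_link_mode(xml_text, link_id, new_mode):
    """Hypothetical helper: set <mode> for the <link> with the given id."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports [@attrib='value'] predicates.
    mode = root.find(".//link[@id='%s']/encapsulation/mode" % link_id)
    if mode is None:
        raise ValueError("no <mode> found for link id %r" % link_id)
    mode.text = new_mode
    return ET.tostring(root, encoding="unicode")

updated = set_link_mode(config, "1", "udp")
print(updated)
```

Only the link whose id attribute matches is touched; a missing link raises an error instead of silently changing nothing.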



Answer 4:


Supposing that your link_regex is correct, you can add parentheses like this:

(<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>)([\w\W]*)(</%s>[\w\W]*</link>)

and then you could do:

import re

p = re.compile(link_regex)
replacement = 'foo'
# Keep groups 1 and 3 and put the new value where group 2 was
print(p.sub(r'\g<1>' + replacement + r'\g<3>', f_read))
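If adding extra groups to the pattern is undesirable, another option is to keep a single capture group around the value and splice the new text in using the match's span. This is a rough sketch rather than a standard library feature: replace_group is a hypothetical helper, and the sample string and pattern are simplified stand-ins for the real config and link_regex.

```python
import re

def replace_group(pattern, text, new_value, group=1):
    """Hypothetical helper: replace only the span of `group` in the first match."""
    match = re.search(pattern, text)
    if match is None:
        return text  # nothing matched; leave the input unchanged
    start, end = match.span(group)
    if start == -1:
        return text  # the group did not take part in the match
    # Everything outside the group's span is copied through untouched.
    return text[:start] + new_value + text[end:]

sample = '<link name="Link1" id="1"><mode>ipsec</mode></link>'
pattern = r'<link [^>]*id="1"[^>]*>[\w\W]*?<mode>([\w\W]*?)</mode>'
result = replace_group(pattern, sample, "udp")
print(result)
```

Because the new value is inserted by slicing, it is never parsed as a replacement template, so backslashes in it need no escaping.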


Source: https://stackoverflow.com/questions/886111/can-i-use-re-sub-or-regexobject-sub-to-replace-text-in-a-subgroup
