How to remove tags from a string in python using regular expressions? (NOT in HTML)

I need to remove tags from a string in Python.

<FNT name="Century Schoolbook" size="22">Title</FNT>

What is the most effici

6 Answers

情书的邮戳

2020-12-08 00:06
Please avoid using regex. Even though a regex will work on your simple string, you will run into problems later if you get a more complex one.

You can use BeautifulSoup's get_text() method.
```
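from bs4 import BeautifulSoup

text = '<FNT name="Century Schoolbook" size="22">Title</FNT>'
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(text, "html.parser")

# get_text() returns only the text nodes, with every tag dropped.
print(soup.get_text())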
```
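To illustrate the kind of "complex" input meant above, here is a minimal sketch that uses the standard library's `html.parser` rather than BeautifulSoup (a substitution made only so the example has no third-party dependency). `TextCollector`, `strip_tags`, and the `tricky` sample string are hypothetical names invented for this example. The sample puts a `>` inside a quoted attribute value, which a naive regex treats as the end of the tag.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for text found between tags.
        self.parts.append(data)


def strip_tags(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


# Assumed sample: note the '>' inside the quoted attribute value.
tricky = '<FNT name="a > b" size="22">Title</FNT>'

regex_result = re.sub(r'<[^>]*>', '', tricky)   # stops at the first '>'
parser_result = strip_tags(tricky)              # understands quoted attributes

print(repr(regex_result))
print(repr(parser_result))
```

The regex leaves part of the attribute list behind, while the parser returns only the text. Whether that matters depends on whether such input can ever occur in your data.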

清酒与你

2020-12-08 00:07
Searching for this regex and replacing each match with an empty string should work.
```
/<[A-Za-z\/][^>]*>/
```
Example (from the Python shell):
```
>>> import re
>>> my_string = '<FNT name="Century Schoolbook" size="22">Title</FNT>'
>>> print(re.sub(r'<[A-Za-z\/][^>]*>', '', my_string))
Title
```
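A small sketch of what the leading `[A-Za-z\/]` in that pattern buys you: a bare `<` in ordinary text (not followed by a letter or a slash) is left alone. `TAG_RE`, `LOOSE_RE`, `remove_tags`, and the sample string are hypothetical names for this illustration, and the looser pattern is included only for comparison.

```python
import re

# Assumption: a tag starts with '<' followed by a letter or '/'.
TAG_RE = re.compile(r'<[A-Za-z/][^>]*>')
# Looser variant for comparison: anything between '<' and the next '>'.
LOOSE_RE = re.compile(r'<[^>]*>')


def remove_tags(text, pattern=TAG_RE):
    """Replace every match of `pattern` with an empty string."""
    return pattern.sub('', text)


sample = '<B>1 < 2</B>'

strict_result = remove_tags(sample)            # keeps the literal '<'
loose_result = remove_tags(sample, LOOSE_RE)   # swallows '< 2</B>' as one "tag"

print(repr(strict_result))
print(repr(loose_result))
```

Compiling the pattern once is also a reasonable choice if the function is called on many strings.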

不要未来只要你来

2020-12-08 00:11
This should work:
```
import re
mystring = '<FNT name="Century Schoolbook" size="22">Title</FNT>'
print(re.sub(r'<[^>]*>', '', mystring))  # Title
```
To everyone saying that regexes are not the correct tool for the job:

The context of the problem is such that all the objections regarding regular/context-free languages are invalid. His language essentially consists of three entities: a = <, b = >, and c = [^><]+. He wants to remove any occurrences of acb. This fairly directly characterizes his problem as one involving a context-free grammar, and it is not much harder to characterize it as a regular one.

I know everyone likes the "you can't parse HTML with regular expressions" answer, but the OP doesn't want to parse it, he just wants to perform a simple transformation.
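The three-entity description can be written down almost literally. This is only a sketch of that argument: the names `A`, `B`, `C`, and `ACB` are made up here to mirror the a, b, c in the text, and the sample string is the one from the question.

```python
import re

A = '<'        # a: the opening bracket
B = '>'        # b: the closing bracket
C = '[^><]+'   # c: one or more characters that are neither bracket

# "Remove any occurrence of acb": plain concatenation, no nesting or recursion.
ACB = re.compile(A + C + B)

my_string = '<FNT name="Century Schoolbook" size="22">Title</FNT>'
result = ACB.sub('', my_string)

print(result)
```

Because the pattern is a simple concatenation with no recursion, a regular expression is enough for this particular transformation.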

暗喜

2020-12-08 00:14

If it's only for parsing and retrieving a value, you might take a look at BeautifulStoneSoup.


慢半拍i

2020-12-08 00:21
If the source text is well-formed XML, you can use the stdlib module ElementTree:
```
import xml.etree.ElementTree as ET
mystring = """<FNT name="Century Schoolbook" size="22">Title</FNT>"""
element = ET.XML(mystring)  # parse the string into an Element
print(element.text)  # Title
```
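One thing to watch with `element.text`: it only holds the text before the first child element. If the tags can be nested, a sketch along these lines may be closer to what you want. `text_from_xml` and both sample strings are hypothetical, and returning `None` on a parse failure is just one possible design choice.

```python
import xml.etree.ElementTree as ET


def text_from_xml(markup):
    """Hypothetical helper: all text in well-formed XML, or None if parsing fails."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Not well-formed XML; the caller can fall back to another approach.
        return None
    # itertext() walks the element and all of its descendants in document order.
    return "".join(root.itertext())


# Assumed sample with a nested tag.
nested = '<FNT name="Century Schoolbook" size="22">Big <B>bold</B> Title</FNT>'

flat_text = text_from_xml(nested)
first_text_only = ET.fromstring(nested).text
broken = text_from_xml('<FNT size="22">Title')  # missing closing tag

print(repr(flat_text))
print(repr(first_text_only))
print(broken)
```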
If the source isn't well-formed, BeautifulSoup is a good suggestion. Using regular expressions to parse tags is not a good idea, as several posters have pointed out.

广开言路

2020-12-08 00:23

Use an XML parser, such as ElementTree. Regular expressions are not the right tool for this job.
