How to remove XML tags from Unix command line?

I am grepping an XML file, which gives me output like this:

data
more data
...

Note, this is a fl

5 Answers

说谎

2021-01-31 19:10
Use the html2text command-line tool, which converts HTML into plain text.

Alternatively, you can do it with ex (the \{-} non-greedy quantifier in the pattern is Vim syntax, so this needs Vim's ex):
```
ex -s +'%s/<[^>].\{-}>//ge' +%p +q! file.txt
```
or:
```
cat file.txt | ex -s +'%s/<[^>].\{-}>//ge' +%p +q! /dev/stdin
```
悲哀的现实

2021-01-31 19:12
Give this a try:
```
grep -Po '<.*?>\K.*?(?=<.*?>)' inputfile
```
Explanation:

This uses Perl-compatible regular expressions (-P) and prints only the matched text (-o):
- <.*?> - Non-greedy match of any characters within angle brackets
- \K - Don't include the preceding match in the output (reset match start - similar to positive look-behind, but it works with variable-length matches)
- .*? - Non-greedy match stopping at the next match (this part will be output)
- (?=<.*?>) - Non-greedy match of any characters within angle brackets and don't include the match in the output (positive look-ahead - works with variable-length matches)
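To see the pieces of that pattern at work, here is a small sketch on made-up input. The tag names and the `sample` variable are assumptions for illustration (the question's real tags are not shown), and `-P` requires a grep built with PCRE support, such as GNU grep.

```shell
# Hypothetical sample input: one element per line (tag names are made up).
sample='<name>data</name>
<value>more data</value>'

# -P enables PCRE, -o prints only the matched part.
# \K drops the opening tag from the match; the look-ahead keeps the
# closing tag out of it, so only the text in between is printed.
printf '%s\n' "$sample" | grep -Po '<.*?>\K.*?(?=<.*?>)'
# prints:
# data
# more data
```

With nested elements on one line the pattern can also produce empty matches between adjacent tags, which `-o` does not print, so it is best suited to flat input like this.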
[愿得一人]

2021-01-31 19:15
I know this is not a "perlgolf contest", but I used to use this trick.

Set the record separator (RS) to < or >, then print only the odd-numbered records:
```
awk -vRS='<|>' NR%2 file.xml
```
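The trick works because every < or > ends a record, so records alternate between text outside the tags (odd) and tag contents (even). The sketch below makes that visible on made-up input. The sample data and the `text_only` helper name are assumptions, and the `&& NF` condition is an addition to the original one-liner. A multi-character regex RS is an extension that gawk and mawk support; POSIX leaves it unspecified.

```shell
# Hypothetical sample input (tag names are made up).
sample='<name>data</name>
<value>more data</value>'

# Number each record to see how RS='<|>' splits the input.
printf '%s\n' "$sample" | awk -v RS='<|>' '{ printf "%d: [%s]\n", NR, $0 }'

# Same idea as above, plus "&& NF" (an addition) to skip records that
# are empty or whitespace-only, such as the newline between two elements.
text_only() { awk -v RS='<|>' 'NR % 2 && NF'; }
printf '%s\n' "$sample" | text_only
# prints:
# data
# more data
```

Without the `NF` filter, the odd-numbered records that hold only a newline show up as blank lines in the output.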
臣服心动

2021-01-31 19:17
Using awk:
```
awk '{gsub(/<[^>]*>/,"")};1' file.xml
```
有刺的猬

2021-01-31 19:20
If your file looks just like that, then sed can help you:
```
sed -e 's/<[^>]*>//g' file.xml
```
Of course, you should not rely on regular expressions to parse XML in general, because handling every case correctly is hard.
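A quick sketch of where the substitution holds up and where it falls short. The `strip_tags` helper name and the sample inputs are made up for illustration.

```shell
# Same substitution as above, wrapped in a helper (hypothetical name).
strip_tags() { sed -e 's/<[^>]*>//g'; }

# Works when every tag opens and closes on the same line.
printf '<a>data</a>\n' | strip_tags
# prints: data

# sed works line by line, so a tag split across two lines is not
# matched and its fragments survive.
printf '<a\n  id="1">data</a>\n' | strip_tags
# prints:
# <a
#   id="1">data

# Entities are left undecoded.
printf '<a>R&amp;D</a>\n' | strip_tags
# prints: R&amp;D
```

For input with multi-line tags, comments, or CDATA sections, a real XML parser is the safer choice.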