Question
This is the command I'm running on an ordinary web page that I fetched
from a web site with wget:
tr '<' '\n<' < index.html
However, it gives me newlines but does not add the left angle bracket back in. For example,
echo "<hello><world>" | tr '<' '\n<'
returns
(blank line which is fine)
hello>
world>
instead of
(blank line or not)
<hello>
<world>
What's wrong?
Answer 1:
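That's because tr only does character-for-character substitution (or deletion). In tr '<' '\n<' the first set holds a single character, so it is mapped to the first character of the second set (the newline) and the extra '<' is ignored.
Try sed instead.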
echo '<hello><world>' | sed -e 's/</\n&/g'
Or awk.
echo '<hello><world>' | awk '{gsub(/</,"\n<",$0)}1'
Or perl.
echo '<hello><world>' | perl -pe 's/</\n</g'
Or ruby.
echo '<hello><world>' | ruby -pe '$_.gsub!(/</,"\n<")'
Or python (Python 3 syntax; end="" avoids printing a second newline after each input line).
echo '<hello><world>' \
| python3 -c 'for l in __import__("fileinput").input(): print(l.replace("<", "\n<"), end="")'
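To make the difference concrete, here is a minimal sketch comparing tr's position-by-position mapping with a string replacement. The helper names (map_abc, split_with_tr, split_with_sed) are hypothetical and exist only for this illustration, and the sed line assumes GNU sed, since '\n' in the replacement text is not portable to every sed.

```shell
# Hypothetical helper names, used only for this illustration.

# tr pairs the two sets position by position: a->x, b->y, c->z.
map_abc() { tr 'abc' 'xyz'; }

# The first set holds one character, so only the first character of the
# second set (the newline) is used; the trailing '<' has no partner.
split_with_tr() { tr '<' '\n<'; }

# sed replaces each match with a whole string, so the bracket survives.
# Assumption: GNU sed, which understands '\n' in the replacement.
split_with_sed() { sed -e 's/</\n&/g'; }

printf 'abc\n' | map_abc
printf '<hello><world>\n' | split_with_tr
printf '<hello><world>\n' | split_with_sed
```

The second command reproduces the output from the question (brackets lost), while the third keeps each '<' at the start of its line.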
Answer 2:
If you have GNU grep, this may work for you:
grep -Po '<.*?>[^<]*' index.html
which should pass through all of the HTML, but each tag should start at the beginning of the line with possible non-tag text following on the same line.
If you want nothing but tags:
grep -Po '<.*?>' index.html
You should know, however, that it's not a good idea to parse HTML with regexes.
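As a rough illustration of what the two grep commands emit, the sketch below runs them on a one-line sample. The sample string and the helper names (tags_with_text, tags_only) are made up for this example, and it assumes a GNU grep built with PCRE support for -P.

```shell
# Hypothetical sample input, used only for this illustration.
sample='<p>Hello <b>world</b></p>'

# -P enables Perl-compatible regexes (needed for the lazy .*?);
# -o prints each match on its own line.
tags_with_text() { grep -Po '<.*?>[^<]*'; }
tags_only() { grep -Po '<.*?>'; }

printf '%s\n' "$sample" | tags_with_text
printf '%s\n' "$sample" | tags_only
```

With the first pattern, any text after a tag stays on that tag's line (including trailing spaces); with the second, only the tags themselves are printed.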
Answer 3:
Does this work for you?
awk -F"><" -v OFS=">\n<" '{print $1,$2}'
[jaypal:~/Temp] echo "<hello><world>" | awk -F"><" -v OFS=">\n<" '{$1=$1}1';
<hello>
<world>
You can put a /regex/ pattern (matching the lines you want this to happen for) in front of the awk {} action.
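A small sketch of that idea, assuming you only want to split lines containing "hello"; the pattern, the sample input, and the function name split_matching are hypothetical choices for illustration.

```shell
# Hypothetical helper: split adjacent tags only on lines matching /hello/.
# -F'><' splits fields at each "><" boundary; OFS puts the brackets back
# with a newline between them. Assigning $1=$1 forces awk to rebuild the
# line using OFS, and the trailing 1 prints every line, changed or not.
split_matching() {
  awk -F'><' -v OFS='>\n<' '/hello/ {$1=$1} 1'
}

printf '<hello><world>\n<foo><bar>\n' | split_matching
```

Only the first input line is split; the second passes through unchanged because it does not match the pattern.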
Answer 4:
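The order in which you place the newline matters: to keep the bracket at the start of each line, the replacement has to be \n followed by <, not the other way round. Note, however, that tr cannot express either order. It maps each character in the first set to the character at the same position in the second set, so both commands below leave "<" unchanged and the extra \n is ignored (escaping the "/" or the "<" makes no difference):
tr '\/<' '\/<\n' < index.html
tr '<' '<\n' < index.html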
Source: https://stackoverflow.com/questions/8349771/unix-tr-find-and-replace