Question
I am currently validating a client's HTML source and I am getting a lot of validation errors for img and input tags which do not have the Omittag. I would do it manually, but this client literally has thousands of files, with a lot of instances where the <img> is not <img />.
This client has validated some img tags (for whatever reason).
Just wondering if there is a Unix command I could run to check whether an <img> lacks the Omittag and, if so, add it.
I have done simple search and replaces with the following command:
find . \! -path '*.svn*' -type f -exec sed -i -n '1h;1!H;${;g;s/<b>/<strong>/g;p}' {} \;
But never something this large. Any help would be appreciated.
Answer 1:
See questions I asked in comment at top.
Assuming you're using GNU sed, and that you're trying to add the trailing / to your tags to make XML-compliant <img /> and <input />, then replace the sed expression in your command with this one, and it should do the trick:
'1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}'
Here it is on a simple test file:
$ cat test.html
This is an <img tag> without closing slash.
Here is an <img tag /> with closing slash.
This is an <input tag > without closing slash.
And here one <input attrib="1"
> that spans multiple lines.
Finally one <input
attrib="1" /> with closing slash.
$ sed -n '1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}' test.html
This is an <img tag/> without closing slash.
Here is an <img tag /> with closing slash.
This is an <input tag /> without closing slash.
And here one <input attrib="1"
/> that spans multiple lines.
Finally one <input
attrib="1" /> with closing slash.
The expression uses GNU sed regex syntax, and the 1h;1!H;${;g;...;p;} wrapper buffers the whole file in the hold space so that the search/replace can span multiple lines.
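As a rough sketch of how this plugs into the find command from the question, the following runs the expression in place on a throwaway file. It assumes GNU sed (for -i and the \| alternation); the scratch directory and the file name page.html are made up for the demo.

```shell
#!/bin/sh
# Scratch directory and file are hypothetical, created only for this demo.
demo_dir=$(mktemp -d)
printf '%s\n' \
    'An <img src="a.png"> tag,' \
    'an <input type="text"' \
    '> tag,' \
    'and <img src="b.png" /> already closed.' > "$demo_dir/page.html"

# How the sed script buffers the file:
#   1h     on line 1, copy the pattern space into the hold space
#   1!H    on every later line, append the line to the hold space
#   $      on the last line only:
#     g      copy the buffered file back into the pattern space
#     s///g  substitute across line boundaries
#     p      print the result (required because of -n)
find "$demo_dir" \! -path '*.svn*' -type f -exec sed -i -n \
    '1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}' {} \;

cat "$demo_dir/page.html"
```

Note that the expression expects a space after img or input, so a bare <img> with no attributes is left alone, and it does not anchor on the opening <, so it is worth trying on a copy of the tree first.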
Alternately you could use something like Tidy that's designed for sanitizing bad HTML -- that's what I'd do if I were doing anything more complicated than a couple of simple search/replaces. Tidy's options get complicated fast, so it's usually better to write a script in your scripting language of choice (Python, Perl) that calls libtidy and sets whatever options you need.
Answer 2:
Try this. It'll go through your files, make a .orig backup of each file (perl's -i switch), and replace <img> and <input> tags with <img /> and <input />.
find . \! -path '*.svn*' -type f -exec perl -pi.orig -e 's{ ( <(?:img|input)\b ([^>]*?) ) \ ?/?> }{$1\ />}sgxi' {} \;
Given input:
<img> <img/> <img src=".."> <img src="" >
<input> <input/> <input id=".."> <input id="" >
It changes the file to:
<img /> <img /> <img src=".." /> <img src="" />
<input /> <input /> <input id=".." /> <input id="" />
Here's what the regexp is doing:
s{(<(?:img|input)\b ([^>]*?)) # capture "<img" or "<input" followed by non-">" chars
\ ?/?>} # optional space, optional slash, followed by ">"
{$1\ />}sgxi # replace with: captured text, plus " />"
Source: https://stackoverflow.com/questions/242066/how-can-i-validate-large-numbers-of-files-with-search-and-replace