I'd like to use grep to find out if/where an HTML class is used across a bunch of files. The regex pattern should find not only
Don't do it. It will drive you insane: see the question "RegEx match open tags except XHTML self-contained tags".
Instead, use an HTML parser. It's not hard.
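EDIT: Here's an example in PowerShell. It parses each file as XML, so it only works on well-formed XHTML:
Get-ChildItem -Recurse -Filter *.html | Where-Object {
    # Parse the whole file; the [xml] cast throws on markup that is not well-formed
    $doc = [xml](Get-Content -Raw $_.FullName)
    # Keep the file if any element has "foo" as one of its space-separated class tokens
    $doc.SelectNodes('//*[@class]') | Where-Object { ($_.GetAttribute('class') -split '\s+') -contains 'foo' }
}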
How about something like this:
grep -Erno 'class[[:space:]]*=[[:space:]]*"[^"]+"' *
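That also allows whitespace around the = (inside a bracket expression, \t is not a tab, so [[:space:]] is used instead) and should give you output similar to the following, with each match prefixed by its file name: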
1:class="foo bar baz"
3:class = "haha"
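If you only care where one particular class is used, the same pattern can be narrowed to a single class token, so that searching for foo does not also report foobar. A minimal sketch, assuming GNU or BSD grep (for -r, -H, -o and -E) and double-quoted attributes; find_class is a hypothetical helper name, and the class name is assumed to contain no regex metacharacters:

```shell
# find_class CLASS [DIR]   (hypothetical helper)
# Print file:line:attribute for every double-quoted class attribute under
# DIR (default: current directory) whose value contains CLASS as a whole,
# whitespace-separated token. Assumes CLASS has no regex metacharacters.
find_class() {
  # ([^"]*[[:space:]])?  optional class names before the target
  # $1                   the target class itself
  # ([[:space:]][^"]*)?  optional class names after the target
  grep -rHnoE "class[[:space:]]*=[[:space:]]*\"([^\"]*[[:space:]])?$1([[:space:]][^\"]*)?\"" "${2:-.}"
}

# Usage: find_class nav-bar src/
```

With this, find_class nav-bar reports class="nav-bar" but not class="nav-bar-item", because the target must be followed by whitespace or the closing quote. Single-quoted or unquoted attributes are still missed, which is one more reason a real parser is the safer choice.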
To see every class in use, pipe the output of the above into the following. The tr step puts each class name on its own line, so that sort -u can de-duplicate them:
cut -f2 -d'"' | tr -s '[:space:]' '\n' | sort -u
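Wrapped up as a single function, the listing looks roughly like this. It is a sketch under the same assumptions as before (double-quoted attributes, GNU or BSD grep); list_classes is a hypothetical name, and -h is used so grep leaves the file names out:

```shell
# list_classes [DIR]   (hypothetical helper)
# Print every distinct class name used in double-quoted class attributes
# under DIR (default: current directory), one per line, sorted.
list_classes() {
  grep -rhoE 'class[[:space:]]*=[[:space:]]*"[^"]+"' "${1:-.}" |
    cut -d'"' -f2 |          # keep only the attribute value
    tr -s '[:space:]' '\n' | # split multi-class values, one name per line
    grep -v '^$' |           # drop blank lines left by leading/trailing spaces
    sort -u                  # sort and de-duplicate
}

# Usage: list_classes templates/
```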
Regular expressions are a pretty poor tool for parsing HTML. Try looking into SimpleXML ( http://php.net/manual/en/book.simplexml.php ). Rolling your own regex for HTML is asking for trouble.
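It depends on which metacharacters your grep supports. With extended regular expressions (grep -E), which are needed for the group and the + and ? quantifiers, try:
grep -E 'class="([a-z]+ ?)+"'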
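A quick way to check what that pattern accepts is to feed it a couple of sample lines. Inside single quotes the double quotes need no backslashes; simple_class_attrs is a hypothetical name used only for this demonstration:

```shell
# simple_class_attrs   (hypothetical helper)
# Read lines on stdin and print only the class attributes accepted by
# the pattern 'class="([a-z]+ ?)+"' (lowercase letters and single spaces).
simple_class_attrs() {
  grep -oE 'class="([a-z]+ ?)+"'
}

# Only the first sample line matches.
printf '%s\n' '<div class="foo bar">' '<div class="nav-bar">' | simple_class_attrs
```

This prints class="foo bar" and skips class="nav-bar", because the character class allows lowercase letters only. Widen it (for example to [a-z0-9_-]) if your class names contain digits, underscores or hyphens.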