Regular expression for finding class names in HTML

我在风中等你 2021-01-14 09:40

I'd like to use grep to find out if/where an HTML class is used across a bunch of files. The regex pattern should find not only

<

4 Answers
  • 2021-01-14 10:14

    Don't do it. It will drive you insane: RegEx match open tags except XHTML self-contained tags

    Instead, use an HTML parser. It's not hard.

    EDIT: Here's an example in PowerShell

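    # Lists the files in which some element has "foo" as one of its classes.
    # The [xml] cast requires well-formed markup (XHTML); it throws on tag-soup HTML.
    Get-ChildItem -Recurse -Filter *.html | where { 
        ([xml](Get-Content $_.FullName)).SelectNodes( '//*' ) | where { ($_.GetAttribute( "class" ) -split '\s+') -ccontains "foo" } 
    }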
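
For comparison, here is a minimal sketch of the same parser-based approach in Python, using only the standard library's html.parser, which tolerates markup that is not well-formed XML. The names ClassFinder, find_class and scan_tree are made up for this example, and "foo" stands in for whatever class you are looking for.

```python
from html.parser import HTMLParser
from pathlib import Path


class ClassFinder(HTMLParser):
    """Record (line, tag) for every element whose class list contains the target."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for a bare attribute such as <div class>
            if name == "class" and value and self.target in value.split():
                self.hits.append((self.getpos()[0], tag))


def find_class(html_text, target):
    """Return a list of (line number, tag name) pairs for one HTML string."""
    finder = ClassFinder(target)
    finder.feed(html_text)
    finder.close()
    return finder.hits


def scan_tree(root, target):
    """Yield (path, line, tag) for every hit in *.html files under root."""
    for path in Path(root).rglob("*.html"):
        text = path.read_text(errors="replace")
        for line, tag in find_class(text, target):
            yield path, line, tag
```

Iterating over scan_tree(".", "foo") and printing each tuple gives grep-like output. Because the class attribute is split on whitespace, an element with class="foobar" is not reported when searching for "foo".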
    
  • 2021-01-14 10:17

    How about something like this:

    grep -Erno 'class[[:space:]]*=[[:space:]]*"[^"]+"' *
    

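    This also allows optional whitespace around the = sign. With -n and -o, grep prints the line number and only the matched text, so the output should look similar to this (with -r, each line is also prefixed with the file name):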

    1:class="foo bar baz"
    3:class = "haha"
    

    To see all classes used, you can pipe the output from the above into the following:

    cut -f2 -d'"' | xargs -n1 | sort -u
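
The same "list every class in use" idea can be sketched in Python with the re module. This is only an illustration under the same assumptions as the grep pattern: it sees double-quoted class attributes only, and it will also pick up matches inside comments or scripts. The names CLASS_ATTR and classes_used are invented for the example.

```python
import re

# Same idea as the grep pattern: "class", optional whitespace around "=",
# then a double-quoted value, which is captured in group 1.
CLASS_ATTR = re.compile(r'class\s*=\s*"([^"]+)"')


def classes_used(text):
    """Return the sorted, de-duplicated class names found in the text."""
    names = set()
    for match in CLASS_ATTR.finditer(text):
        # A class attribute is a whitespace-separated list of names.
        names.update(match.group(1).split())
    return sorted(names)
```

Splitting each attribute value before de-duplicating matters: without it, "foo bar" and "bar foo" would be counted as two different entries.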
    
  • 2021-01-14 10:21

    Regular expressions are a pretty poor tool for parsing HTML. Try looking into SimpleXML ( http://php.net/manual/en/book.simplexml.php ). A roll-your-own regex on HTML is begging for trouble.

  • It depends on which metacharacters your grep supports. With extended regular expressions (grep -E), try:

    'class="([a-z]+ ?)+"'
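
Note that a pattern like the one above accepts only lowercase letters, so class names containing hyphens, digits or underscores will not match, and it matches any class attribute rather than one particular class. Below is a hedged sketch, in Python's re syntax, of a pattern that looks for a single class name as a whole token. The names ORIGINAL and class_pattern are hypothetical, and the sketch assumes double-quoted attribute values.

```python
import re

# The pattern from the answer, for comparison.
ORIGINAL = re.compile(r'class="([a-z]+ ?)+"')


def class_pattern(name):
    """Build a regex for a double-quoted class attribute that contains
    `name` as a whole whitespace-separated token."""
    token = re.escape(name)
    return re.compile(
        r'class\s*=\s*"'       # attribute name, optional spaces around "="
        r'(?:[^"]*\s)?'        # optional earlier classes, ending in whitespace
        + token +
        r'(?:\s[^"]*)?"'       # optional later classes, starting with whitespace
    )
```

With this sketch, class_pattern("foo") matches class="bar foo baz" but not class="foobar" or class="foo-bar", because the name has to be delimited by whitespace or by the surrounding quotes.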
