Regular expression for finding class names in HTML


I'd like to use grep to find out if/where an HTML class is used across a bunch of files. The regex pattern should find not only

<


                      


      
      
        
4 answers

        
                         				            
            
           
            
                              
                
              
              
                
甜味超标, 2021-01-14 10:14
              
            
            
                                                                       
Don't do it. It will drive you insane; see "RegEx match open tags except XHTML self-contained tags".

Instead, use an HTML parser. It's not hard.

EDIT: Here's an example in PowerShell:

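# Note: the [xml] cast only works on well-formed (XHTML) files.
Get-ChildItem -Recurse -Filter *.html | Where-Object {
    ([xml](Get-Content $_.FullName)).SelectNodes('//*[@class]') |
        Where-Object { ($_.GetAttribute('class') -split '\s+') -ccontains 'foo' }  # whole class name, not substring
}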
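For files that are not well-formed XML, the same idea can be sketched with a more lenient parser. The following is a minimal sketch using Python's standard-library html.parser module; ClassFinder and find_class are hypothetical names introduced here for illustration, and the sample markup is made up.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect (line, tag) pairs for elements whose class list contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The class attribute is a whitespace-separated list of names,
            # so compare whole tokens rather than substrings.
            if name == "class" and value and self.wanted in value.split():
                line, _column = self.getpos()
                self.hits.append((line, tag))


def find_class(html_text, wanted):
    """Hypothetical helper: return every (line, tag) that uses class `wanted`."""
    finder = ClassFinder(wanted)
    finder.feed(html_text)
    finder.close()
    return finder.hits


sample = '<div class="foo bar">\n<p class="foobar">x</p>\n<span class="baz foo"></span>\n</div>'
print(find_class(sample, "foo"))
```

On the sample above this reports the div on line 1 and the span on line 3, and skips class="foobar" because it compares whole names.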

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
醉梦人生, 2021-01-14 10:17
              
            
            
                                                                       
How about something like this:

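grep -Erno 'class[[:space:]]*=[[:space:]]*"[^"]+"' *


That also allows whitespace around the equals sign ([[:space:]] is used because \t does not mean a tab inside a POSIX bracket expression). When more than one file is searched, each line is also prefixed with the file name; apart from that, the output should look similar to: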

1:class="foo bar baz"
3:class = "haha"


To see all classes used, you can pipe output from the above into the following (xargs -n1 puts each class name on its own line so that sort -u can remove the duplicates):

cut -f2 -d'"' | xargs -n1 | sort -u
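Where grep is not available, the same pipeline can be sketched in Python. This is only an illustration of the regex heuristic above, so it shares its limits (for example, it ignores single-quoted attributes). Unlike grep it is not line-based, so whitespace around the equals sign may span a line break. The name unique_classes and the sample text are assumptions made for this example.

```python
import re

# Same heuristic as the grep pattern: class, optional whitespace, =,
# optional whitespace, then a double-quoted value.
CLASS_ATTR = re.compile(r'class\s*=\s*"([^"]+)"')


def unique_classes(text):
    """Return the sorted, de-duplicated class names found in `text`."""
    names = set()
    for match in CLASS_ATTR.finditer(text):
        # Split the attribute value into individual class names.
        names.update(match.group(1).split())
    return sorted(names)


sample = '<div class="foo bar baz">\n<p>\n<a class = "haha foo">'
print(unique_classes(sample))
```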

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
春和景丽, 2021-01-14 10:21
              
            
            
                                                                       
Regular expressions are a pretty poor tool for parsing HTML. Try looking into SimpleXML (http://php.net/manual/en/book.simplexml.php). Rolling your own regex for HTML is begging for trouble.
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
不要未来只要你来, 2021-01-14 10:25
              
            
            
                                                                       
It depends on which metacharacters your grep supports. With extended regular expressions (grep -E), try:

'class=\"([a-z]+ ?)+\"'
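Because the pattern above accepts only lowercase letters, class names containing digits or hyphens would be missed. To check whether one specific class is used, a stricter pattern can treat the attribute value as a whitespace-separated list. The following is a sketch in Python's re syntax, not a grep command; class_pattern is a hypothetical helper, and it still only handles double-quoted attributes.

```python
import re


def class_pattern(name):
    """Build a regex for a double-quoted class attribute listing `name` as a whole token."""
    token = re.escape(name)
    # Optional "other names + whitespace" before the token, and
    # optional "whitespace + other names" after it.
    return re.compile(r'class\s*=\s*"(?:[^"]*\s)?' + token + r'(?:\s[^"]*)?"')


pattern = class_pattern("foo")
lines = [
    '<div class="foo">',
    '<div class="bar foo baz">',
    '<div class="foobar">',
    '<div class="my-foo">',
]
for line in lines:
    print(line, bool(pattern.search(line)))
```

The first two lines match; the last two do not, because foo appears there only as part of a longer name.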
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         