I'm using ply as my lex parser. My specifications are the following:
t_WHILE = r'while'
t_THEN = r'then'
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_NUM
The reason this didn't work is the way ply prioritises token matches: among rules defined as strings, the longest token regex is tried first, so t_ID matches reserved words like "while" before t_WHILE ever gets a chance.
The easiest way to prevent this problem is to match identifiers and reserved words with the same rule, and select the appropriate token type based on the matched text. The following code is similar to an example in the ply documentation:
import ply.lex

tokens = ['ID', 'NUMBER', 'LESSEQUAL', 'ASSIGN']

reserved = {
    'while': 'WHILE',
    'then': 'THEN',
}

tokens += reserved.values()

t_ignore = ' \t'

t_NUMBER = r'\d+'
t_LESSEQUAL = r'<='
t_ASSIGN = r'='

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    # Reclassify identifiers that are actually reserved words
    if t.value in reserved:
        t.type = reserved[t.value]
    return t

def t_error(t):
    print('Illegal character')
    t.lexer.skip(1)

lexer = ply.lex.lex()
lexer.input("while n <= 0 then h = 1")
while True:
    tok = lexer.token()
    if not tok:
        break
    print(tok)
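The reclassify-after-match idea itself doesn't depend on ply; here is a minimal sketch of the same technique using only the standard library's re module (the tokenize helper and master regex here are illustrative, not part of ply's API):

```python
import re

# Match everything that looks like an identifier with one ID pattern,
# then reclassify reserved words after the match.
reserved = {'while': 'WHILE', 'then': 'THEN'}
token_spec = [
    ('NUMBER', r'\d+'),
    ('LESSEQUAL', r'<='),
    ('ASSIGN', r'='),
    ('ID', r'[a-zA-Z_][a-zA-Z0-9_]*'),
    ('SKIP', r'[ \t]+'),
]
master = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in token_spec))

def tokenize(text):
    for m in master.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == 'SKIP':
            continue
        if kind == 'ID':
            kind = reserved.get(value, 'ID')  # reclassify reserved words
        yield kind, value

print(list(tokenize('while n <= 0 then h = 1')))
# [('WHILE', 'while'), ('ID', 'n'), ('LESSEQUAL', '<='), ('NUMBER', '0'),
#  ('THEN', 'then'), ('ID', 'h'), ('ASSIGN', '='), ('NUMBER', '1')]
```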
PLY prioritizes tokens declared as simple strings by the length of their regular expressions (longest first), but tokens declared as functions are prioritized by their order of definition.
From the docs:
When building the master regular expression, rules are added in the following order:
- All tokens defined by functions are added in the same order as they appear in the lexer file.
- Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).
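You can see why this ordering matters with a plain re sketch of the master regular expression (the group names here are just for illustration). Python's re tries alternatives left to right and takes the first one that matches, so whichever rule is added first wins:

```python
import re

# String rules are sorted by decreasing length, so the longer ID
# pattern ends up before WHILE and swallows the reserved word.
strings_order = re.compile(r'(?P<ID>[a-zA-Z_][a-zA-Z0-9_]*)|(?P<WHILE>while)')
print(strings_order.match('while').lastgroup)  # ID

# Function rules keep their definition order, so WHILE is tried first.
functions_order = re.compile(r'(?P<WHILE>while)|(?P<ID>[a-zA-Z_][a-zA-Z0-9_]*)')
print(functions_order.match('while').lastgroup)  # WHILE
```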
So, an alternative solution would be simply to specify the tokens you want prioritized as functions, instead of strings, like so:
def t_WHILE(t):
    r'while'
    return t

def t_THEN(t):
    r'then'
    return t
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_NUMBER = r'\d+'
t_LESSEQUAL = r'<='
t_ASSIGN = r'='
t_ignore = ' \t'
This way, WHILE and THEN will be the first rules added, and you get the behaviour you expected.
As a side note, you were using r' \t' (a raw string) for t_ignore, so Python treated the \t as two literal characters, a backslash and the letter 't', rather than a tab. It should be a plain string instead, as in the example above.
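To make the difference concrete, compare the characters in each version of the ignore string:

```python
# r' \t' is three characters: a space, a backslash, and the letter 't'
print(list(r' \t'))   # [' ', '\\', 't']

# ' \t' is two characters: a space and a tab
print(list(' \t'))    # [' ', '\t']
```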