Processing repeatedly structured text file with python

后端未结

关注

 3  424

I have a big text file structured in blocks like:

Student = {
        PInfo = {
                ID   = 0001;
            Name.First = \"Joe\";
            Na


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  梦如初夏        
                
              
                            
                2020-12-10 10:02
              
            
            
                                                                       
To parse the file you could define a grammar that describes your input format and use it to generate a parser.

There are many language parsers in Python. For example, you could use Grako that takes grammars in a variation of EBNF as input, and outputs memoizing PEG parsers in Python.

To install Grako, run pip install grako.

Here's grammar for your format using Grako's flavor of EBNF syntax:

(* a file is zero or more records *)
file = { record }* $;
record = name '=' value ';' ;
name = /[A-Z][a-zA-Z0-9.]*/ ;
value = object | integer | string ;
(* an object contains one or more records *)
object = '{' { record }+ '}' ;
integer = /[0-9]+/ ;
string = '"' /[^"]*/ '"';


To generate parser, save the grammar to a file e.g., Structured.ebnf and run:

$ grako -o structured_parser.py Structured.ebnf


It creates structured_parser module that can be used to extract the student information from the input:

#!/usr/bin/env python
from structured_parser import StructuredParser

class Semantics(object):
    def record(self, ast):
        # record = name '=' value ';' ;
        # value = object | integer | string ;
        return ast[0], ast[2] # name, value
    def object(self, ast):
        # object = '{' { record }+ '}' ;
        return dict(ast[1])
    def integer(self, ast):
        # integer = /[0-9]+/ ;
        return int(ast)
    def string(self, ast):
        # string = '"' /[^"]*/ '"';
        return ast[1]

with open('input.txt') as file:
    text = file.read()
parser = StructuredParser()
ast = parser.parse(text, rule_name='file', semantics=Semantics())
students = [value for name, value in ast if name == 'Student']
d = {'{0[Name.First]} {0[Name.Last]}'.format(s['PInfo']):
     dict(School=s['School'], Zip=s['Address']['Zip'])
     for s in students}
from pprint import pprint
pprint(d)


Output

{'Joe Burger': {'School': u'West High', 'Zip': 12345},
 'John Smith': {'School': u'East High', 'Zip': 12346}}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  名媛妹妹        
                
              
                            
                2020-12-10 10:08
              
            
            
                                                                       
For such thing, I use Marpa::R2, a Perl interface to Marpa, a general BNF parser. It allows decribing the text as a grammar rules and parse them to a tree of arrays (parse tree). You can then traverse the tree to save the results as a hash of hashes (hash is perl for python's dictionary) or use it as is.

I cooked a working example using your input: parser, 
result tree.

Hope this helps.

P.S. Example of ast_traverse(): Parse values from a block of text based on specific keys
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一整个雨季        
                
              
                            
                2020-12-10 10:25
              
            
            
                                                                       
it's not json, but similar structured. you should be able to reformat it into json.


"=" -> ":"
quote all keys with '"'
";" -> ","
remove all "," which are followed by a "}"
put it in curly braces
parse it with json.loads

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复