Extract paragraphs containing specific words between two similar titles


My text file contains paragraphs something like this:

summary

A result oriented and dedicated professional with three years’ experience in Software Development


                      
2 answers

我在风中等你
2021-01-23 10:25

To extract all summary sections that contain the words you are interested in:

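split_on = 'summary\n\n'                  # title that starts each section
must_contain = ['Project', 'Team Size']   # words every kept section must include

with open('9.txt') as f_input, open('d.txt', 'w') as f_output:
    # Split the whole file into sections on the title
    for part in f_input.read().split(split_on):
        # Keep only the sections that contain all of the required words
        if all(text in part for text in must_contain):
            f_output.write(split_on + part)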
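
To try the same idea without touching any files, here is a minimal sketch that wraps the split-and-filter logic in a function and runs it on an in-memory string. The function name extract_sections and the sample text are made up for illustration. Unlike the snippet above, this sketch skips whatever comes before the first title, on the assumption that it is not a summary section.

```python
def extract_sections(text, split_on="summary\n\n", must_contain=("Project", "Team Size")):
    """Return the sections of `text` that contain every required word."""
    # The first element is whatever precedes the first title (assumed not to
    # be a section), so it is dropped here.
    parts = text.split(split_on)[1:]
    return [split_on + part for part in parts
            if all(word in part for word in must_contain)]


# Hypothetical sample data: two sections, only the second has both words.
sample = (
    "summary\n\n"
    "A result oriented professional.\n\n"
    "summary\n\n"
    "Project: Billing\nTeam Size: 4\n\n"
)

matches = extract_sections(sample)
print("".join(matches))
```

Only the second section is returned, because it is the only one that contains both 'Project' and 'Team Size'.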

                                                                        
                                                        
            
            
              
                
说谎
2021-01-23 10:45

The second conditional statement here will never run, because its condition is identical to the first one. As a result, copy will always be True after the first occurrence of summary.

if line.strip() == 'summary':
    re.compile('\r\nproject*\r\n')
    copy = True
elif line.strip() == "summary":
    copy =False 


What I'd recommend is a single statement that picks up the "summary" tags (I assume these are meant to mark the start and end of the blocks) and toggles copy.

To toggle a boolean, you can simply set it to the inverse of itself:

 a = True
 a = not a
 # a is now False


For example:

if line.strip() == 'summary':
    copy = not copy
elif copy:
    outfile.write(line)
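
The fragment above only makes sense inside a loop over the input lines, so here is a rough, self-contained sketch of how it could fit together. The function name copy_toggled and the sample text are hypothetical, and io.StringIO stands in for real files. It assumes, as above, that every other "summary" line closes the block opened by the previous one.

```python
import io


def copy_toggled(infile, outfile, marker="summary"):
    """Copy the lines that lie between alternating `marker` lines."""
    copy = False  # nothing is copied until the first marker is seen
    for line in infile:
        if line.strip() == marker:
            copy = not copy  # flip the state on every marker line
        elif copy:
            outfile.write(line)


# Hypothetical input: one block opened and closed by "summary" lines.
source = io.StringIO("intro\nsummary\nkept line\nsummary\nskipped line\n")
result = io.StringIO()
copy_toggled(source, result)
print(result.getvalue())
```

With real files, the two StringIO objects would be replaced by the handles returned from open(). Note that the marker lines themselves are not written to the output.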

                                                                        
                                                        
            
            
              
                