Please consider the following Python session:
>>> from BeautifulSoup import BeautifulSoup
>>> s = BeautifulSoup("This is
I think I found a workaround that solves the issue for me. I repeat the whole code as a Python script to give a complete example:
from BeautifulSoup import BeautifulSoup

s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>")
myi = s.find("i")
s2 = BeautifulSoup("wa<b>s</b>")

# Splice the fragment's children in right after <i>, then drop <i> itself.
myi_id = myi.parent.contents.index(myi)
for c in reversed(s2.contents):
    myi.parent.insert(myi_id + 1, c)
myi.extract()
Please note that this won't work without reversed(). If you skip it, you don't just change the order of the elements. If you really want the order to be changed, you have to write the following:
for c in list(s2.contents):
    myi.parent.insert(myi_id + 1, c)
Can somebody please explain why skipping list() will omit <b>s</b>? (Please answer in a comment, because this is not the main question here.)
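The omitted <b>s</b> isn't specific to BeautifulSoup: insert() moves each tag into the new tree, which removes it from s2.contents, so the loop is mutating the very list it is iterating over. A minimal pure-Python sketch of the same pitfall (the list names here are just stand-ins for s2.contents):

```python
# Mutating a list while iterating over it skips elements: the iterator's
# index keeps advancing while the list shrinks underneath it.
source = ["wa", "b"]   # stands in for s2.contents
moved = []

for item in source:      # iterate the live list
    source.remove(item)  # insert() likewise removes the tag from its old parent
    moved.append(item)

print(source)  # ['b'] -- "b" was never visited
print(moved)   # ['wa']
```

Iterating over a snapshot with list(source), or walking backwards with reversed(source), sidesteps the problem, which is why both variants in the answer above work.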
Simpler answer: after your call to replaceWith, regenerate and clean s by calling s = BeautifulSoup(s.renderContents()). Then you can call find again.
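A sketch of that re-parse pattern, written against bs4 (Beautiful Soup 4, the maintained successor; BeautifulSoup 3's API differs slightly). Here str(s) plays the role of renderContents(), and the method is replace_with rather than replaceWith:

```python
from bs4 import BeautifulSoup  # bs4 is the maintained successor of BeautifulSoup 3

s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>", "html.parser")
s.find("i").replace_with(BeautifulSoup("wa<b>s</b>", "html.parser"))

# Re-parse the rendered document so no nested parser object survives.
s = BeautifulSoup(str(s), "html.parser")

print(s.find("b"))  # <b>s</b> -- find() works on the fresh tree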
The problem seems to be that a BeautifulSoup object is considered an entire document. find iterates through the document, asking each element for the next element after it. But when it gets to your BeautifulSoup("wa<b>s</b>"), that object thinks it is the whole document, so it reports that nothing comes after it. This aborts the search too early.
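A toy model of that failure mode (plain Python, not BeautifulSoup's real classes): each element carries a next link, a document's chain ends at its own last element, so a nested "document" spliced into a larger tree cuts the outer chain short.

```python
class Node:
    """Toy stand-in for a parse-tree element with BS3-style .next links."""
    def __init__(self, name):
        self.name = name
        self.next = None

def find(start, name):
    # Walk the .next chain like find() does: stop when next is None.
    node = start
    while node is not None:
        if node.name == name:
            return node
        node = node.next
    return None

# Outer chain: doc -> p -> nested-doc, but the nested "document" node
# believes nothing follows it, so its .next stays None.
doc, p, nested, b = Node("doc"), Node("p"), Node("nested-doc"), Node("b")
doc.next, p.next = p, nested
nested.next = None  # a document thinks it is the end of the world
# b sits after nested in the real tree, but the chain never reaches it

print(find(doc, "b"))  # None -- the search aborts at the nested document
```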
I don't think BeautifulSoup is designed to have BeautifulSoup objects nested inside other BeautifulSoup objects. The workaround is: don't do that. Why do you feel you need to use the first form instead of the second one, which already works? If you want to replace an element with some bit of HTML, use a Tag for your replacement, not a BeautifulSoup object.
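A sketch of that advice using bs4 (the maintained successor; the BS3 calls differ slightly): parse the fragment once, then splice its actual nodes, the Tags and NavigableStrings in fragment.contents, into the tree instead of the parser object. The snapshot via list() matters for the reason discussed above: moving a node mutates fragment.contents.

```python
from bs4 import BeautifulSoup

s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>", "html.parser")
fragment = BeautifulSoup("wa<b>s</b>", "html.parser")

target = s.find("i")
anchor = target
for node in list(fragment.contents):  # snapshot: moving nodes mutates contents
    anchor.insert_after(node)         # keep the fragment's original order
    anchor = node
target.extract()                      # remove the replaced <i>

print(s.find("b"))  # <b>s</b> -- no nested document, so find() just works
```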