Python + BeautifulSoup: How to get wrapper out of HTML based on text?

刺人心 2021-01-27 10:01

I would like to get the wrapper (the enclosing tag) of a given key text. For example, in this HTML:

chicken
apple
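Here is a minimal sketch of the text → wrapper direction. It assumes "wrapper" means the tag that directly encloses the matching text. The question's original markup did not survive, so the `<span>` snippet below is hypothetical, and so is the helper name `wrapper_of`. `soup.find(string=...)` returns a `NavigableString`, and that string's `.parent` is the tag wrapping it.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: only the words "chicken" and "apple" survive
# from the question, so this structure is an illustrative assumption.
html = """
<div class="basket">
  <span class="item">chicken</span>
  <span class="item">apple</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def wrapper_of(soup, text):
    """Hypothetical helper: return the tag directly containing the first
    text node that contains `text` (literal substring), or None."""
    node = soup.find(string=re.compile(re.escape(text)))
    return node.parent if node is not None else None


print(wrapper_of(soup, "chicken"))  # <span class="item">chicken</span>
```

`re.escape` makes the search a literal substring match. Without it, characters such as `$` or `!` in the key text would be read as regex syntax.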
1 Answer
    南方客 (OP)
    2021-01-27 10:25

    # coding: utf-8
    import re

    from bs4 import BeautifulSoup as BS

    # NOTE: the original markup was lost when this page was scraped; this HTML
    # is a reconstruction matching the selectors and strings used below.
    html_doc = """
    <html>
      <body>
        <div class="chicken">
          <div id="chicken_surname">
            Last chicken leg on stock! Only 500$ !!!
          </div>
        </div>
        <p>My chicken has <b>ONE</b> leg :P</p>
        <div>eat me</div>
      </body>
    </html>
    """

    soup = BS(html_doc, "lxml")  # "html.parser" also works if lxml is not installed

    # The (tag -> text) direction is straightforward:
    tag = soup.find('div', class_="chicken")
    tag2 = soup.find('div', {'id': "chicken_surname"})
    print('\n###### by_cls:')
    print(tag)
    print('\n###### by_id:')
    print(tag2)

    # ...but finding a tag by a substring of its text can be tricky:
    tag_by_str = soup.find(string="eat me")             # exact match -> 'eat me'
    tag_by_sub = soup.find(string="eat")                # no exact match -> None
    tag_by_resub = soup.find(string=re.compile("eat"))  # regex search -> 'eat me'
    print('\n###### tag_by_str:')
    print(tag_by_str)
    print('\n###### tag_by_sub:')
    print(tag_by_sub)
    print('\n###### tag_by_resub:')
    print(tag_by_resub)

    # There is more than one way to access the underlying strings,
    # and they behave differently (compare the results):
    tag = soup.find('p')
    print('\n###### .text attr:')
    print(tag.text, type(tag.text))  # all descendant text joined into one str
    print('\n###### .strings generator:')
    for s in tag.strings:  # .strings is a generator
        print(s, type(s))

    # .strings yields bs4.element.NavigableString objects, so we can use them
    # to navigate, for example by accessing their parents:
    print('\n###### NavigableString parents:')
    for s in tag.strings:
        print(s.parent)

    # ...or even their grandparents :)
    print('\n###### grandparents:')
    for s in tag.strings:
        print(s.parent.parent)
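The answer shows substring search and parent access separately. Combining them answers the question directly. The sketch below assumes a small illustrative HTML snippet modelled on the answer's example, and the filter name `has_full_text` is hypothetical. `find_all(string=regex)` collects every matching text node. `.parent` gives the immediate wrapper, and `find_parent("div")` climbs to the nearest enclosing `<div>`. Text split across child tags is not a single string, so the last step matches tags on their combined `get_text()` instead.

```python
import re

from bs4 import BeautifulSoup

# Illustrative markup (an assumption, modelled on the answer's example).
html = """
<div class="chicken">
  <div id="chicken_surname">
    <p>My chicken has <b>ONE</b> leg :P</p>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. Every text node containing "chicken" as a substring.
matches = soup.find_all(string=re.compile("chicken"))

# 2. The tag directly wrapping each matching text node.
wrappers = [s.parent for s in matches]

# 3. The nearest enclosing <div>, skipping inline wrappers such as <p>.
outer_divs = [s.find_parent("div") for s in matches]


# 4. "has ONE leg" spans a <b> child, so no single string contains it;
#    a function filter can test each tag's combined text instead.
def has_full_text(tag):
    return tag.name == "p" and "has ONE leg" in tag.get_text()


p_tag = soup.find(has_full_text)

print([t.name for t in wrappers])  # ['p']
print(outer_divs[0]["id"])         # chicken_surname
print(p_tag.get_text())            # My chicken has ONE leg :P
```

Note that the `class="chicken"` attribute is not a text node, so step 1 only matches the visible text "My chicken has ".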
