How to find spans with a specific class containing specific text using Beautiful Soup and re?

無奈伤痛 2021-02-01 07:16

How can I find all spans with a class of 'blue' that contain text in the format:

04/18/13 7:29pm

which could therefore be:

3 answers
  •  夕颜 (original poster)
    2021-02-01 07:30

    This is a flexible regex that you can use:

    "(\d\d?/\d\d?/\d\d\d?\d?\s*\d\d?:\d\d[aApP][mM])"
    
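The pattern can be read piece by piece. The sketch below spells it out with `re.VERBOSE` so each part carries a comment. The name `TIMESTAMP` and the sample strings are only illustrative, and reading the first two fields as month and day is an assumption. The am/pm suffix is written as `[aApP][mM]` because `|` inside a character class is a literal character, so no pipes are needed there.

```python
import re

# Hypothetical name; the pattern is the one-line regex written in verbose mode.
TIMESTAMP = re.compile(r"""
    (
        \d\d?/\d\d?/     # two fields of 1 or 2 digits each (assumed month/day)
        \d\d\d?\d?       # year, 2 to 4 digits
        \s*              # optional whitespace between date and time
        \d\d?:\d\d       # hour (1 or 2 digits), colon, 2-digit minutes
        [aApP][mM]       # am/pm in any letter case
    )
""", re.VERBOSE)

samples = [
    "this is the span i need because it contains 04/18/13 7:29pm",
    "Posted on 4/1/2013 17:09aM",
    "here is a lot of text that i don't need",
]
for text in samples:
    match = TIMESTAMP.search(text)
    # Prints the matched timestamp, or None when the text has none.
    print(match.group(1) if match else None)
```

Note that the pattern only checks the shape of the text, so a string like `17:09aM` still matches.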

    Example:

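    >>> import re
    >>> from bs4 import BeautifulSoup
    >>> html = """
    ... <html>
    ... <body>
    ... <span class="blue">here is a lot of text that i don't need</span>
    ... <span class="blue">this is the span i need because it contains 04/18/13 7:29pm</span>
    ... <span class="blue">04/19/13 7:30pm</span>
    ... <span class="blue">04/18/13 7:29pm</span>
    ... <span class="blue">Posted on 15/18/2013 10:00AM</span>
    ... <span class="blue">Posted on 04/20/13 10:31pm</span>
    ... <span class="blue">Posted on 4/1/2013 17:09aM</span>
    ... </body>
    ... </html>
    ... """
    >>> soup = BeautifulSoup(html, 'html.parser')
    >>> # Text of every <span class="blue">
    >>> lines = [i.get_text() for i in soup.find_all('span', {'class': 'blue'})]
    >>> # Keep only the timestamp from each span whose text contains one
    >>> ok = [m.group(1)
    ...       for line in lines
    ...       for m in (re.search(r'(\d\d?/\d\d?/\d\d\d?\d?\s*\d\d?:\d\d[aApP][mM])', line),)
    ...       if m]
    >>> ok
    ['04/18/13 7:29pm', '04/19/13 7:30pm', '04/18/13 7:29pm', '15/18/2013 10:00AM', '04/20/13 10:31pm', '4/1/2013 17:09aM']
    >>> for i in ok:
    ...     print(i)
    ...
    04/18/13 7:29pm
    04/19/13 7:30pm
    04/18/13 7:29pm
    15/18/2013 10:00AM
    04/20/13 10:31pm
    4/1/2013 17:09aM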
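The output shows that the regex checks only the shape of the text: `15/18/2013 10:00AM` and `4/1/2013 17:09aM` match even though they are not real timestamps. If that matters, one option is to pass each match through `datetime.strptime` afterwards. This is a minimal sketch, not part of the original answer. It assumes month/day/year order, a 12-hour clock, a single space between date and time, and the default (English) locale for am/pm. The names `FORMATS` and `parse_timestamp` are hypothetical.

```python
from datetime import datetime

# Assumed layouts: month/day/year, 12-hour clock, 2-digit or 4-digit year.
FORMATS = ("%m/%d/%y %I:%M%p", "%m/%d/%Y %I:%M%p")

def parse_timestamp(text):
    """Return a datetime for text, or None if no assumed layout fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next layout
    return None

# The strings matched by the regex in the example above.
matches = ['04/18/13 7:29pm', '04/19/13 7:30pm', '04/18/13 7:29pm',
           '15/18/2013 10:00AM', '04/20/13 10:31pm', '4/1/2013 17:09aM']

# Keep only the matches that parse as real dates and times.
valid = [m for m in matches if parse_timestamp(m) is not None]
print(valid)
```

Here the month-15 and hour-17 entries are dropped, since neither fits a 12-hour month/day/year layout.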
    
