Python regular expression for Beautiful Soup

终归单人心 2020-12-11 18:53

I am using Beautiful Soup to pull out specific div tags, and it seems I can't use simple string matching.

The page has some tags in the form of <div class="comment form new"> and others in the form of <div class="comment comment-xxxx...">.

1 Answer
  • 2020-12-11 19:24

    I think I've got it:

    >>> [div['class'] for div in soup.find_all('div')]
    [['comment', 'form', 'new'], ['comment', 'comment-xxxx...']]
    

    Notice that, unlike the equivalent in BS3, it's not this:

    ['comment form new', 'comment comment-xxxx...']
    

    And that's why your regexps won't match.

    But you can match, e.g., this:

    >>> soup.find_all('div', class_=re.compile('comment-'))
    [<div class="comment comment-xxxx..."></div>]
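The session above can be reproduced end to end with a small script. This is only a sketch: the markup is hypothetical sample data whose class values mirror the output shown above (with 12345 standing in for the digits), and it assumes the bs4 package is installed and uses the built-in "html.parser" backend.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class values mirror the ones in this answer.
html = """
<div class="comment form new"></div>
<div class="comment comment-12345"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# In BS4, class is a multi-valued attribute, so each div yields a list of tokens.
classes = [div["class"] for div in soup.find_all("div")]
print(classes)

# The compiled pattern is tried against the individual class tokens,
# so only the div carrying a 'comment-...' token is returned.
matches = soup.find_all("div", class_=re.compile("comment-"))
print(matches)
```

Only the second div is returned, because none of the tokens 'comment', 'form', or 'new' contains 'comment-'.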
    

    Note that BS does the equivalent of re.search, not re.match, so you don't need 'comment-.*'. Of course, if you want to match 'comment-12345' but not 'comment-of-another-kind', you'd want, e.g., 'comment-\d+'.
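The difference between the two patterns, and between search and match, can be checked with the standard re module alone. This is a minimal sketch on sample class tokens; the token list and the string 'new-comment-12345' are made up for illustration and are not taken from the original page.

```python
import re

# Sample class tokens; the last two are the examples named in the note above.
tokens = ["comment", "form", "new", "comment-12345", "comment-of-another-kind"]

loose = re.compile(r"comment-")      # any token containing 'comment-'
strict = re.compile(r"comment-\d+")  # 'comment-' followed by at least one digit

loose_hits = [t for t in tokens if loose.search(t)]
strict_hits = [t for t in tokens if strict.search(t)]
print(loose_hits)
print(strict_hits)

# search scans the whole string; match only succeeds at position 0.
embedded = "new-comment-12345"  # hypothetical token with a prefix
found_by_search = strict.search(embedded) is not None
found_by_match = strict.match(embedded) is not None
print(found_by_search, found_by_match)
```

Because the matching is search-style, 'comment-\d+' would also accept a token such as 'comment-123abc'; anchoring the pattern with ^ and $ is one way to rule that out if it matters.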
