I am using Beautiful Soup to pull out specific div tags, and it seems I can't use simple string matching.
The page has some tags in the form of
&l
I think I've got it:
>>> [div['class'] for div in soup.find_all('div')]
[['comment', 'form', 'new'], ['comment', 'comment-xxxx...']]
Notice that, unlike the equivalent in BS3, it's not this:
['comment form new', 'comment comment-xxxx...']
And that's why your regexps won't match.
But you can match, e.g., this:
>>> soup.find_all('div', class_=re.compile('comment-'))
[<div class="comment comment-xxxx..."></div>]
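To try this outside the interactive session, here is a minimal sketch. The HTML string and the class name comment-12345 are invented for illustration (the real page's markup isn't shown here), and it assumes bs4 is installed and uses the built-in html.parser.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = (
    '<div class="comment form new"></div>'
    '<div class="comment comment-12345"></div>'
)
soup = BeautifulSoup(html, "html.parser")

# In BS4, class is a multi-valued attribute: each div yields a list of tokens.
classes = [div["class"] for div in soup.find_all("div")]

# The compiled pattern is tried against each class token separately.
matches = soup.find_all("div", class_=re.compile("comment-"))
```

Here classes comes back as one list of tokens per div, and only the second div ends up in matches, because it is the only one with a token containing "comment-".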
Note that BS does the equivalent of re.search, not re.match, so you don't need 'comment-.*'. Of course, if you want to match 'comment-12345' but not 'comment-of-another-kind', you'd want, e.g., 'comment-\d+'.
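The difference between the two patterns, and between search and match, can be checked with plain re and no Beautiful Soup at all. This is only a sketch of the regex semantics, not of BS internals; the token list is made up, and the pattern '-\d+' is an extra example added to show the search/match difference.

```python
import re

# Hypothetical class tokens, as BS4 would hand them over one at a time.
tokens = ["comment", "comment-12345", "comment-of-another-kind"]

loose = re.compile(r"comment-")      # any token containing "comment-"
strict = re.compile(r"comment-\d+")  # "comment-" followed by at least one digit

loose_hits = [t for t in tokens if loose.search(t)]
strict_hits = [t for t in tokens if strict.search(t)]

# re.search scans the whole string; re.match only tries at position 0.
mid = re.compile(r"-\d+")
found_by_search = mid.search("comment-12345") is not None
found_by_match = mid.match("comment-12345") is not None
```

With search semantics, the loose pattern picks up both hyphenated tokens, while the strict one keeps only 'comment-12345'. The '-\d+' pattern is found by search but not by match, which is why a trailing '.*' adds nothing.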