I have a string which I want to extract a subset of. This is part of a larger Python script.
This is the string:
import re
htmlString = \'
import re
htmlString = '</dd><dt> Fine, thank you. </dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'
SearchStr = '(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ ([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))'
Result = re.search(SearchStr.decode('utf-8'), htmlString.decode('utf-8'), re.I | re.U)
print Result.groups()
Works that way. The expression contains non-latin characters, so it usually fails. You've got to decode into Unicode and use re.U (Unicode) flag.
I'm a beginner too and I faced that issue a couple of times myself.
You are getting AttributeError
because you're calling groups
on None
, which hasn't any methods.
regex.search
returning None
means the regex couldn't find anything matching the pattern from supplied string.
when using regex, it is nice to check whether a match has been made:
Result = re.search(SearchStr, htmlString)
if Result:
print Result.groups()