Why can't I find this string in RegEx?

橙三吉。 Submitted on 2021-02-05 06:39:46

Question


import re
import pdfplumber

lines = []
total_check = 0

# `file` is the path to the PDF being read
with pdfplumber.open(file) as pdf:
    pages = pdf.pages
    for page in pdf.pages:
        text = page.extract_text()
        for line in text.split('\n'):
            print(line)

output data:

Totaalbedrag excl. btw € 25,00

When I try to retrieve VAT from data:

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(data).group(0)

output: AttributeError: 'NoneType' object has no attribute 'group'

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(r'excl. btw € 25,00').group(0)

output: 'excl. btw € 25,00'

How is it possible that the search finds € 25,00 when I paste the literal output into it, but finds nothing when I pass the data variable?

Please help me!


Answer 1:


In most cases, when a pattern contains a literal space and there is no match, the cause is invisible characters or non-breaking spaces in the input.

When you have non-breaking spaces, \xA0, you can simply replace the literal spaces with \s to match any whitespace, or [ \xA0] to match either of the spaces.
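A minimal sketch of that situation, assuming the extracted text contains non-breaking spaces where the printed output shows ordinary ones. The `data` string here is a made-up stand-in, since the question does not show the real contents of `data`:

```python
import re

# Hypothetical sample: looks identical to the printed output, but uses
# non-breaking spaces (\xa0) where regular spaces appear to be.
data = "Totaalbedrag excl.\xa0btw\xa0€\xa025,00"

literal_space = re.compile(r"excl\. btw .+")
any_whitespace = re.compile(r"excl\.\sbtw\s.+")
space_or_nbsp = re.compile(r"excl\.[ \xA0]btw[ \xA0].+")

print(literal_space.search(data))            # None: a literal space is not \xa0
print(any_whitespace.search(data).group(0))  # \s matches \xa0 in str patterns
print(space_or_nbsp.search(data).group(0))   # the class accepts either space
print(repr(data))                            # repr() shows \xa0 as an escape
```

Printing `repr(data)` instead of `data` is a quick way to check whether the spaces in the real input are what they appear to be.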

In this case there appears to be a mix of spaces and other invisible characters, so you can use \W, which matches any non-word character, in place of the literal space:

r'excl\.\W+btw\W.+'
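As a rough illustration of why \W can succeed where \s does not, the sketch below uses a hypothetical input that mixes a zero-width space (\u200b) with non-breaking spaces. Which characters the real PDF text contains is an assumption here; the `unicodedata` lookup at the end is one way to find out:

```python
import re
import unicodedata

# Hypothetical sample: a zero-width space plus non-breaking spaces.
data = "Totaalbedrag excl.\u200b\xa0btw\xa0€ 25,00"

whitespace_re = re.compile(r"excl\.\s+btw\s.+")
non_word_re = re.compile(r"excl\.\W+btw\W.+")

# \u200b is not classified as whitespace, so \s+ cannot consume it.
print(whitespace_re.search(data))           # None

# \W matches any character that is not a letter, digit, or underscore.
print(non_word_re.search(data).group(0))

# List the Unicode names of all non-ASCII characters in the input.
names = [unicodedata.name(ch) for ch in data if ord(ch) > 127]
print(names)
```

Note that \W is broad: it also matches punctuation, so it is best used when the exact separators are unknown.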



Answer 2:


You didn't show the contents of the data object, but the error message only means that the pattern did not match: search returned None, and None has no group method. So you're probably calling search on data that doesn't contain that exact string.

>>> import re
>>> KVK_re = re.compile(r'(excl. btw .+)')
>>> KVK_re.search('test').group(0)
AttributeError: 'NoneType' object has no attribute 'group'
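One way to avoid the exception is to check the result of search before calling group. The sketch below is only an illustration; the helper name `find_excl_btw` is made up for this example:

```python
import re

KVK_re = re.compile(r"(excl\. btw .+)")

def find_excl_btw(text):
    """Return the matched text, or None when the pattern does not match."""
    match = KVK_re.search(text)
    if match is None:
        return None
    return match.group(0)

print(find_excl_btw("test"))                            # None
print(find_excl_btw("Totaalbedrag excl. btw € 25,00"))  # excl. btw € 25,00
```

Returning None keeps the "no match" case explicit, so the caller can decide whether to skip the line or report it.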


Source: https://stackoverflow.com/questions/64607452/why-cant-i-find-this-string-in-regex
