I have written a regex pattern intended to capture one date and one number from a sentence, but it does not capture them.
My code is:
txt = 'Την 02/12/2013 καταχωρήθηκ
Issues:
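- \.+ matches one or more literal dots. You need .+ (no escaping) to match any run of characters.
- (?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*))(?P<KEK_number>\d+) will always prevent a match. A lookahead does not consume any text, so it requires the κωδικ... or κ.α.κ. text at the very position where (?P<KEK_number>\d+) then requires one or more digits, and both can never succeed at once. You need to convert the lookahead into a consuming pattern.

I suggest fixing your pattern as follows: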
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) # matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # non-capturing group: either alternative suffices, then 1+ whitespaces
(?P<KEK_number>\d+) # captures a sequence of digits''', re.I | re.X)
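The two issues listed above can be reproduced in isolation. This is a minimal sketch on a hypothetical ASCII sentence (the text, the word code and the number 42 are made up for illustration), not on your Greek text:

```python
import re

# Hypothetical sample sentence, used only to illustrate the two issues
text = "ID 12/05/2020 some words code: 42"

# Issue 1: an escaped dot only matches literal '.' characters
print(re.search(r"\d{2}/\d{2}/\d{4}\.+code", text))  # None: only dots may follow the date
print(re.search(r"\d{2}/\d{2}/\d{4}.+code", text) is not None)  # True: '.' matches any char but a newline

# Issue 2: a lookahead consumes nothing, so \d+ must start where 'code' starts
print(re.search(r"(?=code:\s*)\d+", text))  # None: one position cannot start with both 'c' and a digit

# Consuming version: match the label, then capture the digits after it
m = re.search(r"code:\s*(\d+)", text)
print(m.group(1))  # 42
```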
Details
Την\s? - the string Την and an optional whitespace
(?P<KEK_date>\d{2}/\d{2}/\d{4}) - Group "KEK_date": a date pattern: 2 digits, /, 2 digits, /, and 4 digits
.+ - 1 or more chars other than line break chars, as many as possible
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?) - a non-capturing group matching either of
  κωδικ.\s?αριθμ.\s?καταχ.ριση. - κωδικ, any one char, an optional whitespace, αριθμ, any one char, an optional whitespace, καταχ, any one char, ριση, and any one char (each "any char" here is any char other than a line break char)
  | - or
  κ\.?α\.κ\.:? - κ, an optional dot, α, a dot, κ, a dot, and then an optional colon
\s+ - 1+ whitespaces
(?P<KEK_number>\d+) - Group "KEK_number": 1+ digits

See a Python demo:
import re
txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) # matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # non-capturing group: either alternative suffices, then 1+ whitespaces
(?P<KEK_number>\d+) # captures a sequence of digits''', re.I | re.X)
print(p.findall(txt)) # => [('02/12/2013', '110035')]
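If you would rather read the captures by group name than from the tuples that findall returns, one option is re.search with Match.groupdict(). The sketch below assumes the same txt and pattern as the demo above (the pattern is repeated without its comments); it also checks that the κ\.?α\.κ\.:? alternative only matches the uppercase Κ.Α.Κ.: in the text because of the re.I flag:

```python
import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'

# Same pattern as above; re.X ignores the indentation and line breaks
p = re.compile(r'''Την\s?
    (?P<KEK_date>\d{2}/\d{2}/\d{4})
    .+
    (?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+
    (?P<KEK_number>\d+)''', re.I | re.X)

m = p.search(txt)
if m:
    # Named groups as a dict, in the order they are defined in the pattern
    print(m.groupdict())  # {'KEK_date': '02/12/2013', 'KEK_number': '110035'}

# Without re.I the lowercase alternative does not match the uppercase Κ.Α.Κ.:
print(re.search(r'κ\.?α\.κ\.:?', txt) is None)  # True
print(re.search(r'κ\.?α\.κ\.:?', txt, re.I) is not None)  # True
```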