I would like a regex pattern that matches the smileys ":)" and ":(". It should also capture repeated smileys like ":) :)" and ":) :(", but filter out invalid syntax like \
Try this pattern:
(?::|;|=)(?:-)?(?:\)|\(|D|P)
I haven't tested it extensively, but it does seem to match the right ones and nothing more:
In [15]: import re
In [16]: s = "Just: to :)) =) test :(:-(( ():: :):) :(:( :P ;)!"
In [17]: re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',s)
Out[17]: [':)', '=)', ':(', ':-(', ':)', ':)', ':(', ':(', ':P', ';)']
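The same pattern can probably be written more compactly with character classes. Here is a minimal sketch, assuming the eyes, optional nose and mouths above are the full set you care about; the names verbose_pattern and compact_pattern are only for illustration:

```python
import re

s = "Just: to :)) =) test :(:-(( ():: :):) :(:( :P ;)!"

# Pattern from the answer above, written with alternations.
verbose_pattern = r'(?::|;|=)(?:-)?(?:\)|\(|D|P)'
# Same idea using character classes: eyes, optional nose, mouth.
compact_pattern = r'[:;=]-?[()DP]'

verbose_hits = re.findall(verbose_pattern, s)
compact_hits = re.findall(compact_pattern, s)

# Both patterns should find the same smileys in the sample string.
print(verbose_hits == compact_hits)
```

Inside a character class the parentheses do not need escaping, which is what makes the second form shorter.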
Maybe something like:
re.match('[:;][)(](?![)(])', str)
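For what it's worth, here is a quick sketch of how that pattern behaves. Note that re.match only anchors at the start of the string, so trailing text is still accepted. The helper name starts_with_smiley and the sample strings are made up for illustration:

```python
import re

# Eyes, then a mouth, then a negative lookahead: NOT another parenthesis.
pattern = r'[:;][)(](?![)(])'

def starts_with_smiley(text):
    """Return True if text begins with a smiley that is not followed by ( or )."""
    return re.match(pattern, text) is not None

for sample in [':)', ';(', ':))', ':) hello', 'hello :)']:
    print(repr(sample), starts_with_smiley(sample))
```

If the whole string has to be smileys, this would still need an end anchor or re.fullmatch.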
I got the answer I was looking for from the comments and answers posted here.
re.match("^(:[)(])*$",str)
Thanks to all.
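One thing to watch for: because of the *, that pattern also accepts the empty string. Below is a small sketch comparing it with a + variant, plus a hypothetical variant that tolerates whitespace between smileys. That last one is my assumption, based on the ":) :)" example in the question, and the names are only for illustration:

```python
import re

star_pattern = r'^(:[)(])*$'              # pattern from the post above
plus_pattern = r'^(:[)(])+$'              # requires at least one smiley
spaced_pattern = r'^:[)(](?:\s*:[)(])*$'  # hypothetical: allows spaces between smileys

def matches(pattern, text):
    """Return True if the pattern matches the text."""
    return re.match(pattern, text) is not None

for text in ['', ':):(', ':) :(', ':((']:
    print(repr(text),
          matches(star_pattern, text),
          matches(plus_pattern, text),
          matches(spaced_pattern, text))
```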
I think it finally "clicked" exactly what you're asking about here. Take a look at the below:
import re

smiley_pattern = r'^(:\(|:\))+$'  # matches only the smileys ":)" and ":("

def test_match(s):
    print('Value: %s; Result: %s' % (
        s,
        'Matches!' if re.match(smiley_pattern, s) else "Doesn't match."
    ))

should_match = [
    ':)',    # Single smile
    ':(',    # Single frown
    ':):)',  # Two smiles
    ':(:(',  # Two frowns
    ':):(',  # Mix of a smile and a frown
]
should_not_match = [
    '',       # Empty string
    ':(foo',  # Extraneous characters appended
    'foo:(',  # Extraneous characters prepended
    ':( :(',  # Space between frowns
    ':( (',   # Extraneous characters and space appended
    ':((',    # Extraneous duplicate of final character appended
]

print('The following should all match:')
for x in should_match:
    test_match(x)
print('')  # Newline for output clarity
print('The following should all not match:')
for x in should_not_match:
    test_match(x)
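A side note, in case the input comes from a file or stdin: in Python, $ also matches just before a trailing newline, so ':)\n' passes a check written with ^...$. On Python 3.4+, re.fullmatch avoids this and lets you drop the anchors. A sketch, where the function name is_all_smileys is mine:

```python
import re

anchored = re.compile(r'^(:\(|:\))+$')
unanchored = re.compile(r'(?::\(|:\))+')  # same alternatives, no ^ or $

def is_all_smileys(s):
    """True only if the whole string consists of ':)' and ':(' smileys."""
    return unanchored.fullmatch(s) is not None

print(bool(anchored.match(':)\n')))  # the trailing newline slips past $
print(is_all_smileys(':)\n'))
print(is_all_smileys(':):('))
```

Whether that matters depends on where your strings come from; stripping the input first works just as well.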
The problem with your original code is that your regex is wrong: (:\(). Let's break it down.
The outside parentheses are a "grouping". They're what you'd reference if you were going to do a string replacement, and they're used to apply regex operators to groups of characters at once. So, you're really saying:
(      begin a group
:\(    ... do regex stuff ...
The : isn't a regex reserved character, so it's just a colon. The \ is, and it means "the following character is literal, not a regex operator". This is called an "escape sequence". Fully parsed into English, your regex says:
(      begin a group
:      a colon character
\(     a left parenthesis character
)      end the group
The regex I used is slightly more complex, but not bad. Let's break it down: ^(:\(|:\))+$
^ and $ mean "the beginning of the line" and "the end of the line" respectively. Now we have:
^             beginning of line
(:\(|:\))+    ... do regex stuff ...
$             end of line
... so it only matches things that comprise the entire line, not things that simply occur in the middle of the string.
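To see what the anchors buy you, compare the pattern with and without them. A quick sketch using re.search, with a sample string I made up:

```python
import re

text = 'foo:(bar'

# No anchors: finds a smiley anywhere in the string.
loose = re.search(r'(:\(|:\))+', text)
# With anchors: the whole line has to be made of smileys.
strict = re.search(r'^(:\(|:\))+$', text)

print(loose.group(0) if loose else None)
print(strict)
```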
We know that ( and ) denote a grouping. + means "one or more of these". Now we have:
^          beginning of line
(          start a group
:\(|:\)    ... do regex stuff ...
)          end the group
+          match one or more of this
$          end of line
Finally, there's the | (pipe) operator. It means "or". So, applying what we know from above about escaping characters, we're ready to complete the translation:
^     beginning of line
(     start a group
:     a colon character
\(    a left parenthesis character
|     or
:     a colon character
\)    a right parenthesis character
)     end the group
+     match one or more of this
$     end of line
I hope this helps. If not, let me know and I'll be happy to edit my answer with a reply.
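If it helps, the same translation can be written straight into the code with re.VERBOSE, which lets you put whitespace and comments inside a pattern. A sketch, where the name SMILEYS is just for illustration:

```python
import re

SMILEYS = re.compile(r'''
    ^        # beginning of line
    (        # start a group
      :\(    # a colon, then a literal left parenthesis
      |      # or
      :\)    # a colon, then a literal right parenthesis
    )        # end the group
    +        # match one or more of this
    $        # end of line
''', re.VERBOSE)

for s in [':)', ':):(', ':((', 'foo:(']:
    print(repr(s), bool(SMILEYS.match(s)))
```

In verbose mode, whitespace in the pattern is ignored, so a pattern that needs to match a literal space has to escape it or put it in a character class.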