I want to do a match for a string when no abc is followed by some characters (possibly none) and ends with .com.
I tried with the following
This looks like an XY Problem.
DVK's answer shows you how you can tackle this problem using regular expressions, like you asked for.
My solution (in Python) demonstrates that regular expressions are not necessarily the best approach and that tackling the problem using your programming language's string-handling functionality may produce a more efficient and more maintainable solution.
#!/usr/bin/env python
import unittest


def is_valid_domain(domain):
    return domain.endswith('.com') and 'abc' not in domain


class TestIsValidDomain(unittest.TestCase):
    def test_edu_invalid(self):
        self.assertFalse(is_valid_domain('def.edu'))

    def test_abc_invalid(self):
        self.assertFalse(is_valid_domain('abc.com'))
        self.assertFalse(is_valid_domain('abce.com'))
        self.assertFalse(is_valid_domain('abcAnYTHing.com'))

    def test_dotcom_valid(self):
        self.assertTrue(is_valid_domain('a.com'))
        self.assertTrue(is_valid_domain('b.com'))
        self.assertTrue(is_valid_domain('ab.com'))
        self.assertTrue(is_valid_domain('ae.com'))


if __name__ == '__main__':
    unittest.main()
Update
Even in a language like Perl, where regular expressions are idiomatic, there's no reason to squash all of your logic into a single regex. A function like this would be far easier to maintain:
sub is_domain_valid {
    my $domain = shift;
    return $domain =~ /\.com$/ && $domain !~ /abc/;
}
(I'm not a Perl programmer, but this runs and gives the results that you desire)
Condensing:
Sorry if I did not make myself clear. Just give some examples. I want def.edu, abc.com, abce.com, eabc.com and abcAnYTHing.com do not match, while a.com, b.com, ab.com, ae.com etc. match.
New regex after revised OP examples:
/^(?:(?!abc.*\.com$|^def\.edu$).)+\.(?:com|edu)$/s
use strict;
use warnings;

my @samples = qw/
    <newline>
    shouldn't_pass
    def.edu abc.com abce.com eabc.com
    <newline>
    should_pass.com
    a.com b.com ab.com ae.com
    abc.edu def.com defa.edu
/;

my $regex = qr
/
    ^                       # Begin string
    (?:                     # Group
        (?!                 # Lookahead ASSERTION
            abc.*\.com$     # At any character position, cannot have these in front of us.
          | ^def\.edu$      # (or 'def.*\.edu' anchored at the end)
        )                   # End ASSERTION
        .                   # This character passes
    )+                      # End group, do 1 or more times
    \.                      # End of string check,
    (?:com|edu)             # must be a '.com' or '.edu' (remove if not needed)
    $                       # End string
/sx;

print "\nmatch using /^(?:(?!abc.*\.com\$|^def\.edu\$).)+\.(?:com|edu)\$/s \n";

for my $str ( @samples )
{
    # The literal token <newline> in the sample list only separates groups in the output.
    if ( $str =~ /<newline>/ ) {
        print "\n"; next;
    }
    if ( $str =~ /$regex/ ) {
        print "passed - $str\n";
    }
    else {
        print "failed - $str\n";
    }
}
Output:
match using /^(?:(?!abc.*.com$|^def.edu$).)+.(?:com|edu)$/s
failed - shouldn't_pass
failed - def.edu
failed - abc.com
failed - abce.com
failed - eabc.com
passed - should_pass.com
passed - a.com
passed - b.com
passed - ab.com
passed - ae.com
passed - abc.edu
passed - def.com
passed - defa.edu
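For readers who want to reproduce this run without Perl, here is a minimal sketch of the same pattern in Python. It assumes that Python's `re.DOTALL` flag is an adequate stand-in for Perl's /s modifier here; the names `PATTERN`, `SAMPLES` and `passes` are made up for the illustration and are not part of the original answer.

```python
import re

# Python port of the Perl pattern above; re.DOTALL plays the role of /s.
PATTERN = re.compile(
    r'^(?:(?!abc.*\.com$|^def\.edu$).)+\.(?:com|edu)$',
    re.DOTALL,
)

# Same sample strings as the Perl script, minus the <newline> separators.
SAMPLES = [
    "shouldn't_pass", "def.edu", "abc.com", "abce.com", "eabc.com",
    "should_pass.com", "a.com", "b.com", "ab.com", "ae.com",
    "abc.edu", "def.com", "defa.edu",
]


def passes(s):
    """Return True if the whole pattern matches s."""
    return PATTERN.search(s) is not None


if __name__ == "__main__":
    for s in SAMPLES:
        print(("passed" if passes(s) else "failed") + " - " + s)
```

The first five samples should fail and the remaining eight should pass, in line with the Perl output listed above.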
It's unclear from your wording if you want to match a string ending with .com AND NOT containing abc before that; or to match a string that doesn't have "abc followed by characters followed by .com".
Meaning, in the first case, "def.edu" does NOT match (no "abc" but doesn't end with ".com") but in the second case "def.edu" matches (because it's not "abcSOMETHING.com").
In the first case, you need to use negative look-behind:
(?<!abc.+)\.com$
# Use .* instead of .+ if you want "abc.com" to fail as well
IMPORTANT: your original expression using look-behind - #3 ( (?<!abc).*\.com ) - didn't work because a look-behind ONLY examines the text immediately preceding the position where it is applied. Therefore, the "something after abc" should be included in the look-behind together with abc - as my RegEx above does.
PROBLEM: my RegEx above likely won't work with your specific RegEx engine unless it supports general look-behinds with variable-length expressions (like the one above), which few engines do; .NET is one that does (a good summary of which flavors support which kinds of look-behind is at http://www.regular-expressions.info/lookaround.html ).
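As a concrete illustration of that limitation, Python's standard `re` module is one engine that rejects the variable-length look-behind at compile time. The helper name `supports_pattern` below is made up for this sketch.

```python
import re


def supports_pattern(pattern):
    """Return True if the standard re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # re raises an error for look-behinds whose length is not fixed.
        return False


# Variable-length look-behind, as in the RegEx above.
variable_width = supports_pattern(r'(?<!abc.+)\.com$')

# Fixed-length look-behind, which re does accept.
fixed_width = supports_pattern(r'(?<!abc)\.com$')

if __name__ == "__main__":
    print("variable-length look-behind supported:", variable_width)
    print("fixed-length look-behind supported:", fixed_width)
```

Here `variable_width` comes out `False` and `fixed_width` comes out `True`, which is why the two-step approach is the portable one.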
If that is indeed the case, you will have to do a double match: first, check for .com, capturing everything before it; then do a negative match on abc. I will use Perl syntax since you didn't specify a language:
if (/^(.*)\.com$/) {
    if ($1 !~ /abc/) {
    # Or, you can just use a substring:
    # if (index($1, "abc") < 0) {
        # PROFIT!
    }
}
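The same double match can be sketched in Python, in case that is closer to what you are using. The function name `ends_with_com_without_abc` is invented for this example, and the substring test stands in for the negative match on abc.

```python
import re


def ends_with_com_without_abc(s):
    """First interpretation: s ends with .com and has no abc before it."""
    # Step 1: require the .com suffix and capture everything before it.
    m = re.match(r'^(.*)\.com$', s, re.DOTALL)
    if m is None:
        return False
    # Step 2: negative check on the captured prefix.
    return 'abc' not in m.group(1)


if __name__ == "__main__":
    for s in ['def.edu', 'abc.com', 'abce.com', 'eabc.com', 'a.com', 'ab.com']:
        print(s, ends_with_com_without_abc(s))
```

Under this interpretation "def.edu" is rejected because the first step already fails, regardless of whether abc appears.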
In the second case, the EASIEST thing to do is to use a "does not match" operator, e.g. !~ in Perl (or negate the result of a match if your language doesn't support "does not match"). Example using pseudo-code:
if (NOT string.match(/abc.+\.com$/)) ...
Please note that you don't need ".+"/".*" when using negative lookbehind;
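A runnable version of the pseudo-code for the second case might look like the following Python sketch. The names `ABC_THEN_COM` and `lacks_abc_then_com` are invented for the illustration.

```python
import re

# "abc, then at least one character, then .com at the end of the string"
ABC_THEN_COM = re.compile(r'abc.+\.com$')


def lacks_abc_then_com(s):
    """Second interpretation: succeed unless s looks like abcSOMETHING.com."""
    return ABC_THEN_COM.search(s) is None


if __name__ == "__main__":
    for s in ['def.edu', 'abce.com', 'abcAnYTHing.com', 'abc.com', 'a.com']:
        print(s, lacks_abc_then_com(s))
```

With `.+`, "abc.com" is accepted because nothing sits between abc and .com; switching to `.*` would reject it as well. "def.edu" is accepted here, which is exactly where the two interpretations differ.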
Do you just want to exclude strings that start with abc? That is, would xyzabc.com be okay? If so, this regex should work:
^(?!abc).+\.com$
If you want to make sure abc doesn't appear anywhere, use this:
^(?:(?!abc).)+\.com$
In the first regex, the lookahead is applied only once, at the beginning of the string. In the second regex the lookahead is applied each time the . is about to match a character, ensuring that the character is not the beginning of an abc sequence.
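To make the difference between the two regexes concrete, here is a small Python sketch that runs both against the same inputs. The constant names `STARTS_CHECK` and `ANYWHERE_CHECK` are made up for this example; the patterns themselves are the two given above.

```python
import re

# Lookahead applied once, at the start of the string.
STARTS_CHECK = re.compile(r'^(?!abc).+\.com$')

# Lookahead applied before every character consumed by the dot.
ANYWHERE_CHECK = re.compile(r'^(?:(?!abc).)+\.com$')


def matches(pattern, s):
    return pattern.search(s) is not None


if __name__ == "__main__":
    for s in ['a.com', 'abc.com', 'xyzabc.com', 'eabc.com']:
        print(s, matches(STARTS_CHECK, s), matches(ANYWHERE_CHECK, s))
```

Both patterns accept "a.com" and reject "abc.com", but only the first accepts "xyzabc.com" and "eabc.com", since abc appears there without being at the start.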