Matching two words with some characters in between in regular expression

孤独总比滥情好

I want to match a string when no abc is followed by some characters (possibly none) and ends with .com.

I tried the following:

4 Answers

自闭症患者

2021-01-07 15:42

This looks like an XY Problem.

DVK's answer shows you how you can tackle this problem using regular expressions, like you asked for.

My solution (in Python) demonstrates that regular expressions are not necessarily the best approach and that tackling the problem using your programming language's string-handling functionality may produce a more efficient and more maintainable solution.

```
#!/usr/bin/env python

import unittest

def is_valid_domain(domain):
    return domain.endswith('.com') and 'abc' not in domain

class TestIsValidDomain(unittest.TestCase):

    def test_edu_invalid(self):
        self.assertFalse(is_valid_domain('def.edu'))

    def test_abc_invalid(self):
        self.assertFalse(is_valid_domain('abc.com'))
        self.assertFalse(is_valid_domain('abce.com'))
        self.assertFalse(is_valid_domain('abcAnYTHing.com'))

    def test_dotcom_valid(self):
        self.assertTrue(is_valid_domain('a.com'))
        self.assertTrue(is_valid_domain('b.com'))
        self.assertTrue(is_valid_domain('ab.com'))
        self.assertTrue(is_valid_domain('ae.com'))

if __name__ == '__main__':
    unittest.main()
```

Update

Even in a language like Perl, where regular expressions are idiomatic, there's no reason to squash all of your logic into a single regex. A function like this would be far easier to maintain:

```
sub is_domain_valid {
    my $domain = shift;
    return $domain =~ /\.com$/ && $domain !~ /abc/;
}
```

(I'm not a Perl programmer, but this runs and gives the results you want.)

轻奢々

2021-01-07 15:53

Condensing:

Sorry if I did not make myself clear. Just to give some examples:
I want def.edu, abc.com, abce.com, eabc.com and
abcAnYTHing.com not to match,
while a.com, b.com, ab.com, ae.com etc. should match.

New regex after the OP's revised examples:
/^(?:(?!abc.*\.com$|^def\.edu$).)+\.(?:com|edu)$/s

```
use strict;
use warnings;


my @samples = qw/
 <newline>
   shouldn't_pass 
   def.edu  abc.com  abce.com eabc.com 
 <newline>
   should_pass.com
   a.com    b.com    ab.com   ae.com
   abc.edu  def.com  defa.edu
 /;

my $regex = qr
  /
    ^    # Begin string
      (?:  # Group

          (?!              # Lookahead ASSERTION
                abc.*\.com$     # At any character position, cannot have these in front of us.
              | ^def\.edu$      # (or def.*\.edu anchored at the end)
           )               # End ASSERTION

           .               # This character passes

      )+   # End group, do 1 or more times

      \.   # End of string check,
      (?:com|edu)   # must be a '.com' or '.edu' (remove if not needed)

    $    # End string
  /sx;


print "\nmatch using   /^(?:(?!abc.*\.com\$|^def\.edu\$).)+\.(?:com|edu)\$/s \n";

for  my $str ( @samples )
{
   if ( $str =~ /<newline>/ ) {
      print "\n"; next;
   }

   if ( $str =~ /$regex/ ) {
       print "passed - $str\n";
   }
   else {
       print "failed - $str\n";
   }
}
```

Output:

```
match using /^(?:(?!abc.*.com$|^def.edu$).)+.(?:com|edu)$/s

failed - shouldn't_pass
failed - def.edu
failed - abc.com
failed - abce.com
failed - eabc.com

passed - should_pass.com
passed - a.com
passed - b.com
passed - ab.com
passed - ae.com
passed - abc.edu
passed - def.com
passed - defa.edu
```
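For readers on Python, here is a sketch of the same check with the built-in `re` module. It assumes the pattern carries over unchanged, which it should, because it only uses lookaheads; `re.DOTALL` stands in for Perl's `/s` modifier, and the sample strings are copied from the Perl script.

```python
import re

# Same pattern as the Perl answer; re.DOTALL plays the role of Perl's /s flag.
PATTERN = re.compile(
    r"^(?:(?!abc.*\.com$|^def\.edu$).)+\.(?:com|edu)$",
    re.DOTALL,
)

should_fail = ["shouldn't_pass", "def.edu", "abc.com", "abce.com", "eabc.com"]
should_pass = ["should_pass.com", "a.com", "b.com", "ab.com", "ae.com",
               "abc.edu", "def.com", "defa.edu"]

for s in should_fail + should_pass:
    status = "passed" if PATTERN.search(s) else "failed"
    print(f"{status} - {s}")
```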

心在旅途

2021-01-07 15:54
It's unclear from your wording whether you want to match a string ending with .com AND NOT containing abc before that, or to match a string that doesn't have "abc followed by characters followed by .com".

That is, in the first case "def.edu" does NOT match (no "abc", but it doesn't end with ".com"), but in the second case "def.edu" matches (because it's not "abcSOMETHING.com").

In the first case, you need to use negative look-behind:
```
(?<!abc.+)\.com$
# Use .* instead of .+ if you want "abc.com" to fail as well
```
IMPORTANT: your original expression using look-behind, #3 ((?<!abc).*\.com), didn't work because a look-behind ONLY examines the text immediately preceding the position where it appears. Therefore, the "something after abc" has to be included in the look-behind together with abc, as my regex above does.
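A quick way to see this, sketched with Python's built-in `re` module (which accepts this fixed-width look-behind): attempt #3 still matches a string that starts with abc, because at position 0 nothing precedes the cursor for `(?<!abc)` to reject.

```python
import re

# Attempt #3 from the question: the look-behind only checks the three
# characters immediately before the point where the match starts.
attempt3 = re.compile(r"(?<!abc).*\.com")

m = attempt3.search("abcx.com")
# At position 0 nothing precedes the cursor, so (?<!abc) succeeds and
# .*\.com then consumes the whole string, abc included.
print(m.group(0) if m else None)  # prints abcx.com
```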

PROBLEM: my regex above likely won't work with your specific regex engine unless it supports general look-behinds with variable-length expressions (like the one above). Only a few engines do, notably .NET and JavaScript engines that implement ECMAScript 2018 or later. (A good summary of which flavors support which kinds of look-behind is at http://www.regular-expressions.info/lookaround.html.)
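As a concrete case, Python's built-in `re` module accepts only fixed-width look-behinds, so a sketch like the following (assuming a Python 3 interpreter) fails to compile the pattern and raises `re.error`:

```python
import re

try:
    re.compile(r"(?<!abc.+)\.com$")
    compiled = True
except re.error as exc:
    # Python's built-in re rejects look-behinds whose width is not fixed;
    # .+ inside the look-behind makes its width unbounded.
    print("re.error:", exc)
    compiled = False
```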

If that is indeed the case, you will have to do a double match: first, match .com at the end, capturing everything before it; then do a negative match for abc on the captured part. I will use Perl syntax since you didn't specify a language:
```
if (/^(.*)\.com$/) {
    if ($1 !~ /abc/) { 
    # Or, without a second regex, use a substring search:
    # if (index($1, "abc") < 0) {
        # PROFIT!
    }
}
```
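For comparison, a sketch of the same two-step check in Python; `ends_in_com_without_abc` is a hypothetical name, and `re.fullmatch` plays the role of the anchored `^(.*)\.com$`:

```python
import re


def ends_in_com_without_abc(s: str) -> bool:
    """Hypothetical helper: True when s ends with .com and the part
    before .com does not contain "abc"."""
    # Step 1: require .com at the very end, capturing everything before it.
    m = re.fullmatch(r"(.*)\.com", s)
    if m is None:
        return False
    # Step 2: plain substring test, like index($1, "abc") < 0 in Perl.
    return "abc" not in m.group(1)


for s in ["def.edu", "abc.com", "eabc.com", "a.com", "ae.com"]:
    print(s, ends_in_com_without_abc(s))
```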
In the second case, the EASIEST thing to do is to use a "does not match" operator, e.g. !~ in Perl (or negate the result of a match if your language doesn't have a "does not match" operator). Example in pseudo-code:
```
if (NOT string.match(/abc.+\.com$/)) ...
```
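In Python, for instance, the pseudo-code could be written as below (a sketch; `not_abc_then_com` is a hypothetical name). `re.search` is unanchored, so no leading `.*` is needed before abc:

```python
import re

# Second interpretation: accept anything that does NOT look like
# "abc, then at least one character, then .com at the end".
ABC_THEN_COM = re.compile(r"abc.+\.com$")


def not_abc_then_com(s: str) -> bool:
    return not ABC_THEN_COM.search(s)


print(not_abc_then_com("def.edu"))      # True: no .com at all
print(not_abc_then_com("xyzabcq.com"))  # False: abc, then "q", then .com
print(not_abc_then_com("abc.com"))      # True: .+ needs a character between abc and .com
```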
Please note that, with the negative look-behind, you don't need a separate ".+"/".*" outside it.

暖寄归人

2021-01-07 15:55
Do you just want to exclude strings that start with abc? That is, would xyzabc.com be okay? If so, this regex should work:
```
^(?!abc).+\.com$
```
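A quick check with Python's `re` module (a sketch on a few sample strings) shows what "only at the beginning" means in practice: eabc.com is accepted, because the string does not start with abc.

```python
import re

# Lookahead applied once, at the start of the string only.
starts_without_abc = re.compile(r"^(?!abc).+\.com$")

for s in ["xyzabc.com", "eabc.com", "abc.com", "abce.com", "a.com"]:
    print(s, bool(starts_without_abc.match(s)))
```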
If you want to make sure abc doesn't appear anywhere, use this:
```
^(?:(?!abc).)+\.com$
```
In the first regex, the lookahead is applied only once, at the beginning of the string. In the second regex the lookahead is applied each time the . is about to match a character, ensuring that the character is not the beginning of an abc sequence.
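Run against the examples the asker gave in a comment (quoted in another answer), the second pattern rejects and accepts the strings the asker listed. A sketch with Python's `re` module, assuming those examples are the full specification:

```python
import re

# Lookahead re-checked before every character the group consumes.
no_abc_anywhere = re.compile(r"^(?:(?!abc).)+\.com$")

should_not_match = ["def.edu", "abc.com", "abce.com", "eabc.com", "abcAnYTHing.com"]
should_match = ["a.com", "b.com", "ab.com", "ae.com"]

for s in should_not_match + should_match:
    print(s, bool(no_abc_anywhere.match(s)))
```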