Regular expression matching any subset of a given set?

面向向阳花 2021-01-13 14:38

Is it possible to write a regular expression that will match any subset of a given set of characters a1 ... an?
I.e. it should match any string whe

4 Answers
  • 2021-01-13 14:59

    Can't think how to do it with a single regex, but here is one way to do it with n regexes (I will use 1 2 ... m n etc. for your a's):

    ^[23..n]*1?[23..n]*$
    ^[13..n]*2?[13..n]*$
    ...
    ^[12..m]*n?[12..m]*$
    

    If all of the above match, your string is a subset of 12..mn (possibly the whole set), with no element repeated.

    How this works: each line requires the string to consist exactly of:

    • any number of characters drawn from the set, except a particular one
    • perhaps that particular one
    • any number of characters drawn from the set, except that particular one

    If this passes when every element in turn is considered as a particular one, we know:

    • there is nothing else in the string except the allowed elements
    • there is at most one of each of the allowed elements

    as required.
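
    The n checks above can be generated mechanically. The following is a minimal Python sketch of that idea, not the answerer's own code: the helper names build_checks and passes_all are hypothetical, the allowed set is assumed to be non-empty, and anchoring is done with fullmatch instead of ^ and $.

```python
import re

def build_checks(allowed):
    """Return one compiled regex per allowed character (hypothetical helper).

    Assumes `allowed` is non-empty; with an empty set there are no checks to run.
    """
    chars = sorted(set(allowed))
    checks = []
    for c in chars:
        # Class of every allowed character except c; omitted when c is the only one.
        others = "".join(re.escape(o) for o in chars if o != c)
        rest = "[" + others + "]*" if others else ""
        # Equivalent of ^[others]*c?[others]*$; anchoring is done by fullmatch below.
        checks.append(re.compile(rest + re.escape(c) + "?" + rest))
    return checks

def passes_all(checks, s):
    """True if s uses only allowed characters, each at most once."""
    return all(p.fullmatch(s) is not None for p in checks)
```

    For example, with checks = build_checks("abc"), passes_all(checks, "cba") holds, while "abac" fails the check for a and "pabc" fails every check.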


    For completeness, I should say that I would only do this if I were under orders to "use regex"; otherwise, I'd track which allowed elements have been seen and iterate over the characters of the string, doing the obvious thing.

  • 2021-01-13 15:02

    This doesn't really qualify for the language-agnostic tag, but...

    ^(?:(?!\1)a1()|(?!\2)a2()|...|(?!\n)an())*$
    

    see a demo on ideone.com

    The first time an element is matched, it gets "checked off" by the capturing group following it. Because the group has now participated in the match, a negative lookahead for its corresponding backreference (e.g., (?!\1)) will never match again, even though the group only captured an empty string. This is an undocumented feature that is nevertheless supported in many flavors, including Java, .NET, Perl, Python, and Ruby.

    This solution also requires support for forward references (i.e., a reference to a given capturing group (\1) appearing in the regex before the group itself). This seems to be a little less widely supported than the empty-groups gimmick.
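
    Whether a flavor accepts forward references can be probed directly. The sketch below tries to compile a three-element instance of the pattern (a, b, c standing in for a1 ... an) with Python's standard re module; the function name accepts_forward_refs is hypothetical. As far as I know, re rejects a reference to a group that has not been opened yet at compile time, so this particular pattern is expected to need a different flavor or engine.

```python
import re

def accepts_forward_refs():
    """Report whether this regex engine compiles a forward-referencing pattern."""
    # Concrete instance of ^(?:(?!\1)a1()|(?!\2)a2()|...)*$ for the set {a, b, c}.
    pattern = r"^(?:(?!\1)a()|(?!\2)b()|(?!\3)c())*$"
    try:
        re.compile(pattern)
    except re.error:
        # The engine refused the reference to a not-yet-defined group.
        return False
    return True
```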

  • 2021-01-13 15:06

    Not sure you can get an extended regex to do that, but it's pretty easy to do with a simple traversal of your string.

    Use a hash (or an array, or whatever) to record whether each of your allowed characters has already been seen in the string. Then simply iterate over the elements of your string. If you encounter an element that is not in your allowed set, you bail out. If it is allowed but you have already seen it, you bail out too.

    In pseudo-code:

    foreach char a in {a1, ..., an}
       hit[a] = false
    
    foreach char c in string
       if c not in {a1, ..., an} => fail
       if hit[c] => fail
       hit[c] = true
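
    The pseudo-code translates almost line for line into a real language. Here is a minimal Python sketch, with a set of seen characters playing the role of the hit table; the function name is_subset_string is made up for illustration.

```python
def is_subset_string(s, allowed):
    """True if s uses only characters from `allowed`, each at most once."""
    allowed = set(allowed)
    seen = set()  # plays the role of the hit[] table
    for ch in s:
        if ch not in allowed:
            return False  # character outside the allowed set
        if ch in seen:
            return False  # allowed character seen a second time
        seen.add(ch)
    return True
```

    For instance, is_subset_string("cba", "abc") is true, while "abac" fails on the second a and "pabc" fails on the p.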
    
  • 2021-01-13 15:26

    Similar to Alan Moore's, using only \1, and doesn't refer to a capturing group before it has been seen:

    #!/usr/bin/perl
    my $re = qr/^(?:([abc])(?!.*\1))*$/;
    foreach (qw(ba pabc abac a cc cba abcd abbbbc), '') {
        print "'$_' ", ($_ =~ $re) ? "matches" : "does not match", " \$re \n";
    }
    

    We match any number of blocks (the outer (?:)), where each block must consist of "precisely one character from our preferred set, which is not followed by a string containing that character".

    If the string might contain newlines or other funny stuff, it might be necessary to play with some flags to make ^, $ and . behave as intended, but this all depends on the particular RE flavor.
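
    As one illustration of the flag issue, here is a sketch of the same regex in Python's re module, generalized to an arbitrary non-empty set. The helper names subset_pattern and is_subset are hypothetical. It assumes two choices that are specific to Python: re.DOTALL so that . in the look-ahead also crosses newlines, and fullmatch instead of ^ and $ so that a trailing newline is not silently ignored.

```python
import re

def subset_pattern(allowed):
    """Compile (?:([...])(?!.*\\1))* for a non-empty set of allowed characters."""
    # re.escape keeps characters such as ']', '^', '-' and '\\' literal in the class.
    char_class = "[" + "".join(re.escape(c) for c in sorted(set(allowed))) + "]"
    # DOTALL lets '.' cross newlines, so the look-ahead scans the whole remainder.
    return re.compile(r"(?:(" + char_class + r")(?!.*\1))*", re.DOTALL)

def is_subset(pattern, s):
    """True if the whole of s matches, i.e. no stray or repeated characters."""
    # fullmatch anchors at both ends and does not skip a trailing newline.
    return pattern.fullmatch(s) is not None
```

    With pattern = subset_pattern("abc"), the strings "ba", "a", "cba" and "" are accepted, while "pabc", "abac", "cc", "abcd", "abbbbc" and "a\n" are rejected. If the allowed set itself contains a newline, DOTALL is what makes the look-ahead notice a repeat on the far side of it.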

    Just for silliness, one can use a positive look-ahead assertion to effectively AND two regexps, so we can test for any permutation of abc by asserting that the above matches, followed by an ordinary check for 'is N characters long and consists of these characters':

    my $re2 = qr/^(?=$re)[abc]{3}$/;
    foreach (qw(ba pabc abac a cc abcd abbbbc abc acb bac bca cab cba), '') {
        print "'$_' ", ($_ =~ $re2) ? "matches" : "does not match", " \$re2 \n";
    }
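
    The same look-ahead trick can be sketched in Python. This is an illustration under assumptions, not a port of the script above: the helper name permutation_pattern is hypothetical, \Z is used as the end anchor inside the look-ahead, and the outer match is anchored with fullmatch.

```python
import re

def permutation_pattern(allowed):
    """Compile a regex matching exactly the permutations of a non-empty set."""
    chars = sorted(set(allowed))
    char_class = "[" + "".join(re.escape(c) for c in chars) + "]"
    # First condition: no allowed character repeats before the end of the string.
    no_repeats = r"(?:(" + char_class + r")(?!.*\1))*\Z"
    # Second condition: exactly len(chars) characters, all from the set.
    exact_length = char_class + "{%d}" % len(chars)
    # The look-ahead ANDs the two conditions at the start of the string.
    return re.compile(r"(?=" + no_repeats + r")" + exact_length, re.DOTALL)

def is_permutation(pattern, s):
    """True if the whole of s is a permutation of the allowed set."""
    return pattern.fullmatch(s) is not None
```

    With pattern = permutation_pattern("abc"), all six orderings of abc are accepted, while shorter subsets such as "ba" and strings with repeats such as "aab" are rejected.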
    