Regular Expression for alphanumeric and underscores

北荒 2020-11-22 10:01

I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.

20 Answers
  • 2020-11-22 10:42

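    This works for me. The doubled backslash is what you would write inside a string literal in a language such as Java. As written, the class matches a single character, so add anchors and a quantifier (^[\\p{Alnum}_]+$) to test a whole string: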

    [\\p{Alnum}_]
    
  • 2020-11-22 10:45

    One question first: does the string need at least one character, or can it be empty?

    ^[A-Za-z0-9_]+$
    

    This matches one or more uppercase or lowercase letters, digits, or underscores. If the string may be empty, replace the + with *:

    ^[A-Za-z0-9_]*$
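
As a minimal sketch of the + versus * choice, here is how the two patterns might be wrapped in Python. The helper name `is_valid` and its `allow_empty` parameter are made up for illustration. `fullmatch()` anchors at both ends, so the ^ and $ are left out of the pattern.

```python
import re

# One or more allowed characters; fullmatch() anchors at both ends,
# so ^ and $ are not needed in the pattern itself.
NON_EMPTY = re.compile(r"[A-Za-z0-9_]+")
# Zero or more allowed characters, so the empty string is accepted too.
MAYBE_EMPTY = re.compile(r"[A-Za-z0-9_]*")


def is_valid(text: str, allow_empty: bool = False) -> bool:
    """Return True if text holds only ASCII letters, digits, and underscores."""
    pattern = MAYBE_EMPTY if allow_empty else NON_EMPTY
    return pattern.fullmatch(text) is not None


print(is_valid("user_name_42"))        # True
print(is_valid("user-name"))           # False: the hyphen is not allowed
print(is_valid(""))                    # False
print(is_valid("", allow_empty=True))  # True
```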
    

    Edit:

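    If letters with diacritics (such as the cedilla in ç) need to be included, you can use the word-character class \w. In Unicode-aware engines (for example .NET by default) it matches everything the class above does plus accented and other Unicode letters. In engines where \w is ASCII-only (for example JavaScript), it is just shorthand for [A-Za-z0-9_]: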

    ^\w+$
    

    Or

    ^\w*$
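
Whether \w covers diacritics depends on the engine and its flags. As a hedged illustration in Python 3, where \w is Unicode-aware for str patterns unless `re.ASCII` is passed (the sample word is arbitrary):

```python
import re

word = "fa\u00e7ade_1"  # "façade_1", written with an escape to avoid encoding issues

# Default for str patterns: \w matches Unicode letters and digits plus underscore.
unicode_match = re.fullmatch(r"\w+", word)
# With re.ASCII, \w is restricted to [a-zA-Z0-9_], so the ç is rejected.
ascii_match = re.fullmatch(r"\w+", word, re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```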
    
  • 2020-11-22 10:46

    Try these multilingual extension methods I wrote for string.

    IsAlphaNumeric - The string must contain at least one letter (from the Unicode range given in charSet) and at least one digit (from the range given in numSet), and must consist only of letters and digits.

    IsAlpha - The string must contain at least one letter (from the charSet of the chosen language) and consist only of letters.

    IsNumeric - The string must contain at least one digit (from the numSet of the chosen language) and consist only of digits.

    The charSet/numSet ranges can be set for the desired language. The Unicode ranges are listed at the link below:

    http://www.ssec.wisc.edu/~tomw/java/unicode.html

    API:

        using System.Text.RegularExpressions;

        // Extension methods must be declared in a static class; the class name is arbitrary.
        public static class StringExtensions
        {
            public static bool IsAlphaNumeric(this string stringToTest)
            {
                //English
                const string charSet = "a-zA-Z";
                const string numSet = "0-9";

                //Greek
                //const string charSet = @"\u0388-\u03EF";
                //const string numSet = "0-9";

                //Bengali
                //const string charSet = @"\u0985-\u09E3";
                //const string numSet = @"\u09E6-\u09EF";

                //Hindi
                //const string charSet = @"\u0905-\u0963";
                //const string numSet = @"\u0966-\u096F";

                // The two lookaheads require at least one letter and at least one digit;
                // the final character class restricts the whole string to letters and digits.
                return Regex.IsMatch(stringToTest, @"^(?=[" + numSet + @"]*?[" + charSet + @"]+)(?=[" + charSet + @"]*?[" + numSet + @"]+)[" + charSet + numSet + @"]+$");
            }

            public static bool IsNumeric(this string stringToTest)
            {
                //English
                const string numSet = "0-9";

                //Hindi
                //const string numSet = @"\u0966-\u096F";

                // One or more digits from numSet and nothing else.
                return Regex.IsMatch(stringToTest, @"^[" + numSet + @"]+$");
            }

            public static bool IsAlpha(this string stringToTest)
            {
                //English
                const string charSet = "a-zA-Z";

                // One or more letters from charSet and nothing else.
                return Regex.IsMatch(stringToTest, @"^[" + charSet + @"]+$");
            }
        }
    

    Usage:

            //English
            string test = "AASD121asf";
    
            //Greek
            //string test = "Ϡϛβ123";
    
            //Bengali
            //string test = "শর৩৮";
    
            //Hindi
            //string test = @"क़लम३७ख़";
    
            bool isAlphaNum = test.IsAlphaNumeric();
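
For readers outside .NET, the same lookahead idea can be sketched in Python. This is an illustrative port, not the author's code. The function name and its parameters are assumptions that mirror charSet and numSet above.

```python
import re


def is_alphanumeric(text: str, char_set: str = "a-zA-Z", num_set: str = "0-9") -> bool:
    """Require at least one letter and at least one digit, and nothing else."""
    pattern = (
        rf"(?=[{num_set}]*?[{char_set}]+)"  # lookahead: optional digits, then a letter
        rf"(?=[{char_set}]*?[{num_set}]+)"  # lookahead: optional letters, then a digit
        rf"[{char_set}{num_set}]+"          # body: only letters and digits
    )
    # fullmatch() anchors at both ends, replacing the ^ and $ of the original.
    return re.fullmatch(pattern, text) is not None


print(is_alphanumeric("AASD121asf"))  # True
print(is_alphanumeric("abc"))         # False: no digit
print(is_alphanumeric("123"))         # False: no letter
# Bengali letters and digits, using the ranges quoted in the answer.
print(is_alphanumeric("\u09b6\u09b0\u09e9\u09ee", "\u0985-\u09e3", "\u09e6-\u09ef"))  # True
```

The ranges are interpolated straight into a character class, so callers should pass trusted range strings only. A stray ] or ^ in char_set would change the meaning of the pattern.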
    
  • 2020-11-22 10:48

    For those of you looking for Unicode alphanumeric matching, you might want to do something like:

    ^[\p{L}\p{Nd}_]+$
    

    Further reading at http://unicode.org/reports/tr18/ and at http://www.regular-expressions.info/unicode.html
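
Not every engine supports \p{...} property classes. Python's built-in re module does not, for example. Below is a rough equivalent that checks Unicode general categories directly, accepting letters (L*), decimal digits (Nd), and the underscore. The function name is made up for this sketch.

```python
import unicodedata


def is_unicode_word(text: str) -> bool:
    """True if text is non-empty and every character is a letter (L*),
    a decimal digit (Nd), or an underscore."""

    def allowed(ch: str) -> bool:
        category = unicodedata.category(ch)
        return ch == "_" or category.startswith("L") or category == "Nd"

    return bool(text) and all(allowed(ch) for ch in text)


print(is_unicode_word("na\u00efve_123"))  # True: precomposed ï is a lowercase letter
print(is_unicode_word("\u0663"))          # True: ARABIC-INDIC DIGIT THREE is Nd
print(is_unicode_word("a b"))             # False: the space is not allowed
```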

  • 2020-11-22 10:51

    To match a string that contains only those characters (or an empty string), try

    "^[a-zA-Z0-9_]*$"
    

    This works in .NET regular expressions and in most other regex flavors.

    Breaking it down:

    ^ : start of string
    [ : beginning of character group
    a-z : any lowercase letter
    A-Z : any uppercase letter
    0-9 : any digit
    _ : underscore
    ] : end of character group
    * : zero or more of the given characters
    $ : end of string
    

    If you don't want to allow empty strings, use + instead of *.
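
The breakdown above maps directly onto a commented pattern. Here is a small sketch using Python's `re.VERBOSE` flag (the name `PATTERN` is arbitrary). It also shows a caveat: in Python, $ matches just before a trailing newline as well, so `fullmatch()` is the stricter check.

```python
import re

PATTERN = re.compile(
    r"""
    ^               # start of string
    [a-zA-Z0-9_]    # one character: lowercase, uppercase, digit, or underscore
    *               # zero or more of the preceding class
    $               # end of string (or just before a final newline)
    """,
    re.VERBOSE,
)

print(PATTERN.match("abc_123") is not None)  # True
print(PATTERN.match("a b") is not None)      # False
print(PATTERN.match("abc\n") is not None)    # True: $ tolerates the trailing newline
print(PATTERN.fullmatch("abc\n") is None)    # True: fullmatch rejects it
```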


    As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.

  • 2020-11-22 10:51

    Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:

    ^[[:alnum:]_]+$
    

    However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true, at least not with grep -P. There, \w also matches the underscore, whereas [:alnum:] does not, so you need to include the underscore explicitly if you use [:alnum:]. You can't beat the following for short and sweet:

    ^\w+$
    

    Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non-ASCII strings in engines and locales that support them. Range-based classes such as [A-Za-z] cannot do this: they rely on the ordering of characters in the underlying character set, so they exclude letters outside those ranges (such as œ) that you might want to capture.
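
To make the range-versus-class difference concrete, here is a hedged comparison in Python. Python's built-in re module does not support POSIX bracket expressions such as [[:alnum:]], so this sketch uses `str.isalnum()` as a Unicode-aware stand-in. Both function names are illustrative.

```python
import re


def ascii_ranges_only(text: str) -> bool:
    """Range-based check: ASCII letters, digits, and underscore."""
    return re.fullmatch(r"[A-Za-z0-9_]+", text) is not None


def unicode_alnum_or_underscore(text: str) -> bool:
    """Unicode-aware check: any alphanumeric character or underscore."""
    return bool(text) and all(ch == "_" or ch.isalnum() for ch in text)


word = "c\u0153ur"  # "cœur"
print(ascii_ranges_only(word))            # False: œ lies outside A-Z and a-z
print(unicode_alnum_or_underscore(word))  # True
```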
