Regular expression hangs - Java matcher

悲&欢浪女 2021-01-23 09:20

String:

Aqua, Sodium Laureth Sulfate, Sodium Lauryl Sulfate, Dimethicone, Cocamide MEA, Zinc Carbonate, Glycol Distearate, Sodium Chloride, Zinc Pyrithion

2 Answers
  • 2021-01-23 09:48

    I recommend splitting your input string into words and then pattern-matching each word. Even simpler, skip pattern matching entirely if you just want to test that the first letter of each word is uppercase:

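    // Split on runs of non-word characters, so ", " does not yield empty tokens
    for (String s : string.split("\\W+")) {
      if (s.isEmpty()) continue;  // a leading separator still yields one empty token
      if (s.charAt(0) < 'A' || s.charAt(0) > 'Z') {
        return false;  // this word does not start with an uppercase letter
      }
    }
    return true;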
    

    This should be a lot faster, and you can also report which word failed if you need to.
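    As a self-contained sketch of that check, the hypothetical helper nonCapitalizedWords below returns the offending words instead of a boolean, so you also get the word that failed. It assumes "uppercase" means an ASCII letter A-Z and that words are separated by runs of non-word characters (\W+); the class and method names are illustrative, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

final class CapitalizedWords {
    // Hypothetical helper: returns every word whose first character is not A-Z.
    // Splitting on "\\W+" treats any run of separators (", ", " ", ...) as one split point.
    static List<String> nonCapitalizedWords(String input) {
        List<String> failures = new ArrayList<>();
        for (String word : input.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            char first = word.charAt(0);
            if (first < 'A' || first > 'Z') {
                failures.add(word);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String ingredients = "Aqua, Sodium Laureth Sulfate, Sodium Lauryl Sulfate, Dimethicone, "
                + "Cocamide MEA, Zinc Carbonate, Glycol Distearate, Sodium Chloride, Zinc Pyrithion";
        System.out.println(nonCapitalizedWords(ingredients));             // prints []
        System.out.println(nonCapitalizedWords("Aqua, sodium Chloride")); // prints [sodium]
    }
}
```

    An empty list means every word passed, so the boolean check is simply nonCapitalizedWords(input).isEmpty().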

  • 2021-01-23 09:50

    Perhaps what you had in mind was

    String regex = "([A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*, )*[A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*\\.";
    System.out.println(string.matches(regex));
    

    returns true.
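    If you run this check more than once, a sketch like the following compiles the pattern a single time: String.matches(regex) behaves like Pattern.matches(regex, input), which compiles the regex on every call. It writes the word class as [-\d\w]+ in every position, so the first word may also contain a hyphen, consistent with sample matches such as Up- further down. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class IngredientListCheck {
    // Comma-separated entries, each made of space-separated words of at least two
    // characters that start with A-Z, and a full stop after the last entry.
    // Compiled once and reused for every call.
    static final Pattern INGREDIENT_LIST = Pattern.compile(
            "([A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*, )*[A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*\\.");

    static boolean isIngredientList(String input) {
        // matches() requires the entire input to match, just like String.matches()
        return INGREDIENT_LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIngredientList("Aqua, Sodium Laureth Sulfate, Zinc Carbonate.")); // true
        System.out.println(isIngredientList("Aqua, sodium Chloride."));  // false: lowercase word
        System.out.println(isIngredientList("Aqua, Sodium Chloride"));   // false: no full stop
    }
}
```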

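    The problem with your regex is that it's overly complicated. Among other things, it nests quantifiers: the separators between two words can be consumed either by the trailing [\W\d]* of one repetition of the group or by the leading [\W]* of the next, so whenever the overall match fails, the engine backtracks through every way of splitting them, and the work can grow exponentially with the number of words. Another disadvantage of adding expressions until you get true is that the result can match things you didn't have in mind. For example, this loop feeds random 40-character ASCII strings to your regex and prints every one it accepts: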

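    import java.nio.charset.StandardCharsets;
    import java.util.Random;

    Random rand = new Random();
    while (true) {
        byte[] bytes = new byte[40];
        rand.nextBytes(bytes);
        // Clear the high bit so every byte is a 7-bit ASCII character (control characters included)
        for (int i = 0; i < bytes.length; i++) bytes[i] &= 0x7F;
        String string = new String(bytes, StandardCharsets.US_ASCII);
        // Print every random string the regex accepts
        if (string.matches("([\\W]*\\b[A-Z\\d]\\w+\\b[\\W\\d]*)+"))
            System.out.println(string);
    }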
    

    prints random strings such as these, none of which resembles an ingredient list:

    "^;%XX`'SwJ|[*4"*0C<Tgbom_. \^
    {PvU_y9aJSm?08EL(   NpfA9a[:$YbN8VTtMk
    ;![`LR7Yy\AO5PZ@X4}GajC<*XvKE11
    8l5W6*IDNH[9C'@.>7`LHsCN*,{26O}
    EFJ5MBVxi%W_t6v54EmLmgjFvqyYh\<"
    +7]|ULh2[MT`Yx{MKH4N
    '8p!2mf
    

    whereas the expression I gave matches

    KfhBuGv7, S3.
    IWzu, XHop4Z.
    LJbXfrd, PdR.
    V2dxQV, LA9z.
    HKf37cy0, TS.
    RAw2E5a, Ajs.
    Up-, GPQ7 I_.
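    For a deterministic version of the same comparison, the sketch below runs both expressions over two fixed inputs instead of random bytes: "007 #$%@ 42xyz", which the regex used in the random-string loop accepts even though it is clearly not an ingredient list, and a well-formed list, which both accept. The inputs and the class name are made up for illustration, and the stricter regex is written with a hyphen allowed in every word.

```java
import java.util.regex.Pattern;

final class RegexComparison {
    // The regex exercised by the random-string loop
    static final Pattern QUESTION_REGEX =
            Pattern.compile("([\\W]*\\b[A-Z\\d]\\w+\\b[\\W\\d]*)+");
    // The stricter regex suggested in this answer, with a hyphen allowed in every word
    static final Pattern SUGGESTED_REGEX = Pattern.compile(
            "([A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*, )*[A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*\\.");

    public static void main(String[] args) {
        String[] inputs = {
            "007 #$%@ 42xyz",                         // clearly not an ingredient list
            "Aqua, Sodium Chloride, Zinc Carbonate."  // a well-formed list
        };
        for (String input : inputs) {
            // matches() must consume the whole input, like String.matches()
            System.out.printf("%-40s question=%b suggested=%b%n", input,
                    QUESTION_REGEX.matcher(input).matches(),
                    SUGGESTED_REGEX.matcher(input).matches());
        }
    }
}
```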
    