Find longest repeating substring in JavaScript using regular expressions

有刺的猬 2021-01-13 08:59

I'd like to find the longest repeating substring within a string, implemented in JavaScript using a regular-expression-based approach.

I have a PHP implementation

2 answers
  • 2021-01-13 09:29

    In JavaScript, exec returns only one match per call, so you have to loop to find multiple results. A little testing shows this gets the expected results:

    function maxRepeat(input) {
      // The lookahead (?=...) makes every match zero-width, so a match can
      // start at every position. Group 2 is the longest substring starting
      // at that position which occurs again after it ends.
      var reg = /(?=((.+)(?:.*?\2)+))/g;
      var maxstr = ""; // longest repeated substring found so far
      var sub;
      while ((sub = reg.exec(input)) !== null) {
        if (sub[2].length > maxstr.length) {
          maxstr = sub[2];
        }
        // A zero-width match leaves lastIndex where it is, so advance it by
        // hand; otherwise exec would keep returning the same match.
        reg.lastIndex++;
      }
      return maxstr;
    }
    
    // I'm logging to console for convenience
    console.log(maxRepeat("aabcd"));             //a
    console.log(maxRepeat("inputinput"));        //input
    console.log(maxRepeat("7inputinput"));       //input
    console.log(maxRepeat("inputinput7"));       //input
    console.log(maxRepeat("7inputinput7"));      //input
    console.log(maxRepeat("xxabcdyy"));          //x
    console.log(maxRepeat("XXinputinputYY"));    //input
    

    Note that for "xxabcdyy" you only get "x" back: when several repeated substrings share the maximum length, the first one found is returned.
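    If you need every substring that ties for the maximum length instead of just the first, the same loop can collect them. This is a minimal sketch; allMaxRepeats is a hypothetical name, and like maxRepeat it only counts non-overlapping repeats (for "aaa" it reports "a", not "aa").

```javascript
// Hypothetical helper: every distinct substring of maximal length that
// occurs at least twice without overlapping, in the order it is found.
function allMaxRepeats(input) {
  const reg = /(?=((.+)(?:.*?\2)+))/g;
  let best = [];
  let match;
  while ((match = reg.exec(input)) !== null) {
    const candidate = match[2];
    const bestLen = best.length ? best[0].length : 0;
    if (candidate.length > bestLen) {
      best = [candidate]; // strictly longer: start a new list
    } else if (candidate.length === bestLen && !best.includes(candidate)) {
      best.push(candidate); // tie: keep it if not already recorded
    }
    reg.lastIndex++; // zero-width match: step forward manually
  }
  return best;
}

console.log(allMaxRepeats("xxabcdyy"));       // [ 'x', 'y' ]
console.log(allMaxRepeats("XXinputinputYY")); // [ 'input' ]
```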

  • 2021-01-13 09:37

    It seems JS regexes are a bit weird. I don't have a complete answer, but here's what I found.

    Although I thought they did the same thing, re.exec() and "string".match(re) behave differently. With /g in both cases, exec returns only the next match (together with its capture groups), whereas match returns all of the full matches at once.
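    The difference is easiest to see with a simple global regex (the pattern and input here are made up for illustration): match hands back every full match but drops the capture groups, while each exec call returns one match with its groups and records in lastIndex where the next call should resume.

```javascript
const re = /([a-z])(\d+)/g;
const s = "a1b22c333";

// match with /g: all full matches at once, capture groups not included
console.log(s.match(re)); // [ 'a1', 'b22', 'c333' ]

// exec with /g: one match per call, groups included, lastIndex advances
let m = re.exec(s);
console.log(m[0], m[1], m[2], re.lastIndex); // a1 a 1 2
m = re.exec(s);
console.log(m[0], m[1], m[2], re.lastIndex); // b22 b 22 5
```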

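    On the other hand, exec works as intended with ?= in the regex, whereas match returns only empty strings: with /g, match keeps just the full matches, a lookahead match is always zero-width, and the captured groups are thrown away. Removing the ?= leaves us with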
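    In engines that support ES2020, String.prototype.matchAll is one way to keep the lookahead: it yields every match with its capture groups and steps past zero-width matches by itself, so lastIndex never has to be touched. A sketch, with maxRepeatAll as a hypothetical name:

```javascript
// Requires String.prototype.matchAll (ES2020) and a /g regex.
// Group 0 of each match is "" (zero-width lookahead), but groups 1 and 2 survive.
function maxRepeatAll(input) {
  const reg = /(?=((.+)(?:.*?\2)+))/g;
  let maxstr = "";
  for (const m of input.matchAll(reg)) {
    // m[2]: longest substring starting at m.index that repeats after it ends
    if (m[2].length > maxstr.length) {
      maxstr = m[2];
    }
  }
  return maxstr;
}

console.log(maxRepeatAll("XXinputinputYY")); // input
console.log(maxRepeatAll("xxabcdyy"));       // x
```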

    re = /((.+)(?:.*?\2)+)/g
    

    Using that

    "XXinputinputYY".match(re);
    

    returns

    ["XX", "inputinput", "YY"]
    

    whereas

    re.exec("XXinputinputYY");
    

    returns

    ["XX", "XX", "X"]
    

    So at least with match you get inputinput as one of your values. Obviously, this neither picks out the longest repeat nor removes the duplication (you get inputinput rather than input), but maybe it helps nonetheless.
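    One caveat with dropping the ?=: each match then consumes the text it covers, so a repeat whose first occurrence starts inside an earlier match is never tried. A small illustration with a made-up input string; longestGroup2 is a hypothetical helper that takes the longest group 2 over all matches of whichever regex it is given.

```javascript
// Hypothetical helper: longest group 2 over all matches of a global regex,
// stepping past zero-width matches by hand.
function longestGroup2(input, reg) {
  let best = "";
  let m;
  reg.lastIndex = 0;
  while ((m = reg.exec(input)) !== null) {
    if (m[2].length > best.length) best = m[2];
    if (m[0] === "") reg.lastIndex++; // only needed for zero-width matches
  }
  return best;
}

const s = "abXcdefYaZcdef"; // "cdef" repeats, but its first copy sits inside "abXcdefYa"
console.log(s.match(/((.+)(?:.*?\2)+)/g));              // [ 'abXcdefYa' ]
console.log(longestGroup2(s, /((.+)(?:.*?\2)+)/g));     // a
console.log(longestGroup2(s, /(?=((.+)(?:.*?\2)+))/g)); // cdef
```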

    One other thing: I tested in Firebug's console, which threw an error about not supporting $1, so maybe there's something in the $ variables worth looking at.
