Algorithm for finding all of the shared substrings of any length between 2 strings, and then counting occurrences in string 2?

慢半拍i 2020-12-30 04:03

I've run into an unusual challenge, and so far I haven't been able to determine the most efficient algorithm to attack it.


Given the following 2 strings as a

4 answers
  •  囚心锁ツ
    2020-12-30 05:00

    From what I understand, breaking the string up into all possible substrings is in itself an O(n*n) operation: a string of length n has n(n+1)/2 non-empty substrings (10 for "abcd", 36 for "abcdefgh").

    abcd
    ====
    a,b,c,d
    ab,bc,cd
    abc,bcd
    abcd
    ************************
    abcdefgh
    ========
    a,b,c,d,e,f,g,h
    ab,bc,cd,de,ef,fg,gh
    abc,bcd,cde,def,efg,fgh
    abcd,bcde,cdef,defg,efgh
    abcde,bcdef,cdefg,defgh
    abcdef,bcdefg,cdefgh
    abcdefg,bcdefgh
    abcdefgh
    

    As such, a linear-time solution doesn't look possible, at least not if every substring has to be generated explicitly.
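The enumeration listed above can be sketched in a few lines. This is only an illustration in Python (the thread itself contains no code), and the function name `all_substrings` is made up for the example. It yields substrings shortest first, in the same order as the listing, so the quadratic growth in their number is easy to check.

```python
def all_substrings(s):
    """Yield every contiguous, non-empty substring of s, shortest first."""
    n = len(s)
    for length in range(1, n + 1):            # substring length: 1 .. n
        for start in range(n - length + 1):   # every valid start position
            yield s[start:start + length]


if __name__ == "__main__":
    for text in ("abcd", "abcdefgh"):
        subs = list(all_substrings(text))
        # The number of substrings is n * (n + 1) / 2.
        print(text, len(subs), len(text) * (len(text) + 1) // 2)
```

Note that each slice also copies its characters, so actually materializing every substring costs more than the O(n*n) count alone suggests.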

    Furthermore, to actually solve it in Java, you'd first have to break the first string up and store its substrings in a set or a map (the map can use the substring as the key and its number of occurrences as the value).

    Then repeat the step for the second string as well.

    Then you can iterate over the first string's entries, check whether each one exists in the second string's map, and update the number of occurrences for that substring as you go.
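A minimal sketch of the map-based steps above, written in Python rather than Java, with `collections.Counter` standing in for the map. The function names are hypothetical, and two choices are assumptions rather than anything stated in the thread: overlapping occurrences are counted, and the reported number is the count in the second string (as the question title asks).

```python
from collections import Counter


def substring_counts(s):
    """Map every substring of s to how many times it occurs in s."""
    counts = Counter()
    n = len(s)
    for start in range(n):
        for end in range(start + 1, n + 1):
            counts[s[start:end]] += 1   # overlapping occurrences are counted
    return counts


def shared_substring_counts(first, second):
    """For each substring of `first` that also occurs in `second`,
    report its number of occurrences in `second` (assumed semantics)."""
    first_subs = set(substring_counts(first))
    second_counts = substring_counts(second)
    return {sub: second_counts[sub] for sub in first_subs if sub in second_counts}


if __name__ == "__main__":
    print(shared_substring_counts("abc", "abab"))
```

Both maps hold a quadratic number of keys, so this approach is only practical for fairly short strings.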

    If you are using C, you can instead sort the array of substrings and use binary search to find matches (with a two-dimensional array to keep track of each substring and its count of occurrences).
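The sort-and-binary-search idea can be illustrated as follows. This is a sketch in Python, not C: the standard `bisect` module plays the role that `qsort` and `bsearch` would play in C, a dictionary replaces the two-dimensional bookkeeping array, and all function names are invented for the example.

```python
from bisect import bisect_left, bisect_right


def sorted_substrings(s):
    """Return all substrings of s (duplicates kept) in sorted order."""
    n = len(s)
    return sorted(s[i:j] for i in range(n) for j in range(i + 1, n + 1))


def count_in_sorted(sorted_subs, target):
    """Count occurrences of target in a sorted list with two binary searches."""
    return bisect_right(sorted_subs, target) - bisect_left(sorted_subs, target)


def shared_counts_by_search(first, second):
    """For each distinct substring of `first`, look it up in the sorted
    substrings of `second` and keep it if it occurs at least once."""
    second_sorted = sorted_substrings(second)
    result = {}
    for sub in set(sorted_substrings(first)):
        hits = count_in_sorted(second_sorted, sub)
        if hits:
            result[sub] = hits
    return result


if __name__ == "__main__":
    print(shared_counts_by_search("abc", "abab"))
```

Because equal substrings end up adjacent after sorting, the gap between the two binary-search positions gives the occurrence count directly.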

    You said you had a tree approach that ran faster. Would you mind posting a sample showing how you used the tree? Was it for representing the substrings or for helping to generate them?
