Most common substring of length X

没有蜡笔的小新 2021-02-09 13:49

I have a string s and I want to search for the substring of length X that occurs most often in s. Overlapping substrings are allowed.

For example, if s = "aoaoa" and X = 3, the answer is "aoa", which occurs twice (at offsets 0 and 2).

8 Answers
  •  既然无缘
    2021-02-09 14:15

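    You can build a tree of sub-strings (a trie). The idea is to organise your sub-strings like a telephone book, grouped by first letter, then by second letter, and so on. For each sub-string of s you look it up in the tree, adding it if it is missing, and increase its count by one.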

    In your example above, the tree will have sections (nodes) starting with the letters: 'a' and 'o'. 'a' appears three times and 'o' appears twice. So those nodes will have a count of 3 and 2 respectively.

    Next, under the 'a' node a sub-node of 'o' will appear corresponding to the sub-string 'ao'. This appears twice. Under the 'o' node 'a' also appears twice.

    We carry on in this fashion until we reach the end of the string.
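
    The construction described so far can be sketched in a few lines of Python. This is only an illustration, not the answerer's code: the nested-dict node layout and the names build_trie and max_depth are assumptions made for the example.

```python
def build_trie(s, max_depth=None):
    """Count every sub-string of s (up to max_depth characters) in a trie.

    Each node is a dict {"count": int, "children": {char: node}}.
    The node layout and the max_depth parameter are illustrative choices.
    """
    root = {"count": 0, "children": {}}
    for start in range(len(s)):
        end = len(s) if max_depth is None else min(len(s), start + max_depth)
        node = root
        # Walk down one character at a time, creating nodes as needed and
        # bumping the count of every prefix of s[start:end].
        for ch in s[start:end]:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


trie = build_trie("aoaoa")
a_node = trie["children"]["a"]
o_node = trie["children"]["o"]
print(a_node["count"], o_node["count"])                # 3 2
print(a_node["children"]["o"]["count"],                # 'ao' -> 2
      o_node["children"]["a"]["count"])                # 'oa' -> 2
```

    The printed counts match the ones given above for 'a', 'o', 'ao' and 'oa'.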

    A representation of the tree for 'abac' might be (nodes on the same level are separated by a comma, sub-nodes are in brackets, counts appear after the colon).

    a:2(b:1(a:1(c:1())),c:1()),b:1(a:1(c:1())),c:1()
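
    To check the notation, here is a small sketch that builds the tree for 'abac' and prints it in the same format. The helper names build_trie and render are made up for this example, and children are listed in the order they were first inserted.

```python
def build_trie(s):
    """Trie of all sub-strings of s; node = {"count": int, "children": dict}."""
    root = {"count": 0, "children": {}}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def render(node):
    """Format children as 'char:count(sub-nodes)', comma-separated per level."""
    return ",".join(
        f"{ch}:{child['count']}({render(child)})"
        for ch, child in node["children"].items()
    )


print(render(build_trie("abac")))
# a:2(b:1(a:1(c:1())),c:1()),b:1(a:1(c:1())),c:1()
```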

    If the tree is drawn out it will be a lot more obvious! What it says is, for example, that the string 'aba' appears once and the string 'a' appears twice. Storage is greatly reduced compared with keeping a list of sub-strings, because shared prefixes are stored only once, and, more importantly, retrieval is greatly sped up.

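    To find the most frequent sub-string of length X, do a depth-first search of the tree. Every time a node at depth X is reached, note its count and keep track of the highest one. There is no need to descend further, so the tree only has to be built X levels deep; in that case the depth-X nodes are leaf nodes.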
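
    A possible sketch of that search in Python, under two assumptions that go slightly beyond the text: the tree is only built X levels deep, and the count is read at depth X (where a complete sub-string of length X ends). The function names are hypothetical, and ties are broken arbitrarily.

```python
def build_trie(s, depth):
    """Trie of the sub-strings of s, truncated to `depth` characters."""
    root = {"count": 0, "children": {}}
    for start in range(len(s)):
        node = root
        for ch in s[start:start + depth]:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def most_common_substring(s, x):
    """Return (substring, count) for the most frequent length-x sub-string."""
    if x <= 0 or x > len(s):
        return None
    best_count, best_text = 0, ""
    stack = [(build_trie(s, x), "")]      # explicit stack: depth-first search
    while stack:
        node, prefix = stack.pop()
        if len(prefix) == x:
            # A node at depth x stands for one complete length-x sub-string.
            if node["count"] > best_count:
                best_count, best_text = node["count"], prefix
            continue
        for ch, child in node["children"].items():
            stack.append((child, prefix + ch))
    return best_text, best_count


print(most_common_substring("aoaoa", 3))  # ('aoa', 2)
```

    Shorter windows near the end of the string only touch nodes above depth X, so they do not distort the counts that the search reads.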

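    As for the running time: it cannot be O(log(n)), because every character of s has to be read at least once. If the tree is only built X levels deep, each of the n starting positions inserts at most X characters, so building it takes roughly O(n*X) steps, and the search visits no more nodes than were inserted. That is better than O(n^2) whenever X is small compared to n.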
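
    For comparison on cost, here is a different and simpler technique than the tree: counting every window of length X in a hash table with collections.Counter. It makes n - X + 1 slices of X characters each, so it does roughly n*X work. This is a sketch for cross-checking results, not part of the approach above, and the function name is made up.

```python
from collections import Counter


def most_common_window(s, x):
    """Return (substring, count) for the most frequent length-x window of s."""
    if x <= 0 or x > len(s):
        return None
    # One slice per starting position; overlapping windows are counted too.
    counts = Counter(s[i:i + x] for i in range(len(s) - x + 1))
    return counts.most_common(1)[0]


print(most_common_window("aoaoa", 3))  # ('aoa', 2)
```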
