Python string pattern recognition/compression

隐瞒了意图╮ 2021-02-15 14:45

I can do basic regex all right, but this is slightly different: I don't know in advance what the pattern is going to be.

For example, I have a list of similar strings:

6 answers
  •  小蘑菇 (original poster)
    2021-02-15 15:24

    I guess you should start by identifying substrings (patterns) that frequently occur in the strings. Since naively counting substrings in a set of strings is rather computationally expensive, you'll need to come up with something smart.

    I've done substring counting on a large amount of data using generalized suffix trees (example here). Once you know the most frequent substrings/patterns in the data, you can take it from there.
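
For small inputs, the counting step can be tried without a suffix tree. The following is only a minimal sketch of the naive approach, not the generalized suffix tree mentioned above: it enumerates substrings of bounded length and counts how many strings each one occurs in. The function name `frequent_substrings`, its parameters, and the sample strings are all made up for illustration, since the original list of strings is not shown.

```python
from collections import Counter


def frequent_substrings(strings, min_len=3, max_len=8, min_docs=2):
    """Return (substring, count) pairs, most frequent first.

    count is the number of input strings that contain the substring
    (document frequency), not the total number of occurrences.
    Only substrings with min_len <= length <= max_len are considered.
    """
    doc_freq = Counter()
    for s in strings:
        # Collect each substring once per string so repeats inside
        # a single string do not inflate the count.
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(s) - n + 1):
                seen.add(s[i:i + n])
        doc_freq.update(seen)
    return [(sub, c) for sub, c in doc_freq.most_common() if c >= min_docs]


# Hypothetical sample data, invented for this sketch.
samples = ["error_code_17_disk", "error_code_42_net", "warn_code_17_disk"]
for sub, count in frequent_substrings(samples)[:5]:
    print(repr(sub), count)
```

The length bounds keep the work manageable, but the cost still grows with the total input size times the number of lengths tried, so for a large amount of data a suffix-tree or suffix-array based count is likely the better fit.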
