Python string pattern recognition/compression

隐瞒了意图╮ 2021-02-15 14:45

I can do basic regex all right, but this is slightly different: I don't know in advance what the pattern is going to be.

For example, I have a list of similar strings:

<

6 Answers

小蘑菇 (original poster)

2021-02-15 15:24

I guess you should start by identifying substrings (patterns) that occur frequently in the strings. Naively counting every substring across a set of strings is computationally expensive, so you'll need to come up with something smarter.

I've done substring counting on a large amount of data using generalized suffix trees (example here). Once you know the most frequent substrings/patterns in the data, you can take it from there.
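For a small list of short strings, a brute-force count may be enough to get a first look at the shared patterns. The sketch below is that naive approach, not the generalized suffix tree mentioned above, and it takes roughly quadratic time and memory per string. The function name `frequent_substrings`, its parameters `min_len` and `min_docs`, and the sample strings are all hypothetical, since the strings from the question are not shown.

```python
from collections import Counter


def frequent_substrings(strings, min_len=3, min_docs=2):
    """For every substring of at least min_len characters, count how
    many of the input strings contain it, and return the shared ones."""
    doc_freq = Counter()
    for s in strings:
        # Collect each distinct substring once per string, so the
        # counter records "number of strings containing it".
        seen = set()
        for start in range(len(s)):
            for end in range(start + min_len, len(s) + 1):
                seen.add(s[start:end])
        doc_freq.update(seen)

    # Keep substrings found in at least min_docs strings.
    shared = [(sub, n) for sub, n in doc_freq.items() if n >= min_docs]
    # Most widespread first; longer substrings break ties.
    shared.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return shared


# Hypothetical sample data, made up for illustration only.
samples = ["user_001_log.txt", "user_002_log.txt", "user_117_log.txt"]
top = frequent_substrings(samples, min_len=4)
print(top[:3])
```

The longest substrings that appear in every string are natural candidates for the fixed parts of a pattern; whatever sits between them is the variable part. On larger inputs this approach gets slow quickly, which is where a suffix tree or suffix array pays off.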
