I can do basic regex all right, but this is slightly different: I don't know in advance what the pattern is going to be.
For example, I have a list of similar strings:
This looks much like the LZW algorithm for data (text) compression. There should be Python implementations out there that you may be able to adapt to your needs.
I assume you have no a priori knowledge of the substrings that repeat often.
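To make the idea concrete, here is a minimal sketch of an LZW-style dictionary pass used to discover candidate substrings. It is not a full LZW compressor and not the only way to do this. The function names `learn_phrases` and `frequent_substrings`, the parameters `min_len` and `min_count`, and the sample strings are all made up for illustration, since I don't know what your real data looks like. It is also only a heuristic: it reports phrases the dictionary happened to learn, so it can miss some repeated substrings.

```python
def learn_phrases(strings):
    """Build an LZW-style phrase dictionary shared across all strings.

    As in LZW, the current match is extended while it is already known;
    the first unknown extension is added to the dictionary and matching
    restarts from the current character.
    """
    dictionary = set()
    for s in strings:
        current = ""
        for ch in s:
            candidate = current + ch
            # Single characters are treated as always known.
            if len(candidate) == 1 or candidate in dictionary:
                current = candidate
            else:
                dictionary.add(candidate)
                current = ch
    return dictionary


def frequent_substrings(strings, min_len=3, min_count=2):
    """Return (phrase, hits) pairs for learned phrases found in at least
    min_count of the input strings, most frequent and longest first."""
    results = []
    for phrase in learn_phrases(strings):
        if len(phrase) < min_len:
            continue
        hits = sum(1 for s in strings if phrase in s)
        if hits >= min_count:
            results.append((phrase, hits))
    results.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return results


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the question.
    samples = [
        "report_2020_final.txt",
        "report_2021_final.txt",
        "report_2022_draft.txt",
    ]
    for phrase, hits in frequent_substrings(samples):
        print(hits, repr(phrase))
```

Because the dictionary grows by one character per match, long common substrings only show up after enough similar strings have been seen, so the more input you feed it, the longer the phrases it finds.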