I have a dataframe with song lyrics in one column (one song per row), and I know it contains several duplicates that are not exactly the same but have some d