Remove consecutive duplicate words from a file using awk or sed

不思量自难忘° 2021-01-16 16:53

My input file looks like this:

“true true, rohith Rohith;
cold burn, and fact and fact good good?”

Output should look like:



        
6 answers
  •  鱼传尺愫
    2021-01-16 17:46

    Depending on your expected input, this might work:

    sed -r 's/([a-zA-Z0-9_-]+)( *)\1/\1\2/g ; s/ ([.,;:])/\1/g ; s/  / /g' myfile
    

    ([a-zA-Z0-9_-]+) = words that might be repeated.

    ( *)\1 = checks whether the captured word is repeated after zero or more spaces.

    s/ ([.,;:])/\1/g = removes extra spaces before punctuation (you might want to add characters to this group).

    s/  / /g = collapses double spaces into a single space.

    This works with GNU sed.
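
One caveat with the command above: because `( *)` also matches zero spaces and nothing anchors the match to whole words, a repeated fragment inside a single word is collapsed too (for example `haha` becomes `ha`). A possible refinement, offered only as a sketch, is to require at least one space and add GNU sed's `\b` word-boundary anchors. The function name `dedupe_words` is a hypothetical helper, and adding `?` to the punctuation group is an assumption based on the sample input. Matching stays case-sensitive, so `rohith Rohith` is left alone, and repeated phrases such as `and fact and fact` are not collapsed.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (\b is a GNU extension; -E enables extended regex).
# dedupe_words is a hypothetical helper name, not part of the original answer.
dedupe_words() {
  # 1) drop an immediately repeated whole word, keeping the spaces after the first copy
  # 2) remove the space that is now left in front of punctuation
  # 3) collapse any remaining runs of spaces into one
  sed -E \
    -e 's/\b([A-Za-z0-9_-]+)( +)\1\b/\1\2/g' \
    -e 's/ ([.,;:?])/\1/g' \
    -e 's/  +/ /g'
}

# The question's sample lines, without the surrounding quotes.
printf '%s\n' 'true true, rohith Rohith;' \
              'cold burn, and fact and fact good good?' | dedupe_words
# prints:
# true, rohith Rohith;
# cold burn, and fact and fact good?
```

To process a file instead of piped text, `dedupe_words < myfile` should behave the same way.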
