I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates, ideally from within vim?
This worked for me for both .csv and .txt files:
awk '!seen[$0]++' <filename> > <newFileName>
Explanation: the first part of the command, awk '!seen[$0]++' <filename>, prints only the unique rows, and the second part, the > redirection operator followed by <newFileName>, saves that output to the new file.
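As a rough sketch of why this works (dupes.txt is a hypothetical sample file): awk keeps an associative array seen indexed by the whole input line $0; seen[$0]++ evaluates to 0 the first time a line appears and to a positive number afterwards, so !seen[$0]++ is true only on the first occurrence, and only that copy is printed.

printf 'apple\nbanana\napple\n' > dupes.txt
awk '!seen[$0]++' dupes.txt
# prints:
# apple
# banana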
From the command line, just do:
sort file | uniq > file.new
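If your sort supports the -u (unique) flag, which POSIX requires, the pipeline collapses to a single step; note that both forms reorder the lines, so use one of the awk answers here if the original order matters.

sort -u file > file.new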
g/^\(.*\)$\n\1$/d
Works for me on Windows. Lines must be sorted first, though. (The trailing $ after \1 matters: without it, a line is also deleted when the next line merely starts with the same text.)
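Since this pattern only catches adjacent duplicates, a typical session sorts the buffer first (assuming vim's built-in :sort, available since vim 7):

:sort
:g/^\(.*\)$\n\1$/d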
This version only removes repeated lines that are contiguous, i.e. it only deletes consecutive repeated lines. With the given mapping, the function does not mess with blank lines, but if you change the :g pattern to match the start of line (^), it will also remove duplicated blank lines (see the variant after the function).
" function to delete duplicate lines
function! DelDuplicatedLines()
while getline(".") == getline(line(".") - 1)
exec 'norm! ddk'
endwhile
while getline(".") == getline(line(".") + 1)
exec 'norm! dd'
endwhile
endfunction
nnoremap <Leader>d :g/./call DelDuplicatedLines()<CR>
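The variant mentioned above, which also collapses runs of duplicated blank lines because :g/^/ visits every line (blank or not), only changes the pattern in the mapping:

nnoremap <Leader>d :g/^/call DelDuplicatedLines()<CR>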
Use
awk '!x[$0]++' yourfile.txt
if you want to preserve the order (i.e., when sorting is not acceptable). To invoke it from within vim, :! can be used.
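For example, to filter the whole buffer in place (the ! inside the command must be escaped, because vim replaces a bare ! in a :! shell command with the previously executed command; see :help :!):

:%!awk '\!x[$0]++'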
I would use !}uniq, but that only works if there are no blank lines (!} only filters up to the end of the current paragraph, and blank lines delimit paragraphs). To run every line in the file through uniq, use :1,$!uniq.
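Since % is shorthand for the range 1,$, the same filter can be written more tersely:

:%!uniq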