I'm trying to use Vim to remove a duplicate line in an XML file I created. (I can't recreate the file because the ID numbers will change.)
The file looks something like this:
Answers using 'uniq' suffer from the problem that 'uniq' only finds adjacent duplicate lines; making the duplicates adjacent means sorting the file, which loses the positional information.
If no line may ever legitimately be repeated, then it is relatively simple to do in Perl (or another scripting language with regex and associative array support), assuming the data source is not incredibly humongous:
#!/usr/bin/perl -w
# BEWARE: untested code!
use strict;
my %lines;
while (<>)
{
    # Print a line only the first time it is seen
    print if !defined $lines{$_};
    $lines{$_} = 1;
}
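Assuming the script is saved as dedupe.pl (a name chosen here purely for illustration), you would run it as a filter:
perl dedupe.pl input.xml > output.xml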
However, if it is used indiscriminately, this is likely to break the XML, since end tags are legitimately repeated. How to avoid that? Maybe with a whitelist of 'OK to repeat' lines? Or maybe only lines containing open tags with attributes should be subject to duplicate elimination:
#!/usr/bin/perl -w
# BEWARE: untested code!
use strict;
my %lines;
while (<>)
{
    # De-duplicate only lines whose first tag carries at least one attribute
    if (m%^\s*<[^\s>]+\s[^\s>]+%)
    {
        print if !defined $lines{$_};
        $lines{$_} = 1;
    }
    else
    {
        # Everything else (closing tags, bare tags, text) passes through
        print;
    }
}
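For illustration, with sample lines assumed here rather than taken from the original file, the regex distinguishes cases like these:
<tag k="natural" v="water"/>   <- opening tag with an attribute: de-duplicated
</node>                        <- closing tag: always printed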
Of course, there is also the (largely valid) argument that processing XML with regular expressions is misguided. This code assumes the XML comes with lots of convenient line breaks; real XML may contain none, or only a very few.
Are you trying to search and replace the line with nothing? You could try the :g command instead:
:%g/search_expression_here/d
The d at the end tells it to delete the lines that match.
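For example, to target the duplicated OpenStreetMap-style tag line mentioned later in this thread (the pattern here is an assumption, not from the original file), you could run:
:g/<tag k="natural" v="water"\/>/d
Note that this deletes every matching line; if you want to keep one copy, search to the extra occurrence with / and delete it with dd instead.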
You may find more tips here.
A simple regular expression won't suffice. I've implemented a :DeleteDuplicateLinesIgnoring command (as well as related commands) in my PatternsOnText plugin. You can even supply a {pattern} to exclude certain lines from the de-duplication.
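As a hedged sketch (check the plugin's documentation for the exact syntax), excluding closing tags while de-duplicating the whole buffer might look like:
:%DeleteDuplicateLinesIgnoring /^\s*<\//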
Instead of using Vim, you can do something like
sort filename | uniq -c | grep -v "^[ \t]*1[ \t]"
to figure out which line is duplicated, and then just use normal search to visit and delete it.
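Here, uniq -c prefixes each line with its occurrence count, and the grep filters out the lines that occur exactly once, leaving only the duplicates. A hypothetical run (the file name and contents are assumptions) might look like:
$ sort map.xml | uniq -c | grep -v "^[ \t]*1[ \t]"
      2     <tag k="natural" v="water"/>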
It seems like the Bash, Python and Perl methods would work, but you are already in Vim. So why not create a function like this:
function! RemoveDuplicateLines()
  let seen = {}
  let result = []
  for lineno in range(1, line('$'))
    let line = getline(lineno)
    " Prefix the key: Vim dictionaries reject the empty string as a key
    " (E713), which an empty buffer line would otherwise trigger.
    let key = 'x' . line
    if !has_key(seen, key)
      let seen[key] = 1
      call add(result, line)
    endif
  endfor
  " Replace the buffer contents with the de-duplicated lines.
  %delete _
  call append(0, result)
  " Remove the blank line left behind by :%delete.
  $delete _
endfunction
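Then, with the function sourced (for example from your vimrc), de-duplicate the current buffer with:
:call RemoveDuplicateLines()
You can also wrap it in a user command once so it is easier to invoke:
command! RemoveDuplicateLines call RemoveDuplicateLines()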
First of all, you can use awk to remove all duplicate lines, keeping their order:
:%!awk '\!_[$0]++'
(The ! is written \! because Vim expands a bare ! in shell commands.)
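This works because _[$0]++ evaluates to 0 (false) the first time a line is seen and to a positive count afterwards, so negating it keeps only first occurrences. Outside Vim, the same filter (file names here are assumptions) is:
awk '!seen[$0]++' input.xml > deduped.xml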
If you're not sure whether there are other duplicate lines you don't want to remove, just add conditions:
:%!awk '\!(_[$0]++ && /tag/ && /natural/ && /water/)'
This only removes duplicates among the lines that also match all three patterns.
But parsing a nested structure like XML with regexes is a bad idea, IMHO; you will constantly have to take care not to screw things up.
xmllint gives you a list of the specific elements in the file:
:!echo "cat //tag[@k='natural' and @v='water']" | xmllint --shell %
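In a hypothetical session (the duplicated element shown is an assumption), xmllint's shell prints each matching node, so you can see how many copies exist:
/ > cat //tag[@k='natural' and @v='water']
 -------
<tag k="natural" v="water"/>
 -------
<tag k="natural" v="water"/>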
You can then delete the duplicate lines step by step.