How do I remove duplicate characters from each line and keep only the first occurrence? For example, my input is:
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
Expected output:

EFUAH
UEH
UJHACDEF
For a file named foo.txt containing the data you list:

python3 -c "print(set(open('foo.txt').read()))"

Note that this prints the set of distinct characters in the whole file (newline included) and does not preserve their order.
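If you want per-line output with the original order kept, here is a sketch (it assumes Python 3.7+, where plain dicts preserve insertion order, so dict.fromkeys keeps the first occurrence of each character):

```shell
# per-line, order-preserving duplicate removal
python3 -c "
for line in open('foo.txt'):
    print(''.join(dict.fromkeys(line.rstrip())))
"
# EFUAH
# UEH
# UJHACDEF
```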
use strict;
use warnings;

my @result;

sub uniq {
    my $seq  = shift;
    my $uniq = '';
    for my $char (split //, $seq) {
        # \Q...\E quotes the char so regex metacharacters match literally
        $uniq .= $char unless $uniq =~ /\Q$char\E/;
    }
    push @result, $uniq;
}

while (<DATA>) {
    uniq($_);
}
print @result;
__DATA__
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
The output:
EFUAH
UEH
UJHACDEF
Here is a solution that I think should work faster than the lookahead one; it is not regexp-based and uses a hash table.

perl -n -e '%seen=();' -e 'for (split //) {print unless $seen{$_}++;}'

It splits every line into characters and prints only the first appearance of each character, counting appearances in the %seen hash.
This can be done using a positive lookahead:

perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME
The regex used is: (.)(?=.*?\1)

- . : matches any char
- () : remembers the matched single char
- (?=...) : positive lookahead
- .*? : matches anything in between
- \1 : the remembered match
- (.)(?=.*?\1) : matches and remembers any char only if it appears again later in the string
- s/// : Perl's way of doing substitution
- g : does the substitution globally, i.e. doesn't stop after the first substitution

So s/(.)(?=.*?\1)//g deletes a char from the input string only if that char appears again later in the string. This will not maintain the order of the chars in the input, because for every unique char in the input string we retain its last occurrence, not the first.
To keep the relative order intact we can do what KennyTM tells in one of the comments: reverse the line, remove the duplicates, then reverse the result back. The Perl one-liner for this is:

perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' FILE_NAME

Since we are doing the print manually after the reversal, we use the -n flag instead of the -p flag.
I'm not sure if this is the best one-liner to do this. I welcome others to edit this answer if they have a better alternative.
From the shell, this works:
sed -e 's/$/<EOL>/ ; s/./&\n/g' test.txt | uniq | sed -e :a -e '$!N; s/\n//; ta ; s/<EOL>/\n/g'
In words: mark every linebreak with an <EOL> string, then put every character on a line of its own, then use uniq to remove duplicate lines, then strip out all the linebreaks, then put back linebreaks instead of the <EOL> markers.
I found the -e :a -e '$!N; s/\n//; ta' part in a forum post and I don't understand the separate -e :a part, or the $!N part, so if anyone can explain those, I'd be grateful.
Hmm, that one only removes consecutive duplicates; to eliminate all duplicates you could do this:

cat test.txt | while read line ; do echo "$line" | sed -e 's/./&\n/g' | sort | uniq | sed -e :a -e '$!N; s/\n//; ta' ; done

That puts the characters in each line in alphabetical order, though.
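For instance, on the first sample line (a sketch that assumes GNU sed, where \n in the replacement text is a newline):

```shell
# split into one char per line, sort, drop duplicates, rejoin
echo 'EFUAHUU' | sed -e 's/./&\n/g' | sort | uniq | sed -e :a -e '$!N; s/\n//; ta'
# AEFHU
```

The duplicates are gone, but the result is AEFHU rather than the order-preserving EFUAH.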