Question
I have a string
cabbagee
I want to delete duplicate characters. If I use tr -s, it only squeezes runs of adjacent repeated characters. But my desired output is
cabge
I would appreciate it if anyone could help me with that.
Edit: the answer provided was right, but I was not able to use awk, so I used:
#!/usr/bin/bash
key=$1
len=${#key}
mkey=""
for (( c=0; c<len; c++ ))
do
    tmp=${key:$c:1}
    echo $mkey | grep $tmp >/dev/null 2>&1
    if [ "$?" -eq "0" ]; then
        echo "Found $tmp in $mkey"
    else
        mkey+=$tmp
    fi
done
echo $mkey
Answer 1:
Can you use awk?
awk -v FS="" '{
for(i=1;i<=NF;i++)str=(++a[$i]==1?str $i:str)
}
END {print str}' <<< "cabbagee"
cabge
A couple of other ways, with GNU awk:
awk -v RS='[a-z]' '{str=(++a[RT]==1?str RT: str)}END{print str}' <<< "cabbagee"
cabge
awk -v RS='[a-z]' -v ORS= '++a[RT]==1{print RT}END{print "\n"}' <<< "cabbagee"
cabge
With GNU sed and awk:
sed 's/./&\n/g' <<< "cabbagee" | awk '!a[$1]++' | sed ':a;N;s/\n//;ba'
cabge
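All of the awk variants above keep a "seen" array and emit a character only the first time it appears. As a rough sketch of the same idea without awk, assuming bash 4 or later for associative arrays (the function name dedupe_chars is made up for illustration, and keys such as @ or * may behave differently across bash versions):

```bash
#!/usr/bin/env bash
# Sketch only: first-occurrence filter with an associative array (bash 4+).
# dedupe_chars is a hypothetical helper name, not part of any answer above.
dedupe_chars() {
    local input=$1 out= ch i
    local -A seen=()
    for ((i = 0; i < ${#input}; i++)); do
        ch=${input:i:1}
        # ${seen[...]+set} expands to "set" only if the key already exists.
        if [[ -z ${seen["$ch"]+set} ]]; then
            seen["$ch"]=1
            out+=$ch
        fi
    done
    printf '%s\n' "$out"
}

dedupe_chars "cabbagee"   # prints: cabge
```

Each character costs one array lookup, so no external process is started.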
Answer 2:
You edited your post and posted an answer that's ugly and broken. A simpler, working and more efficient one, in pure Bash:
#!/bin/bash
key=$1
mkey=$key
for ((i=0;i<${#mkey};++i)); do
    c=${mkey:i:1}
    tailmkey=${mkey:i+1}
    mkey=${mkey::i+1}${tailmkey//"$c"/}
done
echo "$mkey"
Why is your script broken? Here are a few cases where yours fails and mine doesn't. For the sake of the demonstration, I called your script banana and mine gorilla. Oh, and because I'm not mean, I fixed the trivial quoting problems in your script (which made it break on the * character) and commented out the flooding part:
#!/usr/bin/bash
key=$1
len=${#key}
mkey=""
for (( c=0; c<len; c++ )); do
    tmp=${key:$c:1}
    echo "$mkey" | grep "$tmp" >/dev/null 2>&1 # Added quotes here!
    if [ "$?" -eq "0" ]; then
        : # echo "Found $tmp in $mkey" # Commented this to remove flooding
    else
        mkey+=$tmp
    fi
done
echo "$mkey" # Added quotes here!
So let's go:
$ ./banana '^'
$ ./gorilla '^'
^
Yes, banana prints only an empty line: ^ is an anchor in grep's regex syntax and matches every line, so the character is always reported as already seen. Similar problems occur with $ and also with .:
$ ./banana 'a.'
a
$ ./gorilla 'a.'
a.
Now the backslash causes problems too:
$ ./banana '\\'
\\
$ ./gorilla '\\'
\
(Remove the >/dev/null 2>&1 part to see the grep: Trailing backslash error.) The same thing happens with [.
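If you want to keep banana's structure, one possible fix for the cases above is to ask grep for fixed-string matching instead of regex matching. This is only a sketch: banana_fixed is a made-up name, it still starts one grep per character, and it does not handle newlines in the input.

```bash
#!/usr/bin/env bash
# Sketch: banana with fixed-string matching, so ^ $ . \ [ are taken literally.
# banana_fixed is a hypothetical name used only for this illustration.
banana_fixed() {
    local key=$1 mkey= tmp c
    for ((c = 0; c < ${#key}; c++)); do
        tmp=${key:c:1}
        # -F: fixed string, -q: no output, -e: pattern may start with '-'
        if ! printf '%s\n' "$mkey" | grep -qF -e "$tmp"; then
            mkey+=$tmp
        fi
    done
    printf '%s\n' "$mkey"
}

banana_fixed 'a.'   # prints: a.
```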
Not to mention that your script is highly inefficient: it spawns a grep process for every character of the input. Mine is a bit better in that respect:
$ time for i in {1..200}; do ./banana cabbage; done &>/dev/null
real 0m3.028s
user 0m0.216s
sys 0m0.464s
$ time for i in {1..200}; do ./gorilla cabbage; done &>/dev/null
real 0m0.878s
user 0m0.172s
sys 0m0.324s
Not bad, eh?
Another benchmark that speaks for itself: with a long string, e.g., a paragraph of Lorem Ipsum:
$ time ./banana 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec consectetur ante hendrerit. Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean ut gravida lorem. Ut turpis felis, pulvinar a semper sed, adipiscing id dolor. Pellentesque auctor nisi id magna consequat sagittis. Curabitur dapibus enim sit amet elit pharetra tincidunt feugiat nisl imperdiet. Ut convallis libero in urna ultrices accumsan. Donec sed odio eros. Donec viverra mi quis quam pulvinar at malesuada arcu rhoncus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. In rutrum accumsan ultricies. Mauris vitae nisi at sem facilisis semper ac in est.'
Lorem ipsudlta,cngDSMqvhPbNAUfCI
real 0m1.464s
user 0m0.104s
sys 0m0.224s
$ time ./gorilla 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec consectetur ante hendrerit. Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean ut gravida lorem. Ut turpis felis, pulvinar a semper sed, adipiscing id dolor. Pellentesque auctor nisi id magna consequat sagittis. Curabitur dapibus enim sit amet elit pharetra tincidunt feugiat nisl imperdiet. Ut convallis libero in urna ultrices accumsan. Donec sed odio eros. Donec viverra mi quis quam pulvinar at malesuada arcu rhoncus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. In rutrum accumsan ultricies. Mauris vitae nisi at sem facilisis semper ac in est.'
Lorem ipsudlta,cng.DSMqvhPbNAUfCI
real 0m0.013s
user 0m0.000s
sys 0m0.008s
That's because banana calls grep once for each character of the input string, whereas gorilla removes the duplicates in place with parameter expansion and never spawns an external process. (I'm not going to mention that banana missed the period.)
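To make the in-place removal easier to follow, here is a traced variant of the gorilla loop. It is a sketch for illustration only: the function name trace_dedupe and the trace format are my own additions, and the trace goes to stderr so the result on stdout is unchanged.

```bash
#!/usr/bin/env bash
# Sketch: same loop as gorilla, with a trace of every step on stderr.
# trace_dedupe is a hypothetical name used only for this illustration.
trace_dedupe() {
    local mkey=$1 c tailmkey i
    for ((i = 0; i < ${#mkey}; ++i)); do
        c=${mkey:i:1}                         # character kept at position i
        tailmkey=${mkey:i+1}                  # everything after it
        mkey=${mkey::i+1}${tailmkey//"$c"/}   # drop later copies of c
        printf 'i=%d c=%s mkey=%s\n' "$i" "$c" "$mkey" >&2
    done
    printf '%s\n' "$mkey"
}

trace_dedupe cabbagee
# stderr trace:
#   i=0 c=c mkey=cabbagee
#   i=1 c=a mkey=cabbgee
#   i=2 c=b mkey=cabgee
#   i=3 c=g mkey=cabgee
#   i=4 c=e mkey=cabge
# stdout: cabge
```

Because the loop condition re-reads ${#mkey} on every pass, the loop stops as soon as the shrinking string has been fully scanned.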
Answer 3:
How about:
echo "cabbagee" | sed 's/./&\n/g' | perl -ne '$H{$_}++ or print' | tr -d '\n'
Which yields:
cabge
The above splits your string's characters into individual lines (sed 's/./&\n/g') and then uses a bit of perl magic (credit: "unix tool to remove duplicate lines from a file") to remove any duplicate lines. Finally, the tr -d '\n' removes the newlines we added to achieve your desired output.
You might need to modify it a bit for your specific purpose, and it feels terribly hacky, but it seems to get the job done.
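The same split, deduplicate, rejoin idea can probably be written without relying on \n in the sed replacement, which is a GNU extension. A minimal sketch, assuming single-byte characters and that fold and awk are available (dedupe_pipeline is a made-up name):

```bash
#!/usr/bin/env bash
# Sketch: one character per line, keep first occurrences, join again.
# dedupe_pipeline is a hypothetical name used only for this illustration.
dedupe_pipeline() {
    printf '%s\n' "$1" |
        fold -w 1 |            # split into lines of width 1
        awk '!seen[$0]++' |    # print a line only the first time it is seen
        tr -d '\n'             # remove the newlines added by fold
    printf '\n'                # restore a single trailing newline
}

dedupe_pipeline cabbagee   # prints: cabge
```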
Good luck.
Answer 4:
You could use grep -o . to put each character on its own line, then collect only the characters that haven't been seen yet in bash:
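grep -o . <<<'cabbagee' | \
{ # IFS= and -r keep spaces and backslashes intact; quoting "$c" makes the match literal
  while IFS= read -r c; do [[ $s == *"$c"* ]] || s+=$c; done; printf '%s\n' "$s"; }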
Answer 5:
I'm not sure what language you are doing this in, but you could always write a for loop that goes through the string, with an if statement inside it: if (yourstring.charAt(i) == yourstring.charAt(i+1)) { remove the character at index i+1 }. So basically, if the character at the current index is equal to the character at the next index, replace the next character with an empty string: "". Note that this only compares neighbouring characters, so, like tr -s, it collapses adjacent repeats (cabbagee becomes cabage) rather than removing every later duplicate.
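For comparison, here is roughly what that neighbour-comparison loop looks like in bash. It is a sketch with a made-up function name (squeeze_adjacent), and it shows that comparing only adjacent characters does not produce the output asked for in the question:

```bash
#!/usr/bin/env bash
# Sketch: drop a character only when it equals the one right before it.
# squeeze_adjacent is a hypothetical name used only for this illustration.
squeeze_adjacent() {
    local s=$1 out= prev= ch i
    for ((i = 0; i < ${#s}; i++)); do
        ch=${s:i:1}
        # Keep the first character, and any character that differs from prev.
        if ((i == 0)) || [[ $ch != "$prev" ]]; then
            out+=$ch
        fi
        prev=$ch
    done
    printf '%s\n' "$out"
}

squeeze_adjacent cabbagee   # prints: cabage (the second "a" survives)
```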
Source: https://stackoverflow.com/questions/23402740/how-to-remove-duplicated-characters-from-string-in-bash