I want to find duplicated names in a file like the one below and mark them with double quotes ("").
file:
James Miki:123-456-7890
Wang Tai: 234-563-6879
James Miki: 123-456-7890
Try this -
cat f
James Miki:123-456-7890
Wang Tai: 234-563-6879
James Miki: 123-456-7890
James Miki: 456-456-8888 ### added for test case
Wang Tai: 234-563-6879 ### added for test case
Vipin Kumar : 878-432-2345 ### added for test case
Vipin Kumar : 878-432-2345 ### added for test case
awk -F':' '{gsub(/ /,"",$2)}{b[$1FS$2]++} END {for(k in b) if(b[k]>1) {split(k,u,":"); print v,u[1],v FS u[2]}}' v='"' OFS="" f
"Vipin Kumar ":878-432-2345
"Wang Tai":234-563-6879
"James Miki":123-456-7890
Explained -
gsub(/ /,"",$2) : Removes the spaces from the 2nd column
b[$1FS$2]++ : Builds array b, counting each col1:col2 combination
if(b[k]>1) : Keeps only the combinations seen more than once, i.e. the duplicate records
split(k,u,":") : Splits the stored key k (the col1:col2 combination) back into its two parts so that double quotes can be added around the first column.
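For readability, the same logic can also be written out as a standalone awk script with comments (a sketch equivalent to the one-liner above; the file name dupes.awk is only illustrative):

# dupes.awk - count name:number combinations and print the duplicated ones
BEGIN { FS = ":" }
{
    gsub(/ /, "", $2)          # strip spaces from the number column
    b[$1 FS $2]++              # count each name:number combination
}
END {
    for (k in b)
        if (b[k] > 1) {        # only combinations seen more than once
            split(k, u, ":")   # k is "name:number"; split it back apart
            printf "\"%s\":%s\n", u[1], u[2]
        }
}

Run it with awk -f dupes.awk f to get the same output as above.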
awk to the rescue!
$ awk -F: 'a[$1]++ {print "\"" $1 "\"" FS $2}' file
"James Miki": 123-456-7890
sed 's/: */:/' FILE | awk -F: '{ if (arr[$1":"$2]) print "\""$1"\":"$2; else { arr[$1":"$2]++; print $0 }}'
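Here the sed step normalizes the spacing around the colon, and the awk part prints the first occurrence of each name:number pair unchanged while quoting the repeats. Against the three-line file from the question, the output should look something like:

James Miki:123-456-7890
Wang Tai:234-563-6879
"James Miki":123-456-7890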
Another alternative using a sed + sort + uniq pipeline:
cat file | sed 's/^\(.*\) *: */"\1": /' | sort | uniq -d
The output:
"James Miki": 123-456-7890