I have a text file containing a few thousand lines like the following:
File:
abc: bla1 bla1 bla1...
cde: bla bla bla...
ghk: bla1 bla1 bla1...
lmn: bla bla bla...
abc: bla2 bla2 bla2...
bcd: bla bla bla...
ghk: bla2 bla2 bla2...
xyz: bla bla bla...
I want to merge all the lines that start with the same item (such as lines 1 and 5, or 3 and 7) so that I have a new text file like this:
New File:
abc: bla1 bla1 bla1... * abc: bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1... * ghk: bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...
xyz: bla bla bla...
I wonder whether this can be solved using regex and/or grep, and if so, how?
I'm quite familiar with grep because I'm on TextWrangler, but I'm also OK with other text editors.
Help much appreciated.
If order doesn't matter, I suggest first sorting the text. That will place
abc: ...
abc: ...
next to one another. Then run this regex as a Replace All, repeating the pass until it no longer finds a match:
Search:
^(\w+): (.*)\n\1:
Replace:
\1: \2
Result:
abc: bla1 bla1 bla1... bla2 bla2 bla2...
bcd: bla bla bla...
cde: bla bla bla...
ghk: bla1 bla1 bla1... bla2 bla2 bla2...
lmn: bla bla bla...
xyz: bla bla bla...
If order DOES matter, run this regex instead, again repeating the pass until it no longer finds a match:
Search:
^(\w+): (.*)\n((?:(?!\1).*\n)+)\1: (.*\n)
Replace:
\1: \2 \4\3
Result (1st pass):
abc: bla1 bla1 bla1... bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1...
lmn: bla bla bla...
bcd: bla bla bla...
ghk: bla2 bla2 bla2...
xyz: bla bla bla...
Result (2nd pass):
abc: bla1 bla1 bla1... bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1... bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...
xyz: bla bla bla...
With GNU bash (version 4 or later, which is needed for associative arrays), if the order does not matter:
declare -A A # declare associative array A
# fill array: I is the first word (the key), L is the rest of the line
while read -r I L; do
  if [ -n "${A[$I]}" ]; then A[$I]+=" * $L"; else A[$I]=" $L"; fi
done < filename
# print array (the iteration order of an associative array is unspecified)
for J in "${!A[@]}"; do echo "$J${A[$J]}"; done
Output:
xyz: bla bla bla...
lmn: bla bla bla...
abc: bla1 bla1 bla1... * bla2 bla2 bla2...
ghk: bla1 bla1 bla1... * bla2 bla2 bla2...
bcd: bla bla bla...
cde: bla bla bla...
If you can use awk, this should work (the order of the output lines is not guaranteed):
awk '{a[$1]=a[$1]?a[$1]" * "$0:$0} END {for (i in a) print a[i]}' file
Output:
ghk: bla1 bla1 bla1... * ghk: bla2 bla2 bla2...
lmn: bla bla bla...
cde: bla bla bla...
xyz: bla bla bla...
bcd: bla bla bla...
abc: bla1 bla1 bla1... * abc: bla2 bla2 bla2...
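If the original order should be kept, the awk approach can be adapted by remembering the order in which each key first appears. This is a sketch of that variation, not the command from the answer above; the function name merge_keep_order and the array names order and n are made up for illustration.

```shell
# merge_keep_order FILE  (hypothetical helper name)
# a[key]   holds the merged line for that key
# order[k] holds the k-th distinct key in first-seen order
merge_keep_order() {
  awk '
    !($1 in a) { order[++n] = $1; a[$1] = $0; next }   # first time this key is seen
    { a[$1] = a[$1] " * " $0 }                          # later occurrence: append the whole line
    END { for (i = 1; i <= n; i++) print a[order[i]] }  # print in first-seen order
  ' "$1"
}
```

Usage: merge_keep_order file > newfile. On the sample from the question this should give the lines in the order abc, cde, ghk, lmn, bcd, xyz, with the duplicates joined by " * " as in the requested new file.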
Source: https://stackoverflow.com/questions/25249758/how-to-merge-lines-that-start-with-the-same-items-in-a-text-file