Question
I use Buildroot to port software packages to an embedded Linux system. Some packages also generate plain-text scripts and/or library control files that reference the staging directories. These references must be removed when the software is packaged for distribution. Removing them with sed is no problem, but the processing leaves behind duplicate strings, as in the excerpt below. Is it possible to use sed to remove such duplicates?
Note 1: The 'dependency_libs=' line was originally left out and has now been added below. I was trying to keep the excerpt succinct and omitted it because it did not seem to add anything new. It turns out to matter for some of the suggested solutions below, so I have added it for posterity.
Note 2: I just found a small bug in the sed script from @potong: if the duplicate string is the last item on the line and is not followed by a space, the script fails to remove it. The first 'dependency_libs=' line below is therefore only partially processed. The second 'dependency_libs=' line has a space at the end (right before the closing single quote) and passes through the sed script without a problem. I have included both lines to show the difference.
cppflags=-I/usr/include -I/include -I/usr/include -I/include -I${includedir}/mine
cxxflags=-I/usr/include -I/include -I/usr/include -I/include -I${includedir}/mine
Cflags: -I/usr/include -I/include -I/usr/include -I/include -I${includedir}/mine
Libs: -L/usr/lib -L/lib -L/usr/lib -L/lib -L${libdir} -lmine${suffix}
dependency_libs='-L/usr/lib -L/lib -L/usr/lib -L/lib -L/usr/lib/libiconv-full/lib -L/usr/lib/libintl-full/lib -L/usr/lib -L/lib -L/usr/lib -L/lib'
dependency_libs='-L/usr/lib -L/lib -L/usr/lib -L/lib -L/usr/lib/libiconv-full/lib -L/usr/lib/libintl-full/lib -L/usr/lib -L/lib -L/usr/lib -L/lib '
so that it will become:
cppflags=-I/usr/include -I/include -I${includedir}/mine
cxxflags=-I/usr/include -I/include -I${includedir}/mine
Cflags: -I/usr/include -I/include -I${includedir}/mine
Libs: -L/usr/lib -L/lib -L${libdir} -lmine${suffix}
dependency_libs='-L/usr/lib/libiconv-full/lib -L/usr/lib/libintl-full/lib'
dependency_libs='-L/usr/lib/libiconv-full/lib -L/usr/lib/libintl-full/lib'
Answer 1:
This might work for you (GNU sed):
sed -r ':a;s|((-[IL]/\S+\s).*)\2|\1|;ta' file
This looks for strings beginning with -I/ or -L/, followed by one or more non-space characters and a whitespace character, that are repeated later on the line, and removes the later occurrence. If a substitution takes place, the process is repeated until no more substitutions occur.
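The limitation described in Note 2 of the question can be reproduced on a quoted line whose last option is a duplicate with no trailing space. The sketch below wraps the one-liner in a helper and adds a possible variant. It is not part of the original answer: the function names `dedup_sed` and `dedup_sed_eol` are made up for illustration, and the variant assumes GNU sed, options separated by single spaces, and values optionally wrapped in single quotes.

```shell
# Original one-liner from this answer, wrapped in a hypothetical helper (GNU sed).
dedup_sed() {
    sed -r ':a;s|((-[IL]/\S+\s).*)\2|\1|;ta'
}

# Hypothetical variant: the repeated option must be preceded by a space and
# followed by a space, a single quote, or end of line, so a duplicate at the
# very end of the value is removed as well.
dedup_sed_eol() {
    sed -r ":a;s#((-[IL]/[^ ']+)( .*)?) \2( |'|\$)#\1\4#;ta"
}

line="dependency_libs='-L/usr/lib -L/lib -L/usr/lib -L/lib'"
printf '%s\n' "$line" | dedup_sed       # dependency_libs='-L/usr/lib -L/lib -L/lib'
printf '%s\n' "$line" | dedup_sed_eol   # dependency_libs='-L/usr/lib -L/lib'
```

Both versions keep the first occurrence of each option, so the order of the remaining options is unchanged.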
Answer 2:
This may work for you:
awk -F- '
{
    for(i = 2; i <= NF; ++i) a[$i] = 1;
    printf("%s", $1)
    for(x in a) printf("-%s ", x)
    print""
    delete a
}
'
Output:
cppflags=-I${includedir}/mine -I/include -I/usr/include
cxxflags=-I${includedir}/mine -I/include -I/usr/include
Cflags: -I${includedir}/mine -I/include -I/usr/include
Libs: -L${libdir} -lmine${suffix} -L/lib -L/usr/lib
Note that it doesn't retain the order of the directories, and it adds an extra space here and there.
If you need to retain the order of the directories and you can use gawk, try:
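gawk -F- '
# Make "for (x in a)" visit elements in ascending order of their numeric values.
BEGIN {PROCINFO["sorted_in"] = "@val_num_asc"}
{
    for(i = 2; i <= NF; ++i)
        if (!($i in a))
            a[$i] = i;    # remember the position of the first occurrence
    printf("%s", $1)
    for(x in a) printf("-%s ", x)
    print""
    delete a
}
'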
Output:
cppflags=-I/usr/include -I/include -I${includedir}/mine
cxxflags=-I/usr/include -I/include -I${includedir}/mine
Cflags: -I/usr/include -I/include -I${includedir}/mine
Libs: -L/usr/lib -L/lib -L${libdir} -lmine${suffix}
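Or you can get the same output using a non-GNU awk like this:
awk -F- '
{
    for(i = 2; i <= NF; ++i)
        if (!($i in a))
            a[$i] = i;                # remember the position of the first occurrence
    printf("%s", $1)
    for(x in a) b[a[x]] = x       # invert: position -> option
    # Walk the positions in order; the order of "for (x in b)" is unspecified.
    for(i = 2; i <= NF; ++i)
        if (i in b) printf("-%s ", b[i])
    print""
    delete a
    delete b
}
'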
And, of course, if you need to get rid of the extra spaces, you can pipe the output through tr -s ' '.
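Because the scripts above split on "-", an option whose path contains a hyphen (such as -L/usr/lib/libiconv-full/lib in the question) is cut into two fields and printed back with a space in the middle. A possible alternative, sketched here under stated assumptions, splits on whitespace instead and keeps the first occurrence of each option. The helper name `dedup_awk` and the variable names are made up for illustration; the sketch assumes each line has the form name=value or name: value, that options contain no embedded whitespace, and that the value may be wrapped in single quotes.

```shell
# Hypothetical helper: order-preserving de-duplication of whitespace-separated
# options, for lines of the form "name=value" or "name: value".
dedup_awk() {
    awk -v q="'" '
    {
        # Peel off the prefix: name, separator, blanks, optional opening quote.
        prefix = ""; rest = $0
        if (match(rest, "^[^=:]*[=:] *" q "?")) {
            prefix = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
        # Peel off an optional closing quote so the last option compares equal.
        suffix = ""
        if (sub(q " *$", "", rest)) suffix = q

        n = split(rest, tok, " ")
        out = ""
        for (i = 1; i <= n; ++i)
            if (!(tok[i] in seen)) {        # first time this option is seen
                seen[tok[i]] = 1
                out = out (out == "" ? "" : " ") tok[i]
            }
        print prefix out suffix
        split("", seen)                     # portable way to clear the array
    }'
}

printf '%s\n' 'Libs: -L/usr/lib -L/lib -L/usr/lib -L/lib -L${libdir} -lmine${suffix}' | dedup_awk
# Libs: -L/usr/lib -L/lib -L${libdir} -lmine${suffix}
```

Rebuilding the line from the token list also normalizes the whitespace, so no extra tr step is needed with this version.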
Answer 3:
I don't think sed will work, because you need a field-oriented utility that can process interrelated parts of a single line.
Use of awk, as in @ooga's answer, is an option, but here's a pure bash solution.
Note:
- Only suitable for small input files for performance reasons.
- Assumes that no options in the input have embedded whitespace.
- Input order of options is preserved (whitespace between options is normalized).
#!/usr/bin/env bash
while read -r line; do
  # Split line into prefix, separator, options array.
  # Lines without a "=" or ":" separator are passed through unchanged.
  if ! [[ $line =~ ^([^=:]+)([:=]\ *)(.*)$ ]]; then
    printf '%s\n' "$line"
    continue
  fi
  prefix=${BASH_REMATCH[1]}
  sep=${BASH_REMATCH[2]}
  read -ra optArray <<<"${BASH_REMATCH[3]}"
  # Loop over options array and build up a list without duplicates.
  dedupOptList=''
  for opt in "${optArray[@]}"; do
    [[ " $dedupOptList " == *" $opt "* ]] || dedupOptList+=" $opt"
  done
  # Finally, rebuild the line with the deduplicated options list and print.
  printf '%s%s%s\n' "$prefix" "$sep" "${dedupOptList:1}"
done < file
Source: https://stackoverflow.com/questions/24612037/removing-duplicate-strings-with-sed