Question
I'd like to remove every word that contains a non-alphabetic character from a text file, e.g.
"ok 0bad ba1d bad3 4bad4 5bad5bad5"
should become
"ok"
I've tried using
echo "ok 0bad ba1d bad3 4bad4 5bad5bad5" | sed 's/\b[a-zA-Z]*[^a-zA-Z]\+[a-zA-Z]*\b/ /g'
Answer 1:
Using awk:
s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
awk '{ofs=""; for (i=1; i<=NF; i++) if ($i ~ /^[[:alpha:]]+$/)
{printf "%s%s", ofs, $i; ofs=OFS} print ""}' <<< "$s"
ok
This awk command loops through all the words and, if a word matches the regex /^[[:alpha:]]+$/, writes it to standard output. The ofs variable starts out empty and is set to OFS once the first word has been printed, so the kept words are separated by OFS without a leading separator. The final print "" ends the line.
Using grep + tr together:
s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
r=$(grep -o '[^ ]\+' <<< "$s"|grep '^[[:alpha:]]\+$'|tr '\n' ' ')
echo "$r"
ok
The first grep -o breaks the string into individual words, one per line. The second grep keeps only the words made up entirely of alphabetic characters. Finally, tr translates each \n to a space.
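Because tr also converts the final newline, the result of the pipeline above ends with a trailing space. A minimal sketch of one way to avoid that, assuming paste is available: it joins the lines with spaces instead of translating every newline. The function name keep_alpha_words is hypothetical, and -E is swapped in for the \+ form used above.

```shell
s="ok 0bad ba1d bad3 4bad4 5bad5bad5 fine"

# keep_alpha_words is a hypothetical helper name, not part of the answer above.
keep_alpha_words() {
  # 1. Split stdin into one word per line.
  # 2. Keep only the lines made up entirely of alphabetic characters.
  # 3. Join the remaining lines with single spaces (no trailing space).
  grep -oE '[^ ]+' | grep -E '^[[:alpha:]]+$' | paste -s -d ' ' -
}

r=$(printf '%s\n' "$s" | keep_alpha_words)
echo "[$r]"
```

The brackets in the echo only make it visible that no trailing space is left.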
Answer 2:
The following sed command does the job:
sed 's/[[:space:]]*[[:alpha:]]*[^[:space:][:alpha:]][^[:space:]]*//g'
It removes all words containing at least one non-alphabetic character. It is better to use POSIX character classes like [:alpha:], because, in a suitable locale, they won't treat the French name "François" as faulty (i.e. as containing a non-alphabetic character).
Explanation
We remove every match that starts with an arbitrary number of whitespace characters, followed by an arbitrary (possibly zero) number of alphabetic characters, followed by at least one character that is neither whitespace nor alphabetic, and then runs to the end of the word (i.e. until the next whitespace character). Please note that you may want to swap [:space:] for [:blank:]: [:blank:] matches only space and tab, whereas [:space:] also matches other whitespace such as newline, carriage return and form feed.
Test
$ echo "ok 0bad ba1d bad3 4bad4 5bad5bad5" | sed 's/[[:space:]]*[[:alpha:]]*[^[:space:][:alpha:]][^[:space:]]*//g'
ok
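One edge case may be worth checking, assuming a line can begin with a rejected word: the pattern only consumes the whitespace in front of a removed word, so a leading space is left behind. A minimal sketch that adds a second expression to strip it (the extra -e expression is an addition, not part of the answer above):

```shell
# The first expression is the answer's command; the second strips the
# whitespace left behind when the first word on the line is removed.
result=$(echo "0bad ok 4bad4 fine" |
  sed -e 's/[[:space:]]*[[:alpha:]]*[^[:space:][:alpha:]][^[:space:]]*//g' \
      -e 's/^[[:space:]]*//')
echo "$result"
```

Without the second expression the same input gives " ok fine", with a leading space.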
Answer 3:
If you don't mind runs of spaces between words being collapsed into a single space, you could use something like this in Perl:
perl -ane 'print join(" ", grep { !/[^[:alpha:]]/ } @F), "\n"'
The -a switch enables auto-split mode, which splits the text on any number of spaces and stores the fields in the array @F. grep filters out the elements of that array that contain any non-alphabetical characters. The resulting array is joined on a single space.
Answer 4:
This might work for you (GNU sed):
sed -r 's/\b([[:alpha:]]+\b ?)|\S+\b ?/\1/g;s/ $//' file
This uses a back reference within alternation to save the required string.
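Answer 5:
st="ok 0bad ba1d bad3 4bad4 5bad5bad5"
# Unquoted $st is split into words on whitespace.
for word in $st; do
    # Print the word only if it consists entirely of letters.
    if [[ $word =~ ^[a-zA-Z]+$ ]]; then
        echo "$word"
    fi
done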
Source: https://stackoverflow.com/questions/25158710/sed-remove-whole-words-containg-a-character-class