This is my first time posting on here so bear with me please.
I received a bash assignment but my professor is completely unhelpful and so are his notes.
Our assig
The multiple greps are wasteful. You can simply do
grep -E '^([a-z])[a-z]\1$' /usr/share/dict/words
in one fell swoop, and similarly, put the expressions on grep's standard input like this:
echo '^([a-z])[a-z]\1$
^([a-z])([a-z])\2\1$
^([a-z])([a-z])[a-z]\2\1$' | grep -E -f - /usr/share/dict/words
However, regular grep does not permit backreferences beyond \9. With grep -P you can use double-digit backreferences, too.
The following script constructs the entire expression in a loop. Unfortunately, grep -P accepts only a single pattern, so feeding it several patterns with the -f option is out, and we build a big thumpin' variable to hold the pattern instead. Then we can actually also simplify to a single nested pattern of the form ^(.)(?:.|(.)(?:.|(.)...\3)?\2)?\1$, except we use [a-z] instead of . to restrict to just lowercase.
head=''
tail=''
for i in $(seq 1 22); do
head="$head([a-z])(?:[a-z]|"
tail="\\$i${tail:+)?}$tail"
done
grep -P "^${head%|})?$tail$" /usr/share/dict/words
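If you want to see what the loop actually produces, here is a minimal sketch that wraps the same construction in a function and prints the pattern for small depths. The function name build_pattern and its depth argument are my own additions for illustration; they are not part of the script above.

```shell
# build_pattern N: print the nested pattern for N capture groups.
# (Hypothetical helper; the subshell body keeps n/head/tail/i local.)
build_pattern() (
    n=$1 head='' tail='' i=1
    while [ "$i" -le "$n" ]; do
        head="$head([a-z])(?:[a-z]|"
        tail="\\$i${tail:+)?}$tail"
        i=$((i + 1))
    done
    printf '%s\n' "^${head%|})?$tail\$"
)

build_pattern 1   # prints ^([a-z])(?:[a-z])?\1$
build_pattern 2   # prints ^([a-z])(?:[a-z]|([a-z])(?:[a-z])?\2)?\1$
```

Each extra level adds two characters of reach, so depth N covers palindromes of length 2 through 2N+1.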
The single grep should be a lot faster than individually invoking grep 22 or 43 times on the large input file. If you want to sort by length, just add that as a filter at the end of the pipeline; it should still be way faster than multiple passes over the entire dictionary.
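One possible way to do the length sort, as a sketch: prefix every line with its length, sort numerically, then strip the prefix again. The name sort_by_length is hypothetical, and so is the $pattern variable in the usage comment (it stands for whatever pattern you built above).

```shell
# sort_by_length: read lines on stdin, print them shortest first.
# (Hypothetical helper; decorate with the length, sort, undecorate.)
sort_by_length() {
    awk '{ print length($0), $0 }' | sort -n | cut -d' ' -f2-
}

# Intended use: grep -P "$pattern" /usr/share/dict/words | sort_by_length
printf '%s\n' racecar abba aba | sort_by_length
```

The demo line prints aba, abba, racecar, one per line. Words of equal length come out in whatever order sort gives them for the full line.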
The expression ${tail:+)?} evaluates to a closing parenthesis and question mark only when tail is non-empty, which is a convenient way to force the \1 back-reference to be non-optional. Somewhat similarly, ${head%|} trims the final alternation operator from the ultimate value of $head.
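The two expansions are easy to try in isolation. This is just a small sketch; the variable names first, wrong, later and trimmed are made up for the demonstration. It also shows why the colon matters: tail starts out set but empty, and the colon-less form would add the ")?" even then.

```shell
tail=''                       # set but empty, as at the start of the loop
first="[${tail:+)?}]"         # ':+' treats empty like unset, so nothing is added
wrong="[${tail+)?}]"          # '+' without the colon only asks "is it set?"
tail='\1'
later="[${tail:+)?}]"         # non-empty now, so ')?' is inserted

head='([a-z])(?:[a-z]|'
trimmed=${head%|}             # '%' removes the shortest matching suffix

printf '%s\n' "$first" "$wrong" "$later" "$trimmed"
```

This prints [] then [)?] then [)?] and finally ([a-z])(?:[a-z] with the trailing | gone.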