This is linked to another question/code-golf I asked on Code Golf: "Color highlighting" of repeated text
I've got a file 'sample1.txt' with the following co
Straightforward with Perl:
#! /usr/bin/perl
use warnings;
use strict;
my @words = qw/
LoremIpsum
LoremIpsu
dummytext
oremIpsum
LoremIps
dummytex
industry
oremIpsu
remIpsum
ummytext
LoremIp
dummyte
emIpsum
industr
mmytext
/;
my $to_replace = qr/@{[ join "|" =>
sort { length $b <=> length $a }
@words
]}/;
my $i = 0;
while (<>) {
s|($to_replace)|++$i; "<T$i>$1</T$i>"|eg;
print;
}
Sample run (wrapped to prevent horizontal scrolling):
$ ./tag-words sample.txt
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>indus
try</T3>.<T4>LoremIpsum</T4>hasbeenthe<T5>industry</T5>'sstandard<T6>dummytext</T
6>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatyp
especimenbook.
You may object that all the qr// and @{[ ... ]} business is on the baroque side. One could get the same effect with the /o regular-expression switch, as in:
# plain scalar rather than a compiled pattern
my $to_replace = join "|" =>
sort { length $b <=> length $a }
@words;
my $i = 0;
while (<>) {
# o at the end for "compile (o)nce"
s|($to_replace)|++$i; "<T$i>$1</T$i>"|ego;
print;
}
Pure Bash (no externals)
At the Bash command line:
$ sample="LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook."
$ # or: sample=$(<sample1.txt)
$ array=(
LoremIpsum
LoremIpsu
dummytext
...
)
$ tag=0; for entry in "${array[@]}"; do test="<[^>/]*>[^>]*$entry[^<]*</"; if [[ ! $sample =~ $test ]]; then ((tag++)); sample=${sample//${entry}/<T$tag>$entry</T$tag>}; fi; done; echo "Output:"; echo "$sample"
Output:
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>industry</T3>.<T1>LoremIpsum</T1>hasbeenthe<T3>industry</T3>'sstandard<T2>dummytext</T2>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook.
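For reuse in a script, the same loop could be wrapped in a function. This is only a sketch: the name tag_words and its argument order are my own invention, and it assumes the words are passed longest first and contain no regex metacharacters. The guard regex is the one from the one-liner above: it matches when the word already sits between an opening <Tn> and the next closing tag, which is how the shorter fragments of an already-tagged word get skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer):
#   tag_words TEXT WORD...
# Assumes WORDs are given longest first and contain no regex metacharacters.
tag_words() {
  local text=$1; shift
  local tag=0 entry inside
  for entry in "$@"; do
    # True when $entry occurs between an opening tag and the next closing tag.
    inside="<[^>/]*>[^>]*${entry}[^<]*</"
    if [[ ! $text =~ $inside ]]; then
      tag=$((tag + 1))
      # Replace every occurrence; the quoted pattern is matched literally.
      text=${text//"$entry"/<T$tag>$entry</T$tag>}
    fi
  done
  printf '%s\n' "$text"
}

# Shortened input for illustration.
tag_words "LoremIpsumissimplydummytext.LoremIpsum" LoremIpsum LoremIpsu dummytext
# prints: <T1>LoremIpsum</T1>issimply<T2>dummytext</T2>.<T1>LoremIpsum</T1>
```

One thing to be aware of: the guard looks at the whole text, so a shorter word that appears inside a tag anywhere is skipped everywhere, including any place where it stands alone.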