#! perl
use strict;
use warnings;
while (<>) {
for my $word (split) {
$words{$word}++;
}
}
for my $word (keys %words) {
print "$word occurred $words{$word} times.";
}
That's the simple form. If you want sorting, filtering, etc.:
while (<>) {
for my $word (split) {
$words{$word}++;
}
}
for my $word (keys %words) {
if ((length($word) >= $MINLEN) && ($words{$word) >= $MIN_OCCURRENCE) {
print "$word occurred $words{$word} times.";
}
}
You can also sort the output pretty easily:
...
for my $word (keys %words) {
if ((length($word) >= $MINLEN) && ($words{$word) >= $MIN_OCCURRENCE) {
push @output, "$word occurred $words{$word} times.";
}
}
$re = qr/occurred (\d+) /;
print sort {
$a = $a =~ $re;
$b = $b =~ $re;
$a <=> $b
} @output;
A true Perl hacker will easily get these on one or two lines each, but I went for readability.
Edit: this is how I would rewrite this last example
...
for my $word (
sort { $words{$a} <=> $words{$b} } keys %words
){
next unless length($word) >= $MINLEN;
last unless $words{$word) >= $MIN_OCCURRENCE;
print "$word occurred $words{$word} times.";
}
Or if I needed it to run faster I might even write it like this:
for my $word_data (
sort {
$a->[1] <=> $b->[1] # numerical sort on count
} grep {
# remove values that are out of bounds
length($_->[0]) >= $MINLEN && # word length
$_->[1] >= $MIN_OCCURRENCE # count
} map {
# [ word, count ]
[ $_, $words{$_} ]
} keys %words
){
my( $word, $count ) = @$word_data;
print "$word occurred $count times.";
}
It uses map for efficiency,
grep to remove extra elements,
and sort to do the sorting, of course. ( it does so it in that order )
This is a slight variant of the Schwartzian transform.