Code Golf: Quickly Build List of Keywords from Text, Including # of Instances


甜味超标 2020-12-28 20:12

13 Answers

小蘑菇 (OP)

2020-12-28 20:28

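#! perl
use strict;
use warnings;

my %words;    # word => number of occurrences

# Read every input line and count each whitespace-separated word.
while (<>) {
  for my $word (split) {
    $words{$word}++;
  }
}
for my $word (keys %words) {
  print "$word occurred $words{$word} times.\n";
}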

That's the simple form. If you also want to filter by word length and number of occurrences:

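my %words;
# Example thresholds only; pick whatever limits suit your text.
my ($MINLEN, $MIN_OCCURRENCE) = (3, 2);

while (<>) {
  for my $word (split) {
    $words{$word}++;
  }
}
for my $word (keys %words) {
  # Report only words that are long enough and frequent enough.
  if (length($word) >= $MINLEN && $words{$word} >= $MIN_OCCURRENCE) {
    print "$word occurred $words{$word} times.\n";
  }
}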

You can also sort the output pretty easily:

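...
my @output;
for my $word (keys %words) {
  if (length($word) >= $MINLEN && $words{$word} >= $MIN_OCCURRENCE) {
    push @output, "$word occurred $words{$word} times.\n";
  }
}
my $re = qr/occurred (\d+) /;
# Sort by the count embedded in each line. The captures go into
# lexicals because assigning to $a or $b would modify @output itself.
print sort {
  my ($x) = $a =~ $re;
  my ($y) = $b =~ $re;
  $x <=> $y
} @output;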

A true Perl hacker will easily get these on one or two lines each, but I went for readability.

Edit: this is how I would rewrite that last example:

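...
for my $word (
  sort { $words{$a} <=> $words{$b} } keys %words
){
  next unless length($word) >= $MINLEN;
  # The sort is ascending, so rare words come first: skip them with
  # next, because last would end the loop before any frequent word.
  next unless $words{$word} >= $MIN_OCCURRENCE;

  print "$word occurred $words{$word} times.\n";
}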

Or if I needed it to run faster I might even write it like this:

for my $word_data (
  sort {
    $a->[1] <=> $b->[1] # numerical sort on count
  } grep {
    # remove values that are out of bounds
    length($_->[0]) >= $MINLEN &&      # word length
    $_->[1] >= $MIN_OCCURRENCE # count
  } map {
    # [ word, count ]
    [ $_, $words{$_} ]
  } keys %words
){
  my( $word, $count ) = @$word_data;
  print "$word occurred $count times.\n";
}

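It uses map to pair each word with its count, grep to drop the pairs that fail the length and count checks, and sort to order what is left by count. The chain reads sort, grep, map from top to bottom, but the data flows from the bottom up: map runs first, then grep, then sort. Filtering before sorting is where the speed comes from, since sort only has to handle the words that will actually be printed.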

This is a slight variant of the Schwartzian transform.
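
For comparison, here is a minimal sketch of the classic form of the transform: decorate each item with its sort key, sort on that key, then strip the key off again with a second map. The %words contents below are made-up sample data, not counts from any real text, and @sorted is just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical sample counts, for illustration only.
my %words = (the => 5, perl => 3, a => 7, golf => 1);

# Read the chain from the bottom up, as with the example above.
my @sorted =
  map  { $_->[0] }                 # undecorate: keep only the word
  sort { $a->[1] <=> $b->[1] }     # sort on the stored count
  map  { [ $_, $words{$_} ] }      # decorate: [ word, count ]
  keys %words;

print "$_\n" for @sorted;          # prints golf, perl, the, a (one per line)
```

The version in the answer differs in two ways: it inserts a grep between the decorate and sort steps, and it skips the final map because the loop body wants both the word and the count.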
