How do I count the characters, words, and lines in a file, using Perl?

醉酒成梦 2020-12-31 03:21

What is a good/best way to count the number of characters, words, and lines of a text file using Perl (without using wc)?

10 answers
  • Reading the file in fixed-size chunks may be more efficient than reading line-by-line. The wc binary does this.

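    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use constant BLOCK_SIZE => 16384;
    
    for my $file (@ARGV) {
        open my $fh, '<', $file or do {
            warn "couldn't open $file: $!\n";
            next;    # Perl's loop-skip keyword is "next", not "continue"
        };
    
        my ($chars, $words, $lines) = (0, 0, 0);
    
        # True when the previous block ended in the middle of a word.
        my $in_word = 0;
        while ((my $size = sysread $fh, my $buf, BLOCK_SIZE) > 0) {
            $chars += $size;                  # bytes read, like wc -c
            $words += () = $buf =~ /\S+/g;    # runs of non-whitespace
            # A word split across two blocks was counted twice.
            $words-- if $in_word && $buf =~ /\A\S/;
            $lines += () = $buf =~ /\n/g;     # newlines, like wc -l
    
            $in_word = $buf =~ /\S\z/ ? 1 : 0;
        }
    
        print "\t$lines\t$words\t$chars\t$file\n";
    }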
    
  • 2020-12-31 03:41

    Here's the Perl code. Counting words can be somewhat subjective, but I just treat a word as any run of characters that isn't whitespace.

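    open(my $fh, '<', 'file.txt') or die "Could not open file: $!";
    
    my ($lines, $words, $chars) = (0,0,0);
    
    while (<$fh>) {
        $lines++;
        $chars += length($_);
        # split ' ' ignores leading whitespace, so indented lines
        # do not produce an extra empty field.
        my @fields = split ' ', $_;
        $words += @fields;
    }
    close($fh);
    
    print("lines=$lines words=$words chars=$chars\n");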
    
  • 2020-12-31 03:48

    A variation on bmdhacks' answer that will probably produce better results is to use \s+ (or, even better, \W+) as the delimiter. Consider the string "The  quick  brown fox" (note the doubled spaces). Splitting on a single whitespace character gives a word count of six, not four. So, try:

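    open(my $fh, '<', 'file.txt') or die "Could not open file: $!";
    
    my ($lines, $words, $chars) = (0,0,0);
    
    while (<$fh>) {
        $lines++;
        $chars += length($_);
        # A line that starts with punctuation or whitespace yields an
        # empty first field; grep { length } drops it before counting.
        $words += grep { length } split(/\W+/, $_);
    }
    close($fh);
    
    print("lines=$lines words=$words chars=$chars\n");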
    

    Using \W+ as the delimiter will stop punctuation (amongst other things) from counting as words.
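
    To make the difference between the delimiters concrete, here is a small sketch that splits the same strings on \s, \s+, and \W+ and prints the number of fields. The helper name count_fields and the second sample string are made up for this illustration; they are not part of the answer above.

    ```perl
    use strict;
    use warnings;

    # count_fields is a hypothetical helper: it returns how many fields
    # split produces for the given delimiter pattern and text.
    sub count_fields {
        my ($delim, $text) = @_;
        my @fields = split $delim, $text;
        return scalar @fields;
    }

    my $spaced = "The  quick  brown fox";    # doubled spaces on purpose
    my $punct  = "Hello, world - again!";    # a lone dash between words

    print count_fields(qr/\s/,   $spaced), "\n";   # 6: empty fields between doubled spaces
    print count_fields(qr/\s+/,  $spaced), "\n";   # 4
    print count_fields(qr/\W+/,  $spaced), "\n";   # 4
    print count_fields(qr/\s+/,  $punct),  "\n";   # 4: the lone "-" counts as a word
    print count_fields(qr/\W+/,  $punct),  "\n";   # 3
    ```

    Note that \W+ also splits on apostrophes, so a contraction such as "it's" is counted as two words under this definition.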

  • 2020-12-31 03:48

    This may be helpful to Perl beginners. I tried to simulate the word-count feature of Microsoft Word and added one figure that wc on Linux does not report.

    • number of lines
    • number of words
    • number of characters with space
    • number of characters without space (wc does not give this in its output, but Microsoft Word shows it.)

    Here is the URL: Counting words, characters and lines in a file
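
    Since the linked script is not reproduced here, the following is only a minimal sketch of how those four counts could be computed; it is not the linked code. The helper name count_text is hypothetical, and the definitions are assumptions: a word is a run of non-whitespace, "characters with space" excludes only line terminators, and an unterminated last line still counts as a line. Microsoft Word's own rules may differ.

    ```perl
    use strict;
    use warnings;

    # count_text is a hypothetical helper. It takes the whole text as one
    # string and returns a hash with the four counts listed above.
    sub count_text {
        my ($text) = @_;

        # Lines: one per newline, plus one for an unterminated last line.
        my $lines = () = $text =~ /\n/g;
        $lines++ if length($text) && $text !~ /\n\z/;

        # Words: runs of non-whitespace characters.
        my $words = () = $text =~ /\S+/g;

        # Characters with spaces: everything except line terminators.
        my $with_spaces = () = $text =~ /[^\r\n]/g;

        # Characters without spaces: every non-whitespace character.
        my $without_spaces = () = $text =~ /\S/g;

        return (
            lines          => $lines,
            words          => $words,
            with_spaces    => $with_spaces,
            without_spaces => $without_spaces,
        );
    }

    my %count = count_text("Hello world\nfoo bar baz\n");
    print "lines=$count{lines} words=$count{words} ",
          "with_spaces=$count{with_spaces} ",
          "without_spaces=$count{without_spaces}\n";
    # prints: lines=2 words=5 with_spaces=22 without_spaces=19
    ```

    To run it on a file, the contents could be read into a single string first (for example by setting $/ to undef before reading) and passed to count_text; for very large files a line-by-line loop that accumulates the same four counters would use less memory.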
