line-processing | 易学教程

Get random lines from large files in bash

Question: How can I get n random lines from very large files that can't fit in memory? It would also be great if I could add filters before or after the randomization.

Update 1: in my case the specs are: more than 100 million lines; files larger than 10GB; a usual random batch size of 10000-30000; a hosted Ubuntu 14.10 server with 512RAM. Losing a few lines from the file won't be a big problem, since each line has only a 1 in 10000 chance of being picked anyway, but performance and resource consumption would be a problem.

Answer 1: Here's a wee bash function for you. It grabs, as you say, a "batch" of lines, with a random start point within a file.

    randline() { local
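The bash function above is cut off, so the following is not a reconstruction of it. It is a minimal Python sketch of the idea the answer describes (a batch of consecutive lines taken from a random start point), assuming the file is seekable and uses newline-terminated lines. The name `random_batch` and its parameters are made up for illustration.

```python
import os
import random


def random_batch(path, n, rng=random):
    """Return up to n consecutive lines (as bytes) from a random start point.

    Hypothetical helper: only the batch itself is held in memory,
    never the whole file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        if size > 0:
            # Jump to a random byte offset inside the file.
            fh.seek(rng.randrange(size))
            if fh.tell() > 0:
                # We most likely landed mid-line, so skip to the next line start.
                fh.readline()
        batch = []
        for _ in range(n):
            line = fh.readline()
            if not line:  # end of file: the batch may be shorter than n
                break
            batch.append(line)
        return batch
```

The lines in a batch are neighbours in the file, so this is not an independent sample of n lines; it trades statistical quality for a single seek and a short sequential read. A filter can be applied to the returned list afterwards.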

Split line with perl

Question: I have this string:

    title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

How can I split it with Perl into:

    title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane Ronaldo Luís Figo Roberto Carlos Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

Answer 1: Use a lookahead assertion, anchored at a word boundary so that the split happens only in front of a whole key:

    say for split /(?=\b\w+:)/, $real_madrid_string;

Output:

    title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane
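The same zero-width split can be sketched in Python, assuming Python 3.7 or later (earlier versions do not split on empty matches). The function name `split_fields` is made up for illustration. The `\b` keeps the lookahead from also matching in the middle of a key such as `title:`.

```python
import re


def split_fields(text):
    """Split text in front of every "word:" key, keeping the key with its value."""
    # (?=\b\w+:) is a zero-width lookahead: it marks the position just before
    # a whole word followed by a colon, and consumes no characters.
    parts = re.split(r"(?=\b\w+:)", text)
    # Drop the empty leading piece and trim the spaces left between fields.
    return [part.strip() for part in parts if part.strip()]
```

Calling `split_fields("title: Football team: Real Madrid")` gives one list element per key, with the key still attached to its value.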

Extracting specific lines with Perl

Question: I am writing a Perl program to extract the lines that fall between two patterns I am matching. For example, the text file below has 6 lines. I am matching "load balancer" and "end", and I want to get the 4 lines in between.

    **load balancer**
    new
    old
    good
    bad
    **end**

How do you extract the lines between "load balancer" and "end" into an array? Any help is greatly appreciated.

Answer 1 (Schwern): You can use the flip-flop operator. Additionally, you can use the return value of the flip-flop to filter out the boundary lines. The return value is a sequence number (starting with 1) and the
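Python has no flip-flop operator, but an explicit state flag can play the same role. The sketch below is one way to collect the lines between two marker patterns while leaving out the boundary lines themselves. The function name `lines_between` and its parameters are made up for illustration.

```python
import re


def lines_between(lines, start_pattern, end_pattern):
    """Collect the lines strictly between a start marker and an end marker."""
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    inside = False  # plays the role of the flip-flop's true/false state
    collected = []
    for line in lines:
        if not inside:
            if start.search(line):
                inside = True  # the start marker line itself is skipped
        elif end.search(line):
            inside = False  # the end marker line itself is skipped
        else:
            collected.append(line)
    return collected
```

With the six-line example from the question, `lines_between(lines, r"load balancer", r"end")` returns the four inner lines.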

Perl - Find duplicate lines in file or array

Question: I'm trying to print duplicate lines from the filehandle, not remove them or do anything else I see asked in other questions. I don't have enough experience with Perl to do this quickly, so I'm asking here. What's the way to do this?

Answer 1 (Axeman): Using the standard Perl shorthands:

    my %seen;
    while ( <> ) {
        print if $seen{$_}++;
    }

As a "one-liner":

    perl -ne 'print if $seen{$_}++'

More data? This prints <file name>:<line number>:<line>. The outer parentheses make both items arguments of print; without them, print ( ... ) is parsed as a complete function call and the line itself is dropped:

    perl -ne 'print( ( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" ) if $seen{$_}++'

Explanation of %seen: %seen declares a hash. For each unique line in the input
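For comparison, the `$seen{$_}++` idiom can be sketched in Python as a generator. The name `repeated_lines` is made up for illustration. Like the Perl version, it emits a line every time it shows up again, so a line that occurs three times is emitted twice.

```python
def repeated_lines(lines):
    """Yield each line that has already been seen earlier in the input."""
    seen = {}  # line -> number of times it has appeared so far
    for line in lines:
        if seen.get(line, 0):
            # Same test as `print if $seen{$_}++`: true from the second occurrence on.
            yield line
        seen[line] = seen.get(line, 0) + 1
```

To use it on a file, pass the open file object, for example `for line in repeated_lines(open(path)): print(line, end="")`, where `path` is whatever file you want to check.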

How do I iterate over cin line by line in C++?

Question: I want to iterate over std::cin, line by line, addressing each line as a std::string. Which is better:

    string line;
    while (getline(cin, line)) {
        // process line
    }

or

    for (string line; getline(cin, line); ) {
        // process line
    }

What is the normal way to do this?

Answer 1: Since UncleBen brought up his LineInputIterator, I thought I'd add a couple more alternative methods. First up, a really simple class that acts as a string proxy:

    class line { std::string data; public: friend std::istream