line-processing

Perl - Find duplicate lines in file or array

懵懂的女人 submitted on 2019-12-28 11:56:06
Question: I'm trying to print the duplicate lines from a filehandle, not remove them or do anything else I have seen asked in other questions. I don't have enough experience with Perl to work this out quickly, so I'm asking here. How do I do this?

Answer 1: Using the standard Perl shorthands:

    my %seen;
    while ( <> ) {
        print if $seen{$_}++;
    }

As a "one-liner":

    perl -ne 'print if $seen{$_}++'

More data? This prints <file name>:<line number>:<line>:

    perl -ne 'print( ( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_"

Get random lines from large files in bash

有些话、适合烂在心里 submitted on 2019-12-10 09:37:47
Question: How can I get n random lines from very large files that can't fit in memory? It would also be great if I could add filters before or after the randomization.

Update 1: in my case the specs are:

    > 100 million lines
    > 10GB files
    usual random batch size 10000-30000
    512RAM hosted ubuntu server 14.10

So losing a few lines from the file won't be a big problem, as they have a 1 in 10000 chance anyway, but performance and resource consumption would be a problem.

Answer 1: Here's a wee bash function for

Get random lines from large files in bash

余生长醉 submitted on 2019-12-05 21:36:33
How can I get n random lines from very large files that can't fit in memory? It would also be great if I could add filters before or after the randomization.

Update 1: in my case the specs are:

    > 100 million lines
    > 10GB files
    usual random batch size 10000-30000
    512RAM hosted ubuntu server 14.10

So losing a few lines from the file won't be a big problem, as they have a 1 in 10000 chance anyway, but performance and resource consumption would be a problem.

Here's a wee bash function for you. It grabs, as you say, a "batch" of lines, with a random start point within a file.

    randline() { local
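The bash function above is cut off in this listing, so the following is not a reconstruction of it. It is only a minimal Python sketch of the idea the answer describes: pick a random start point in the file and read a "batch" of consecutive lines from there, without loading the file into memory. The names `random_batch`, `path`, `n`, and `rng` are hypothetical, and the sketch assumes that a batch of neighbouring lines (rather than n independently sampled lines) is acceptable.

```python
import os
import random
from typing import List, Optional


def random_batch(path: str, n: int, rng: Optional[random.Random] = None) -> List[bytes]:
    """Return up to n consecutive lines starting near a random byte offset."""
    rng = rng or random.Random()
    size = os.path.getsize(path)
    if size == 0 or n <= 0:
        return []
    with open(path, "rb") as fh:
        offset = rng.randrange(size)
        fh.seek(offset)
        if offset > 0:
            # We most likely landed mid-line, so skip to the next line start.
            fh.readline()
        batch = []
        for _ in range(n):
            line = fh.readline()
            if not line:
                # Reached end of file: the batch simply comes back short.
                break
            batch.append(line)
        return batch
```

Only the requested lines are held in memory, so the cost does not depend on the file size. The trade-off is that a start point near the end of the file yields fewer than n lines, which matches the question's remark that losing a few lines is tolerable.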

Split line with perl

放肆的年华 submitted on 2019-12-02 05:33:42
Question: Given this line:

    title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

How can I split it with Perl into:

    title: Football
    team: Real Madrid
    stadium: Santiago Bernabeu
    players: Zinédine Zidane Ronaldo Luís Figo Roberto Carlos Raúl
    personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

Answer 1: Use a lookahead assertion:

    say for split /(?=\b\w+:)/, $real

Split line with perl

坚强是说给别人听的谎言 submitted on 2019-12-02 01:20:20
Given this line:

    title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

How can I split it with Perl into:

    title: Football
    team: Real Madrid
    stadium: Santiago Bernabeu
    players: Zinédine Zidane Ronaldo Luís Figo Roberto Carlos Raúl
    personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

Use a lookahead assertion:

    say for split /(?=\b\w+:)/, $real_madrid_string;

Output:

    title: Football
    team: Real Madrid
    stadium: Santiago Bernabeu
    players: Zinédine Zidane
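The lookahead technique carries over to other regex engines. The following is a minimal Python sketch of the same idea, not the answer's own code: the function name `split_fields` is hypothetical, the `\b` anchor keeps the lookahead from matching in the middle of a key, and splitting on a zero-width match assumes Python 3.7 or later.

```python
import re
from typing import List


def split_fields(text: str) -> List[str]:
    """Split text in front of every `key:` token using a zero-width lookahead."""
    # (?=\b\w+:) matches the empty string just before a word followed by a colon,
    # so the key itself stays attached to the field it introduces.
    parts = re.split(r"(?=\b\w+:)", text)
    # The match at position 0 produces an empty leading field; drop it.
    return [part.strip() for part in parts if part.strip()]


for field in split_fields("title: Football team: Real Madrid stadium: Santiago Bernabeu"):
    print(field)
```

Because the lookahead consumes no characters, nothing is removed from the input; the string is only cut at the positions where a key begins.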

Extracting specific lines with Perl

情到浓时终转凉″ submitted on 2019-12-02 00:10:36
Question: I am writing a Perl program to extract the lines that fall between two patterns I am matching. For example, the text file below has 6 lines. I am matching "load balancer" and "end", and I want to get the 4 lines in between.

    **load balancer**
    new
    old
    good
    bad
    **end**

My question is: how do you extract the lines between "load balancer" and "end" into an array? Any help is greatly appreciated.

Answer 1: You can use the flip-flop operator. Additionally, you can also use the return value of the flipflop to

Extracting specific lines with Perl

我们两清 submitted on 2019-12-01 21:30:06
I am writing a Perl program to extract the lines that fall between two patterns I am matching. For example, the text file below has 6 lines. I am matching "load balancer" and "end", and I want to get the 4 lines in between.

    **load balancer**
    new
    old
    good
    bad
    **end**

My question is: how do you extract the lines between "load balancer" and "end" into an array? Any help is greatly appreciated.

Schwern

You can use the flip-flop operator. Additionally, you can also use the return value of the flipflop to filter out the boundary lines. The return value is a sequence number (starting with 1) and the
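Python has no flip-flop operator, but the state it tracks can be written out by hand. The following is a minimal sketch, assuming plain substring matching on the two markers; the names `lines_between`, `start`, and `end` are hypothetical, and the boundary lines are excluded by construction rather than through a sequence number.

```python
from typing import Iterable, List


def lines_between(lines: Iterable[str], start: str, end: str) -> List[str]:
    """Collect the lines strictly between a start marker and an end marker."""
    inside = False
    collected = []
    for line in lines:
        if not inside:
            # Outside a block: only a start marker changes the state.
            if start in line:
                inside = True
        elif end in line:
            # The end marker closes the block and is not collected.
            inside = False
        else:
            collected.append(line)
    return collected


sample = ["**load balancer**", "new", "old", "good", "bad", "**end**"]
print(lines_between(sample, "load balancer", "end"))
```

Like a flip-flop, the flag can switch on again if a later start marker appears, so several blocks in one file are all collected into the same list.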

Perl - Find duplicate lines in file or array

淺唱寂寞╮ submitted on 2019-11-28 06:38:38
I'm trying to print the duplicate lines from a filehandle, not remove them or do anything else I have seen asked in other questions. I don't have enough experience with Perl to work this out quickly, so I'm asking here. How do I do this?

Axeman

Using the standard Perl shorthands:

    my %seen;
    while ( <> ) {
        print if $seen{$_}++;
    }

As a "one-liner":

    perl -ne 'print if $seen{$_}++'

More data? This prints <file name>:<line number>:<line>:

    perl -ne 'print( ( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" ) if $seen{$_}++'

Explanation of %seen: %seen declares a hash. For each unique line in the input
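The counting idiom behind `$seen{$_}++` is not specific to Perl. The following is a minimal Python sketch of the same logic, with the hypothetical function name `duplicate_lines`: the count stored for a line is tested before it is incremented, so a line is emitted on its second and every later occurrence.

```python
from collections import Counter
from typing import Iterable, Iterator


def duplicate_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line on its second and later occurrences."""
    seen = Counter()
    for line in lines:
        # Mirrors `print if $seen{$_}++`: check the old count, then increment.
        if seen[line] > 0:
            yield line
        seen[line] += 1


sample = ["a\n", "b\n", "a\n", "a\n", "c\n", "b\n"]
print("".join(duplicate_lines(sample)), end="")
```

Passing an open file object (or `sys.stdin`) as `lines` processes the input one line at a time, although the counter still holds every distinct line in memory, as the Perl hash does.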

How do I iterate over cin line by line in C++?

点点圈 submitted on 2019-11-25 23:22:18
Question: I want to iterate over std::cin line by line, addressing each line as a std::string. Which is better:

    string line;
    while (getline(cin, line))
    {
        // process line
    }

or

    for (string line; getline(cin, line); )
    {
        // process line
    }

What is the normal way to do this?

Answer 1: Since UncleBen brought up his LineInputIterator, I thought I'd add a couple more alternative methods. First up, a really simple class that acts as a string proxy:

    class line {
        std::string data;
    public:
        friend std::istream