Splitting large text file on every blank line

南方客 2020-12-05 07:53

I'm having a bit of trouble splitting a large text file into multiple smaller ones. The syntax of my text file is the following:



        
9 answers
  • 2020-12-05 08:31

    You can also try split -p "^$" (the -p pattern option comes from BSD/macOS split; GNU split doesn't have it).
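
    For reference, a full BSD/macOS invocation might look something like this (infile.txt and the whatever- prefix are just placeholder names):

    # BSD/macOS split: start a new output file whenever a line matches the pattern
    split -p '^$' infile.txt whatever-
    # produces whatever-aa, whatever-ab, whatever-ac, ...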

  • 2020-12-05 08:34

    Perl has a useful feature called the input record separator, $/.

    This is the 'marker' for separating records when reading a file.

    So:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # read in blank-line-delimited chunks rather than single lines
    local $/ = "\n\n";
    my $count = 0;

    while ( my $chunk = <> ) {
        # write each chunk to its own numbered file
        open ( my $output, '>', "filename_" . $count++ ) or die $!;
        print {$output} $chunk;
        close ( $output );
    }
    

    Just like that. <> is the 'magic' filehandle: it reads piped data or the files specified on the command line (it opens and reads them for you). This is similar to how sed or grep work.
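
    For instance, if the script above is saved as split_chunks.pl (the name is just for illustration), either of these would work:

    # read the file named on the command line
    perl split_chunks.pl yourfile.txt
    # or read from a pipe / standard input
    cat yourfile.txt | perl split_chunks.pl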

    This can be reduced to a one-liner (-00 turns on paragraph mode, so records are separated by blank lines, and -p prints each record to whichever filehandle is currently selected):

    perl -00 -pe 'open ( $out, ">", "filename_".++$n ); select $out;' yourfilename_here
    
  • 2020-12-05 08:34

    In case you get a "too many open files" error like the following...

    awk: whatever-18.txt makes too many open files
     input record number 18, file file.txt
     source line number 1
    

    You may need to close each newly created file before creating the next one, as follows.

    awk -v RS= '{close("whatever-" i ".txt"); i++}{print > ("whatever-" i ".txt")}' file.txt
    
  • 2020-12-05 08:34

    You could use the csplit command:

    csplit \
        --quiet \
        --prefix=whatever \
        --suffix-format=%02d.txt \
        --suppress-matched \
        infile.txt '/^$/' '{*}'
    

    POSIX csplit only uses short options and doesn't know --suffix-format and --suppress-matched, so this requires GNU csplit.

    This is what the options do:

    • --quiet – suppress output of file sizes
    • --prefix=whatever – use whatever instead of the default xx filename prefix
    • --suffix-format=%02d.txt – append .txt to the default two digit suffix
    • --suppress-matched – don't include the lines matching the pattern on which the input is split
    • /^$/ {*} – split on pattern "empty line" (/^$/) as often as possible ({*})
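
    With those options, a sample input containing three blank-line-separated records (purely for illustration) would be split into:

    whatever00.txt
    whatever01.txt
    whatever02.txt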
  • 2020-12-05 08:41

    You can use this awk,

    awk 'BEGIN{file="content" (++i) ".txt"} !NF{file="content" (++i) ".txt"; next} {print > file}' yourfile
    

    (OR)

    awk 'BEGIN{i++} !NF{++i; next} {print > ("filename" i ".txt")}' yourfile
    

    More readable format:

    BEGIN {
            # name of the first output file
            file = "content" (++i) ".txt"
    }
    !NF {
            # blank line (no fields): switch to the next output file
            file = "content" (++i) ".txt";
            next
    }
    {
            # non-blank lines go into the current output file
            print > file
    }
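
    As a quick illustration (the sample data here is made up), an input file like

        line one
        line two

        line three

    would end up as content1.txt holding the first two lines and content2.txt holding "line three".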
    
  • 2020-12-05 08:44

    Since it's Friday and I'm feeling a bit helpful... :)

    Try this. If the file is as small as you imply, it's simplest to just read it all at once and work in memory.

    use strict;
    use warnings;
    
    # slurp file
    local $/ = undef;
    open my $fh, '<', 'test.txt' or die $!;
    my $text = <$fh>;
    close $fh;
    
    # split on double new line
    my @chunks = split(/\n\n/, $text);
    
    # make new files from chunks
    my $count = 1;
    for my $chunk (@chunks) {
        open my $ofh, '>', "whatever$count.txt" or die $!;
        print $ofh $chunk, "\n";
        close $ofh;
        $count++;
    }
    

    The Perl docs can explain any individual commands you don't understand, but at this point you should probably look into a tutorial as well.
