How to split a large text file into smaller files with equal number of lines?

I've got a large (by number of lines) plain text file that I'd like to split into smaller files, also by number of lines. So if my file has around 2M lines, I'd like to

10 answers

遇见更好的自我

2020-11-22 17:13
If you just want to split into files of x lines each, the other answers about split are fine. But I am surprised that no one paid attention to the requirements:
- "without having to count them" -> let wc do the counting
- "having the remainder in extra file" -> split does this by default
I can't do it without counting, but wc can do that for me (when it reads the file on standard input, `wc -l` prints only the number, so no cut is needed):
```
split -l "$(( $(wc -l < "$filename") / $chunks ))" "$filename"
```
This is easy to add as a function in your .bashrc, so that you can invoke it with the filename and the number of chunks:
```
split -l "$(( $(wc -l < "$1") / $2 ))" "$1"
```
If you want just x chunks with no remainder in an extra file, round the division up by adding (chunks - 1) to the line count before dividing, so that the leftover lines land in the last chunk. I use this approach because I usually want x files rather than x lines per file:
```
split -l "$(( ($(wc -l < "$1") + $2 - 1) / $2 ))" "$1"
```
You can add that to a script and call it your "ninja way", because if nothing suits your needs, you can build it :-)
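Rounding the lines-per-file value up rather than down is what removes the extra remainder file. The following is a minimal sketch on a small input, not the answer's own code; the function name `lines_per_chunk`, the 10-line sample file, and the `part_` prefix are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: lines per output file so that at most "$2" files result.
# Ceiling division: (lines + chunks - 1) / chunks.
lines_per_chunk() {
    total=$(wc -l < "$1")
    echo $(( (total + $2 - 1) / $2 ))
}

tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/sample.txt"

# Floor division gives 10 / 4 = 2 lines per file, i.e. 5 files (one extra).
# Ceiling division gives 3 lines per file, i.e. 4 files (3 + 3 + 3 + 1 lines).
per_file=$(lines_per_chunk "$tmpdir/sample.txt" 4)
split -l "$per_file" "$tmpdir/sample.txt" "$tmpdir/part_"
ls "$tmpdir"/part_*
```

Note that rounding up guarantees at most the requested number of files, not exactly that many: 10 lines in 6 chunks still gives 2 lines per file and therefore 5 files.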

长发绾君心

2020-11-22 17:13
You can also use awk:
```
awk -v c=1 '{print > (c ".txt")} NR%200000==0{close(c ".txt"); ++c}' largefile
```
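The same idea can be parameterized so that the chunk size and output names are not hard-coded. This is only a sketch under assumptions: the function name `split_by_lines`, the `chunk_` prefix, and the 25-line sample file are hypothetical.

```shell
#!/bin/sh
# Sketch: split "$1" into pieces of "$2" lines named <prefix>1.txt, <prefix>2.txt, ...
# split_by_lines and the "chunk_" prefix are hypothetical names.
split_by_lines() {
    awk -v n="$2" -v prefix="$3" '
        # Start a new output file at lines 1, n+1, 2n+1, ...
        (NR - 1) % n == 0 {
            if (out != "") close(out)   # keep only one output file open
            out = prefix (++c) ".txt"
        }
        { print > out }
    ' "$1"
}

tmpdir=$(mktemp -d)
seq 1 25 > "$tmpdir/largefile"
split_by_lines "$tmpdir/largefile" 10 "$tmpdir/chunk_"
wc -l "$tmpdir"/chunk_*.txt
```

Closing each finished file matters with awk implementations that limit the number of simultaneously open files.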

醉话见心

2020-11-22 17:17
Use:
```
sed -n '1,100p' filename > output.txt
```
Here, 1 and 100 are the first and last line numbers of the range that is written to output.txt.
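To cover a whole file this way, the range has to be moved along in a loop. A rough sketch, assuming a 1000-line sample file and 400-line ranges; the helper name `print_range` and the `output_N.txt` names are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: print lines "$2" through "$3" of file "$1".
# The trailing "q" makes sed stop reading once the last wanted line is printed,
# which matters on very large files.
print_range() {
    sed -n "${2},${3}p;${3}q" "$1"
}

tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/filename"

# Write consecutive 400-line ranges to numbered files.
start=1
i=1
total=$(wc -l < "$tmpdir/filename")
while [ "$start" -le "$total" ]; do
    print_range "$tmpdir/filename" "$start" "$((start + 399))" > "$tmpdir/output_$i.txt"
    start=$((start + 400))
    i=$((i + 1))
done
```

Each call reads the file again from the first line, so for splitting an entire large file split or awk will be considerably faster; sed is more convenient when only one range is needed.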

没有蜡笔的小新

2020-11-22 17:21
How about the split command?
```
split -l 200000 mybigfile.txt
```
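A small run makes the naming and the remainder behavior visible. This sketch assumes a 10-line stand-in for the big file and an arbitrary `piece_` prefix; `-a 3` asks for three-letter suffixes.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/mybigfile.txt"

# 3 lines per piece, three-letter suffixes, output names starting with "piece_".
split -l 3 -a 3 "$tmpdir/mybigfile.txt" "$tmpdir/piece_"

# Expect piece_aaa, piece_aab, piece_aac (3 lines each) and piece_aad (1 line).
wc -l "$tmpdir"/piece_*

# Concatenating the pieces in name order reproduces the original file.
cat "$tmpdir"/piece_* | cmp - "$tmpdir/mybigfile.txt" && echo "pieces match the original"
```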

南旧

2020-11-22 17:22

Have you looked at the split command?

```
$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                            before each output file is opened
      --help     display this help and exit
      --version  output version information and exit
```

You could do something like this:

```
split -l 200000 filename
```

which will create files each with 200000 lines named xaa xab xac ...

Another option, split by size of output file (still splits on line breaks):

```
split -C 20m --numeric-suffixes input_filename output_prefix
```

creates files like output_prefix00 output_prefix01 output_prefix02 ... each of max size 20 megabytes.
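The line-preserving behavior of `-C` can be checked on a tiny input. This is a sketch that assumes GNU split (the `-C` and `--numeric-suffixes` options are not available in every implementation) and uses a made-up 10-line file with a 20-byte limit instead of 20 megabytes.

```shell
#!/bin/sh
# Assumes GNU split; -C and --numeric-suffixes are not available everywhere.
tmpdir=$(mktemp -d)

# Ten lines "row 0" .. "row 9": 6 bytes each including the newline.
for i in 0 1 2 3 4 5 6 7 8 9; do echo "row $i"; done > "$tmpdir/input_filename"

# At most 20 bytes per piece, so up to three whole 6-byte lines fit in each.
split -C 20 --numeric-suffixes "$tmpdir/input_filename" "$tmpdir/output_prefix"

# No piece exceeds the limit and no line is cut in the middle.
wc -c "$tmpdir"/output_prefix*
```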

渐次进展

2020-11-22 17:27

Use split.

It splits a file into fixed-size pieces, creating output files that contain consecutive sections of INPUT (standard input if none is given or INPUT is `-`).

Syntax: split [options] [INPUT [PREFIX]]

http://ss64.com/bash/split.html
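Reading from standard input is handy when the data comes from another command. A minimal sketch, assuming `seq` as a stand-in for the real data source and an arbitrary `part_` prefix:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# "-" tells split to read standard input; "part_" is the output prefix.
seq 1 10 | split -l 4 - "$tmpdir/part_"

# Expect part_aa and part_ab with 4 lines each, and part_ac with 2 lines.
wc -l "$tmpdir"/part_*
```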
