I have a large file, about 10GB. I have a vector of line numbers which I would like to use to split the file. Ideally I would like to accomplish this using command-line utilitie
Ok, I've gone totally mental this morning, and I came up with a Sed program (with functions, loops, and all) to generate a Sed script to make what you want.
Usage:
make.sed
) and chmod +x
it;sed "$(./make.sed <<< '1 4')" inputfile
¹Note that ./make.sed <<< '1 4'
generates the following sed script:
1,1{w file.1
be};1,4{w file.2
be};1,${w file.3
be};:e
¹ Unfortunately I misread the question, so my script works taking the line number of the last line of each block that you want to write to file, so your 2 5
has to be changed to 1 4
to be fed to my script.
#!/usr/bin/env -S sed -Ef
###########################################################
# Main
# make a template sed script, in which we only have to increase
# the number of each numbered output file, each of which is marked
# with a trailing \x0
b makeSkeletonAndMarkNumbers
:skeletonMade
# try putting a stencil on the rightmost digit of the first marked number on
# the line and loop, otherwise exit
b stencilLeastDigitOfNextMarkedNumber
:didStencilLeastDigitOfNextMarkedNumber?
t nextNumberStenciled
b exit
# continue processing next number by adding 1
:nextNumberStenciled
b numberAdd1
:numberAdded1
# try putting a stencil on the rightmost digit of the next marked number on
# the line and loop, otherwise we're done with the first marked number, we can
# clean its marker, and we can loop
b stencilNextNumber
:didStencilNextNumber?
t nextNumberStenciled
b removeStencilAndFirstMarker
:removeStencilAndFirstMarkerDone
b stencilLeastDigitOfNextMarkedNumber
###########################################################
# puts a \n on each side of the first digit marked on the right by \x0
:stencilLeastDigitOfNextMarkedNumber
tr
:r
s/([0-9])\x0;/\n\1\n\x0;/1
b didStencilLeastDigitOfNextMarkedNumber?
###########################################################
# makes desired sed script skeleton from space-separated numbers
:makeSkeletonAndMarkNumbers
s/$/ $/
s/([1-9]+|\$) +?/1,\1{w file.0\x0;be};/g
s/$/:e/
b skeletonMade
###########################################################
# moves the stencil to the next number followed by \x0
:stencilNextNumber
trr
:rr
s/\n(.)\n([^\x0]*\x0[^\x0]+)([0-9])\x0/\1\2\n\3\n\x0/
b didStencilNextNumber?
###########################################################
# +1 with carry to last digit on the line enclosed in between two \n characters
:numberAdd1
#i\
#\nprima della somma:
#l
:digitPlus1
h
s/.*\n([0-9])\n.*/\1/
y/0123456789/1234567890/
G
s/(.)\n(.*)\n.\n/\2\n\1\n/
trrr
:rrr
/[0-9]\n0\n/s/(.)\n0\n/\n\1\n0/
t digitPlus1
# the following line can be problematic for lines starting with number
/[^0-9]\n0\n/s/(.)\n0\n/\n\1\n10/
b numberAdded1
###########################################################
# remove stencil and first marker on line
:removeStencilAndFirstMarker
s/\n(.)\n/\1/
s/\x0//
b removeStencilAndFirstMarkerDone
###########################################################
:exit
# a bit of post processing the `w` command has to be followed
# by the filename, then by a newline, so we change the appropriate `;`s to `\n`.
s/(\{[^;]+);/\1\n/g