Split file by vector of line numbers

礼貌的吻别 2021-01-27 03:41

I have a large file, about 10GB. I have a vector of line numbers which I would like to use to split the file. Ideally I would like to accomplish this using command-line utilities.

5 Answers
  •  面向向阳花
    2021-01-27 04:20

    Ok, I've gone totally mental this morning, and I came up with a Sed program (with functions, loops, and all) that generates a Sed script to do what you want.

    Usage:

    • put the script in a file (e.g. make.sed) and chmod +x it;
    • then pass its output as the script to Sed: sed "$(./make.sed <<< '1 4')" inputfile¹

    Note that ./make.sed <<< '1 4' generates the following sed script:

    1,1{w file.1
    be};1,4{w file.2
    be};1,${w file.3
    be};:e
    

    ¹ Unfortunately I misread the question, so my script expects the line number of the last line of each block that you want written to a file; your 2 5 therefore has to be changed to 1 4 before it is fed to my script.
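
    The off-by-one conversion described in the footnote can be scripted. A minimal sketch, assuming the split points arrive as space-separated start lines (the 2 5 from the question); the variable names starts and ends are illustrative only and are not part of make.sed:

```shell
# Start lines of the second, third, ... pieces, as in the question's example.
starts='2 5'

# Subtract 1 from every number to get the last line of the preceding piece,
# which is the form make.sed expects.
ends=$(printf '%s\n' "$starts" | awk '{ for (i = 1; i <= NF; i++) $i = $i - 1; print }')

printf '%s\n' "$ends"   # prints: 1 4
```

    In bash the result could then be fed to the generator in the same way as above, e.g. sed "$(./make.sed <<< "$ends")" inputfile.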

    #!/usr/bin/env -S sed -Ef
    
    ###########################################################
    # Main
    # make a template sed script, in which we only have to increase
    # the number of each numbered output file, each of which is marked
    # with a trailing \x0
    b makeSkeletonAndMarkNumbers
    :skeletonMade
    
    # try putting a stencil on the rightmost digit of the first marked number on
    # the line and loop, otherwise exit
    b stencilLeastDigitOfNextMarkedNumber
    :didStencilLeastDigitOfNextMarkedNumber?
    t nextNumberStenciled
    b exit
    
    # continue processing next number by adding 1
    :nextNumberStenciled
    b numberAdd1
    :numberAdded1
    
    # try putting a stencil on the rightmost digit of the next marked number on
    # the line and loop, otherwise we're done with the first marked number, we can
    # clean its marker, and we can loop
    b stencilNextNumber
    :didStencilNextNumber?
    t nextNumberStenciled
    b removeStencilAndFirstMarker
    :removeStencilAndFirstMarkerDone
    b stencilLeastDigitOfNextMarkedNumber
    
    ###########################################################
    # puts a \n on each side of the first digit marked on the right by \x0
    :stencilLeastDigitOfNextMarkedNumber
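    # dummy jump: `t` clears the substitution flag, so that the caller's `t`
    # reacts only to the s command below
    tr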
    :r
    s/([0-9])\x0;/\n\1\n\x0;/1
    b didStencilLeastDigitOfNextMarkedNumber?
    
    ###########################################################
    # makes desired sed script skeleton from space-separated numbers
    :makeSkeletonAndMarkNumbers
    s/$/ $/
    # [0-9]+ rather than [1-9]+, so that line numbers containing a 0 (10, 205, ...) are matched whole
    s/([0-9]+|\$) +?/1,\1{w file.0\x0;be};/g
    s/$/:e/
    b skeletonMade
    
    ###########################################################
    # moves the stencil to the next number followed by \x0
    :stencilNextNumber
    trr
    :rr
    s/\n(.)\n([^\x0]*\x0[^\x0]+)([0-9])\x0/\1\2\n\3\n\x0/
    b didStencilNextNumber?
    
    ###########################################################
    # +1 with carry to last digit on the line enclosed in between two \n characters
    :numberAdd1
    #i\
    #\nbefore the addition:
    #l
    :digitPlus1
    h
    s/.*\n([0-9])\n.*/\1/
    y/0123456789/1234567890/
    G
    s/(.)\n(.*)\n.\n/\2\n\1\n/
    trrr
    :rrr
    /[0-9]\n0\n/s/(.)\n0\n/\n\1\n0/
    t digitPlus1
    # the following line can be problematic for lines starting with number
    /[^0-9]\n0\n/s/(.)\n0\n/\n\1\n10/
    b numberAdded1
    
    ###########################################################
    # remove stencil and first marker on line
    :removeStencilAndFirstMarker
    s/\n(.)\n/\1/
    s/\x0//
    b removeStencilAndFirstMarkerDone
    
    ###########################################################
    :exit
    # a bit of post-processing: the `w` command has to be followed by the
    # filename and then by a newline, so we change the appropriate `;`s to `\n`.
    s/(\{[^;]+);/\1\n/g
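
    To see what the makeSkeletonAndMarkNumbers step produces on its own, its three substitutions can be run directly from the shell. This is only a sketch under a few assumptions: it uses sed -E, it writes a literal N where the script writes 0 followed by the \x0 marker, and it spells the pattern as ([0-9]+|\$) followed by " *" so that multi-digit line numbers such as 10 are accepted. The variable name skeleton is illustrative.

```shell
# Input: the last line number of every piece except the final one.
skeleton=$(printf '%s\n' '1 4' |
    sed -E 's/$/ $/; s/([0-9]+|\$) */1,\1{w file.N;be};/g; s/$/:e/')

printf '%s\n' "$skeleton"
# prints: 1,1{w file.N;be};1,4{w file.N;be};1,${w file.N;be};:e
```

    The rest of make.sed then only has to turn each placeholder into a running number and replace the `;` after each file name with a newline.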
    
