Split File - Java/Linux

情歌与酒 2021-01-21 03:12

I have a large file containing nearly 250 million characters. Now I want to split it into parts that each contain 30 million characters (so the first 8 parts will contain 30 million

4 answers
  •  天涯浪人
    2021-01-21 03:27

    One way is to use standard Unix commands: split the file, then prepend to each part the last 1000 bytes of the previous part.

    First, split the file (split -b counts bytes, which equals characters only for a single-byte encoding such as ASCII):

    split -b 30000000 inputfile part.
    

    Then, for each part except the first, make a new file that starts with the last 1000 bytes of the previous part:

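    unset prev
    for i in part.*
    do
      if [ -n "${prev}" ]   # skip the first part
      then
        # last 1000 bytes of the previous part, followed by this part's data;
        # the temp name deliberately does not match the part.* glob
        tail -c 1000 "${prev}" > overlap.tmp
        cat "${i}" >> overlap.tmp
        mv overlap.tmp "${i}"
      fi
      prev=${i}
    done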
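
    If the original file is still at hand, the overlapping parts can also be cut from it directly, without the split-then-prepend pass. This is a different technique from the one above (byte offsets with tail and head), shown only as a minimal sketch: make_parts, its arguments and the opart. output prefix are hypothetical names, sizes are in bytes like split -b, and it assumes the GNU head and tail found on Linux.

    ```shell
    # Hypothetical helper (not from the answer): cut FILE into pieces of CHUNK
    # bytes; every piece after the first also begins with the OVERLAP bytes
    # that precede it in FILE. Assumes OVERLAP <= CHUNK and GNU head/tail.
    make_parts() {
        local file=$1 chunk=$2 overlap=$3 prefix=$4
        local size offset=0 n=0 start len
        size=$(( $(wc -c < "$file") ))   # $(( )) strips any padding from wc
        while [ "$offset" -lt "$size" ]; do
            if [ "$offset" -eq 0 ]; then
                start=0 len=$chunk
            else
                start=$(( offset - overlap )) len=$(( chunk + overlap ))
            fi
            # tail -c +K starts at byte K (1-based); head -c keeps LEN bytes
            tail -c +"$(( start + 1 ))" "$file" | head -c "$len" \
                > "$prefix$(printf '%03d' "$n")"
            offset=$(( offset + chunk ))
            n=$(( n + 1 ))
        done
    }
    ```

    For the sizes used in this answer it would be called as:

        make_parts inputfile 30000000 1000 opart.

    The zero-padded numeric suffix keeps the pieces in order under a plain part glob, the same property split's aa, ab, ... suffixes provide.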
    

    Before reassembling, iterate over the parts again, skip the first one, and throw away the first 1000 bytes of each of the others:

    unset prev
    for i in part.*
    do
      if [ -n "${prev}" ]   # skip the first part
      then
        # drop the 1000-byte overlap: output starts at byte 1001
        tail -c +1001 "${i}" > overlap.tmp
        mv overlap.tmp "${i}"
      fi
      prev=${i}
    done
    

    The last step is to reassemble the files:

    cat part.* > newfile   # suffixes aa, ab, ... sort back into the original order
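
    Because the overlap is added and then removed again, newfile should be byte-identical to inputfile. The following is a minimal sketch for checking that on a small scratch file rather than the 250-million-character original; roundtrip_ok, overlap.tmp and the scaled-down sizes are hypothetical choices, and it assumes a POSIX shell with GNU split, tail and seq plus cmp available.

    ```shell
    # Hypothetical self-check (not part of the answer): run the split,
    # add-overlap, strip-overlap and cat steps on a small scratch file in a
    # temporary directory and compare the result with the input.
    # Assumes OVERLAP <= CHUNK.
    roundtrip_ok() {
        local chunk=$1 overlap=$2 dir status
        dir=$(mktemp -d) || return 1
        (
            cd "$dir" || exit 1
            seq 1 100 > inputfile            # 292 bytes of sample data
            split -b "$chunk" inputfile part.
            prev=
            for i in part.*; do              # prepend the overlap
                if [ -n "$prev" ]; then
                    { tail -c "$overlap" "$prev"; cat "$i"; } > overlap.tmp
                    mv overlap.tmp "$i"
                fi
                prev=$i
            done
            prev=
            for i in part.*; do              # strip it again
                if [ -n "$prev" ]; then
                    tail -c +"$(( overlap + 1 ))" "$i" > overlap.tmp
                    mv overlap.tmp "$i"
                fi
                prev=$i
            done
            cat part.* > newfile
            cmp -s inputfile newfile         # exit status 0 if identical
        )
        status=$?
        rm -rf "$dir"
        return "$status"
    }
    ```

    A zero exit status, for example from roundtrip_ok 10 3, means newfile came out byte-identical to inputfile. The work happens in a subshell inside a temporary directory so the scratch parts never mix with real ones.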
    

    Since there was no explanation of why the overlap was needed, I just created it and then threw it away.
