I have a large file containing nearly 250 million characters. I want to split it into parts of 30 million characters each (so the first 8 parts will contain 30 million characters and the last part the remainder), with an overlap of 1000 characters between consecutive parts.
One way is to use standard Unix commands to split the file and then prepend the last 1000 bytes of the previous part to each subsequent part.
First split the file:
split -b 30000000 inputfile part.
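The behavior of split can be checked on a small sample first (scaled down here to 1000-byte chunks; the suffixes aa, ab, ac are what split produces by default):

```shell
# Scaled-down demo: a 2500-byte file split into 1000-byte chunks
# (the real command uses -b 30000000 on the 250-million-character input).
head -c 2500 /dev/zero | tr '\0' 'x' > sample
split -b 1000 sample samplepart.
ls samplepart.*      # samplepart.aa, samplepart.ab, samplepart.ac
wc -c samplepart.*   # the first two are 1000 bytes each, the last 500
```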
Then, for each part (ignoring the first), make a new file starting with the last 1000 bytes of the previous part:
unset prev
for i in part.*
do
    if [ -n "${prev}" ]
    then
        tail -c 1000 "${prev}" > part.temp
        cat "${i}" >> part.temp
        mv part.temp "${i}"
    fi
    prev=${i}
done
Before reassembling, we iterate over the files again, ignoring the first, and throw away the first 1000 bytes of each remaining part:
unset prev
for i in part.*
do
    if [ -n "${prev}" ]
    then
        tail -c +1001 "${i}" > part.temp
        mv part.temp "${i}"
    fi
    prev=${i}
done
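The tail -c +1001 form is worth noting: with a leading +, tail starts output at the given 1-based byte offset rather than taking bytes from the end, so +1001 drops exactly the first 1000 bytes. A quick demonstration:

```shell
# tail -c +N outputs from byte N (1-based) to the end of the input,
# so +3 skips the first 2 bytes.
printf 'abcdef' | tail -c +3    # prints "cdef"
```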
The last step is to reassemble the files (using > rather than >> so an existing newfile is not appended to):
cat part.* > newfile
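The whole pipeline can be verified end to end on a small file; this sketch scales the sizes down (1000-byte parts, 100-byte overlap) but follows the same steps, then compares the reassembled file against the original:

```shell
set -e
seq 1 500 > inputfile            # small stand-in for the real 250 MB file
split -b 1000 inputfile part.

# Prepend the last 100 bytes of each previous part (the overlap step).
unset prev
for i in part.*
do
    if [ -n "${prev}" ]
    then
        tail -c 100 "${prev}" > part.temp
        cat "${i}" >> part.temp
        mv part.temp "${i}"
    fi
    prev=${i}
done

# Strip the prepended 100 bytes again before reassembly.
unset prev
for i in part.*
do
    if [ -n "${prev}" ]
    then
        tail -c +101 "${i}" > part.temp
        mv part.temp "${i}"
    fi
    prev=${i}
done

cat part.* > newfile
cmp inputfile newfile && echo "round trip OK"
```

Note that the glob part.* is expanded once when each loop starts, before part.temp exists, so the temporary file never ends up in the list being processed.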
Since there was no explanation of why the overlap was needed, I just created it and then threw it away.