Hey I need to split a large file in python into smaller files that contain only specific lines. How do I do this?
First, open the big file for reading.
Second, open all the smaller file names for writing.
Third, iterate through every line. Every iteration, check to see what kind of line it is, then write it to that file.
More info on File I/O: http://docs.python.org/tutorial/inputoutput.html
Do you mean breaking it down into subsections? Like if I had a file with chapter 1, chapter 2, and chapter 3, you want it to be broken down into separate files for each chapter?
The way I've done this is similar to Wilduck's response, but closes the input file as soon as it reads in the data and keeps all the lines read in.
data_file = open('large_file_name', 'r')
lines = data_file.readlines()
data_file.close()
outputFile = open('output_file_one', 'w')
for line in lines:
if 'SomeName' in line:
outputFile.write(line)
outputFile.close()
If you wanted to have more than one output file you could either add more loops or open more than one outputFile at a time.
I'd recommend using Wilducks response, however, as it uses less space and will take less time with larger files since the file is read only once.
You're probably going to want to do something like this:
big_file = open('big_file', 'r')
small_file1 = open('small_file1', 'w')
small_file2 = open('small_file2', 'w')
for line in big_file:
if 'Charlie' in line: small_file1.write(line)
if 'Mark' in line: small_file2.write(line)
big_file.close()
small_file1.close()
small_file2.close()
Opening a file for reading returns an object that allows you to iterate over the lines. You can then check each line (which is just a string of whatever that line contains) for whatever condition you want, then write it to the appropriate file that you opened for writing. It is worth noting that when you open a file with 'w' it will overwrite anything already written to that file. If you want to simply add to the end, you should open it with 'a', to append.
Additionally, if you expect there to be some possibility of error in your reading/writing code, and want to make sure the files are closed, you can use:
with open('big_file', 'r') as big_file:
<do stuff prone to error>
How big and does it need to be done in python? If this is on unix, would split/csplit/grep suffice?