I have a text file with the following contents. I want to split this file into multiple files (1.txt, 2.txt, 3.txt, ...). Each new output file will be as follows
Try the re.findall() function:
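import re

with open('input.txt', 'r') as f:
    data = f.read()
# Each block runs from an 'A' to the next line starting with '$$'
found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
# Write block i to i.txt (1.txt, 2.txt, ...), closing each file
for i, block in enumerate(found, 1):
    with open(str(i) + '.txt', 'w') as out:
        out.write(block)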
Minimalistic approach for the first 3 occurrences:
import re
found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)
# enumerate() numbers the blocks; found.index(f) would misnumber identical blocks
[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found[:3], 1)]
Some explanations:
found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
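finds all occurrences matching the specified RegEx and puts them into a list called found. The non-greedy A.*? together with re.S (which lets . match newlines) captures everything from an A up to the next $$ that begins a line.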
[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found]
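iterates (using a list comprehension) over all elements of the found list; for each element it creates a text file (named "index of the element + 1" followed by .txt) and writes that element (occurrence) to that file. Note that found.index(f) returns the position of the first equal element, so two identical blocks would get the same file name; enumerate(found, 1) avoids this.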
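To see what the pattern actually captures, here is a small sketch run against a made-up input string (the sample text and the find_blocks helper are my own assumptions, not from the question). re.M is left out here because the pattern has no ^ or $ anchors, so that flag has no effect on it.

```python
import re

# Hypothetical sample input: two blocks, each starting with 'A' and ending with '$$'
sample = "A\nline 1\nline 2\n$$\n\nA\nline 3\n$$\n"

def find_blocks(text):
    # re.S lets '.' match newlines, so '.*?' can span several lines;
    # the non-greedy '?' stops at the first '\n$$' after each 'A'
    return re.findall(r'\n*(A.*?\n\$\$)\n*', text, re.S)

blocks = find_blocks(sample)
print(blocks)  # ['A\nline 1\nline 2\n$$', 'A\nline 3\n$$']
```

The surrounding \n* parts sit outside the capturing group, so the blank lines between blocks are consumed but not stored.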
Another version, without regular expressions:
blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'
with open('35916503.txt', 'r') as f:
    fn = 1                # number of the next output file
    data = []             # lines of the block being collected
    write_block = False   # True while we are inside a block
    for line in f:
        if fn > blocks_to_read:
            break
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            # End of block: flush the collected lines to fn.txt
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
            data = []
            fn += 1
PS: I personally don't like this version and would use the RegEx one.
It looks to me like the condition you should check for is a line that contains just the newline (\n) character. When you encounter such a line, write out the contents parsed so far, close the file, and open another one for writing.
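A rough sketch of that idea, assuming whitespace-only lines separate the blocks. The names iter_blocks, split_file and out_pattern are hypothetical helpers of mine, not anything from the question. Unlike a naive version, it ignores runs of consecutive blank lines so no empty files are created.

```python
def iter_blocks(lines):
    """Yield lists of lines, split wherever a blank (whitespace-only) line occurs."""
    block = []
    for line in lines:
        if line.strip() == '':
            if block:          # ignore runs of consecutive blank lines
                yield block
                block = []
        else:
            block.append(line)
    if block:                  # the last block may not be followed by a blank line
        yield block

def split_file(path, out_pattern='{}.txt'):
    # out_pattern is an assumed naming scheme giving 1.txt, 2.txt, ...
    count = 0
    with open(path) as f:
        for count, block in enumerate(iter_blocks(f), start=1):
            with open(out_pattern.format(count), 'w') as out:
                out.writelines(block)
    return count               # number of files written
```

Calling split_file('input.txt') would then write the blocks to 1.txt, 2.txt, ... in the current directory.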
The blocks are divided by empty lines. Try this:
import sys

lines = sys.stdin.readlines()
i = 1
o = open("{}.txt".format(i), "w")
for line in lines:
    if len(line.strip()) == 0:
        # An empty line ends the current block: start the next file
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()
Open 1.txt for writing at the beginning. Write each line to the current output file. Additionally, if line.strip() == '$$', close the old file and open a new one for writing.
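The steps above could look roughly like this. It is a sketch with one deviation: instead of opening the next file immediately after a '$$' line, it opens it only when the next non-blank line arrives, so no empty file is left over at the end. The function name split_on_marker and the out_pattern parameter are my own assumptions.

```python
def split_on_marker(lines, marker='$$', out_pattern='{}.txt'):
    """Write lines to 1.txt, 2.txt, ...; a new file starts after each marker line."""
    written = 0
    out = None
    try:
        for line in lines:
            if out is None:
                if not line.strip():
                    continue            # skip blank lines between blocks
                written += 1
                out = open(out_pattern.format(written), 'w')
            out.write(line)
            if line.strip() == marker:  # marker closes the current block
                out.close()
                out = None
    finally:
        if out is not None:             # input ended without a closing marker
            out.close()
    return written                      # number of files created
```

Usage would be something like: with open('input.txt') as f: split_on_marker(f)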
Read your input file and write to an output file each time you find a "$$", then increase the output file counter. Code:
with open("input.txt", "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip():  # skips the empty lines
            buff.append(line)
        if line.strip() == "$$":
            output = open('%d.txt' % i, 'w')
            output.write(''.join(buff))
            output.close()
            i += 1
            buff = []  # buffer reset
EDIT: this should be efficient too, since the lines are collected in a list and joined once: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation