How can I split a text file into multiple text files using Python?

前端未结

I have a text file with the following contents. I want to split this file into multiple files (1.txt, 2.txt, 3.txt, ...). Each new output file should look like this:

5 answers

难免孤独

2020-12-22 11:01

Try the re.findall() function:

import re

with open('input.txt', 'r') as f:
    data = f.read()

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

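# write each captured block to 1.txt, 2.txt, ... and close every file properly
for i, block in enumerate(found, start=1):
    with open(str(i) + '.txt', 'w') as out:
        out.write(block)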

Minimalistic approach for the first 3 occurrences:

import re

found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)

[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found[:3], start=1)]

Some explanations:

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

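finds all occurrences matching the specified regex and puts the captured blocks into a list called found. The re.S flag lets . match newlines too, so the non-greedy .*? can run across several lines up to the first \n$$.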
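To make the regex behaviour concrete, here is a small check on a made-up input (the question's real file contents are not shown, so the sample below is purely an assumption about the block format: a line "A", some lines, then a line "$$"). It also shows why re.S matters: without it, . cannot cross a newline and nothing matches.

```python
import re

# Hypothetical sample input; the question's actual file contents are not available.
sample = "A\nline 1\nline 2\n$$\n\nA\nline 3\n$$\n"
pattern = r'\n*(A.*?\n\$\$)\n*'

# With re.S, .*? may span newlines and stops at the first "\n$$".
found = re.findall(pattern, sample, re.M | re.S)
print(found)  # ['A\nline 1\nline 2\n$$', 'A\nline 3\n$$']

# Without re.S, . cannot match "\n", so no block can be captured.
without_s = re.findall(pattern, sample)
print(without_s)  # []
```

Note that the pattern starts a block at any capital A, not only at a line consisting of "A", so real data containing other capital A's may need a stricter pattern.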

[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found, start=1)]

iterates (using a list comprehension) over all elements of the found list and, for each element, creates a text file named "<position of the element, starting at 1>.txt" and writes that element (the occurrence) to it.

Another version, without regular expressions:

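blocks_to_read = 3  # stop after this many blocks
blk_begin = 'A'     # a line equal to this starts a block
blk_end = '$$'      # a line equal to this ends a block

with open('35916503.txt', 'r') as f:  # the input file
    fn = 1               # number of the next output file
    data = []            # lines of the block being collected
    write_block = False  # True while inside a block
    for line in f:
        if fn > blocks_to_read:
            break
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            # end of a block: write it to the next numbered file and reset
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
            data = []
            fn += 1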

PS: Personally, I don't like this version; I would use the regex one.

囚心锁ツ

2020-12-22 11:12

It looks to me like the condition you should check for is a line that contains only the newline (\n) character. When you encounter such a line, write the contents parsed so far, close the file, and open another one for writing.
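A minimal sketch of that idea, assuming blocks are separated by blank lines; split_on_blank_lines and its out_dir parameter are hypothetical names. It buffers lines and flushes the buffer to the next numbered file at each blank line, skipping empty buffers so that several consecutive blank lines don't produce empty files. Testing line.strip() == "" instead of line == "\n" also treats whitespace-only lines as separators.

```python
import os


def split_on_blank_lines(path, out_dir="."):
    """Write each blank-line-separated block of `path` to 1.txt, 2.txt, ... in out_dir.

    Returns the number of files written. (Hypothetical helper, not from the thread.)
    """
    count = 0
    block = []

    def flush():
        nonlocal count, block
        if block:  # skip empty buffers: runs of blank lines create no empty files
            count += 1
            with open(os.path.join(out_dir, f"{count}.txt"), "w") as out:
                out.writelines(block)
            block = []

    with open(path) as src:
        for line in src:
            if line.strip() == "":
                flush()  # a blank line ends the current block
            else:
                block.append(line)
    flush()  # the last block may not be followed by a blank line
    return count
```

For example, split_on_blank_lines("input.txt") writes the blocks to 1.txt, 2.txt, ... in the current directory.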

孤独总比滥情好

2020-12-22 11:14

The blocks are separated by empty lines. Try this (it reads the text from standard input):

import sys

i = 1
o = open("{}.txt".format(i), "w")
for line in sys.stdin:
    if len(line.strip()) == 0:
        # a blank line ends the current block: close it and start the next file
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()  # close the last file

闹比i

2020-12-22 11:19

Open 1.txt for writing at the beginning. Write each line to the current output file. Additionally, if line.strip() == '$$', close the old file and open a new one for writing.
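A sketch of this streaming approach, assuming a block ends with a line containing only $$; split_on_marker and its marker/out_dir parameters are hypothetical names. It differs slightly from the description: instead of opening 1.txt up front and a new file right after every $$, it opens the next file only when a non-blank line arrives, so the blank lines between blocks and a trailing $$ don't leave empty or blank-only files.

```python
import os


def split_on_marker(path, marker="$$", out_dir="."):
    """Stream `path` into 1.txt, 2.txt, ..., closing each file after a marker line.

    Returns the number of files written. (Hypothetical helper, not from the thread.)
    """
    count = 0
    out = None  # currently open output file, or None between blocks
    try:
        with open(path) as src:
            for line in src:
                if out is None:
                    if not line.strip():
                        continue  # skip blank lines between blocks
                    count += 1
                    out = open(os.path.join(out_dir, f"{count}.txt"), "w")
                out.write(line)
                if line.strip() == marker:
                    out.close()  # block finished; next file is opened lazily
                    out = None
    finally:
        if out is not None:
            out.close()  # input ended without a final marker
    return count
```

Only one output file is open at a time and no block is held in memory, which can matter for very large inputs.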

日久生厌

2020-12-22 11:24

Read your input file, buffering the non-empty lines. Each time you find a "$$" line, write the buffer to a new output file and increment the output-file counter. Code:

with open("input.txt", "r") as f:
    buff = []  # lines of the current block
    i = 1      # number of the next output file
    for line in f:
        if line.strip():  # skip empty lines
            buff.append(line)
        if line.strip() == "$$":  # end of a block: write it out
            with open('%d.txt' % i, 'w') as output:
                output.write(''.join(buff))
            i += 1
            buff = []  # reset the buffer

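EDIT: This should be efficient too, because it joins the buffered lines with ''.join() instead of concatenating strings in a loop: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation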
