How can I split a text file into multiple text files using Python?

佛祖请我去吃肉 2020-12-22 10:22

I have a text file that contains the following contents. I want to split this file into multiple files (1.txt, 2.txt, 3.txt, ...). Each new output file should look like the following

5 Answers
  • 2020-12-22 11:01

    Try the re.findall() function:

    import re
    
    with open('input.txt', 'r') as f:
        data = f.read()
    
    found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
    
    # Write each block to its own numbered file: 1.txt, 2.txt, ...
    for i, block in enumerate(found, start=1):
        with open(str(i) + '.txt', 'w') as out:
            out.write(block)
    

    Minimalistic approach for the first 3 occurrences:

    import re
    
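    with open('input.txt', 'r') as f:
        found = re.findall(r'\n*(A.*?\n\$\$)\n*', f.read(), re.M | re.S)
    
    # enumerate() avoids found.index(), which misnumbers duplicate blocks
    for i, block in enumerate(found[:3], start=1):
        with open(str(i) + '.txt', 'w') as out:
            out.write(block)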
    

    Some explanations:

    found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
    

    finds all occurrences matching the regular expression and puts them into a list called found.

    for i, block in enumerate(found, start=1):
        with open(str(i) + '.txt', 'w') as out:
            out.write(block)
    

    iterates through all elements of the found list and, for each element, creates a text file named after the element's 1-based position (1.txt, 2.txt, ...) and writes that element (the matched block) to the file.
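
    To make the pattern's behaviour concrete, here is a small sketch run on a made-up two-block sample. The sample string is an assumption (blocks that start with a line "A" and end with a line "$$"), not the asker's real data:

```python
import re

# Hypothetical sample input: two blocks, each starting with "A" and ending with "$$".
sample = "A\nline 1\n$$\n\nA\nline 2\nline 3\n$$\n"

# Same pattern as in the answer: the group captures from "A" up to the next "\n$$";
# the surrounding \n* swallows blank lines between blocks.
pattern = r'\n*(A.*?\n\$\$)\n*'
blocks = re.findall(pattern, sample, re.M | re.S)

for number, block in enumerate(blocks, start=1):
    print(number, repr(block))
```

    Note that the "A" in the pattern is not anchored to the start of a line, so a capital A in the middle of a line could also start a match; whether that matters depends on the real input.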

    Another version, without regular expressions:

    blocks_to_read = 3
    blk_begin = 'A'
    blk_end = '$$'
    
    with open('35916503.txt', 'r') as f:
        fn = 1
        data = []
        write_block = False
        for line in f:
            if fn > blocks_to_read:
                break 
            line = line.strip()
            if line == blk_begin:
                write_block = True
            if write_block:
                data.append(line)
            if line == blk_end:
                write_block = False
                with open(str(fn) + '.txt', 'w') as fout:
                    fout.write('\n'.join(data))
                    data = []
                fn += 1
    

    P.S. Personally, I don't like this version; I would use the one with the regular expression.

  • 2020-12-22 11:12

    It looks to me like the condition you should check for is a line that contains only the newline character (\n). When you encounter such a line, write out the contents parsed so far, close the file, and open another one for writing.
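
    A minimal sketch of this idea, assuming the whole input fits in memory and that blocks are separated by one or more blank lines. The names split_on_blank_lines and write_blocks are hypothetical helpers, not part of any library:

```python
from pathlib import Path


def split_on_blank_lines(text):
    """Return the blocks of text that are separated by one or more blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the block collected so far.
            blocks.append("\n".join(current))
            current = []
    if current:
        # The last block may not be followed by a blank line.
        blocks.append("\n".join(current))
    return blocks


def write_blocks(blocks, out_dir="."):
    """Write each block to 1.txt, 2.txt, ... inside out_dir."""
    paths = []
    for number, block in enumerate(blocks, start=1):
        path = Path(out_dir) / f"{number}.txt"
        path.write_text(block + "\n")
        paths.append(path)
    return paths
```

    Collecting lines and only flushing a non-empty block means that several consecutive blank lines do not produce empty output files.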

  • 2020-12-22 11:14

    The blocks are divided by empty lines. Try this:

    import sys
    
    i = 1
    o = open("{}.txt".format(i), "w")  # first output file is 1.txt
    for line in sys.stdin:
        if len(line.strip()) == 0:
            # A blank line ends the current block: switch to the next output file
            o.close()
            i = i + 1
            o = open("{}.txt".format(i), "w")
        else:
            o.write(line)
    o.close()  # close the last output file
    
  • 2020-12-22 11:19

    Open 1.txt at the beginning for writing. Write each line to the current output file. Additionally, if line.strip() == '$$', close the current file and open a new one for writing.
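
    These steps could be sketched roughly as follows. The function name split_stream and its parameters are hypothetical, and the sketch deviates slightly from the description: it opens the next file only when the next non-blank line arrives, so no empty file is left behind after the final "$$":

```python
from pathlib import Path


def split_stream(lines, out_dir=".", end_marker="$$"):
    """Copy lines into 1.txt, 2.txt, ..., switching files after each end_marker line."""
    out_dir = Path(out_dir)
    index = 1
    out = None
    written = []
    try:
        for line in lines:
            if out is None:
                if not line.strip():
                    continue  # skip blank lines between blocks
                path = out_dir / f"{index}.txt"
                out = path.open("w")
                written.append(path)
            out.write(line)
            if line.strip() == end_marker:
                # End of block: close this file; the next one is opened lazily.
                out.close()
                out = None
                index += 1
    finally:
        if out is not None:
            out.close()  # input ended without a closing marker
    return written
```

    It accepts any iterable of lines, so an open file object can be passed directly, and only one line is held in memory at a time.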

  • 2020-12-22 11:24

    Read your input file, write an output file each time you find a "$$", and increment the output file counter. Code:

    with open("input.txt", "r") as f:
        buff = []
        i = 1
        for line in f:
            if line.strip():  # skip the empty lines
                buff.append(line)
            if line.strip() == "$$":
                # End of block: write the buffered lines to the next numbered file
                with open('%d.txt' % i, 'w') as output:
                    output.write(''.join(buff))
                i += 1
                buff = []  # reset the buffer
    

    EDIT: This should be efficient too, since the lines are collected in a list and joined once: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation
