Replace multiple newlines with a single newline while reading a file

梦毁少年i 2021-02-14 11:55

I have the following code, which reads from multiple files, parses the lines obtained, and prints the result:

import os
import re

files=[]
pars=[]

for i in os.listdir(\'pa         


        
3 answers
  • 2021-02-14 12:11

    Just to point out: you don't necessarily need a regex for this. Replacing two consecutive newlines with one in a Python str is quite simple, with no need for re:

    entire_file = "whatever\nmay\n\nhappen"
    entire_file = entire_file.replace("\n\n", "\n")
    

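    And voila! Likely faster than re for this simple case and (in my opinion) much easier to read. Note that a single replace call only merges pairs of newlines, so a run of three or more newlines still leaves a blank line behind; for longer runs, repeat the call or use re.sub(r'\n+', '\n', entire_file).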
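
    To illustrate the behaviour of str.replace on longer runs of newlines, here is a minimal sketch. The helper name collapse_newlines is hypothetical (it is not from the original post); it simply repeats the replacement until no double newline remains:

```python
def collapse_newlines(text):
    """Collapse every run of consecutive newlines into a single newline.

    collapse_newlines is a hypothetical helper name used for illustration.
    """
    # One replace() pass only merges pairs, so repeat until none remain.
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text


sample = "whatever\nmay\n\n\n\nhappen"

# A single pass turns four newlines into two, so a blank line survives.
single_pass = sample.replace("\n\n", "\n")

# The loop keeps going until only single newlines are left.
collapsed = collapse_newlines(sample)
```

    Here single_pass still contains one blank line, while collapsed does not. Whether the loop is faster than re.sub for a given input is something to measure rather than assume.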

  • 2021-02-14 12:16

    You could use a second regex to replace multiple newlines with a single newline, and use strip to get rid of the trailing newline.

    import os
    import re

    files = []
    pars = []

    for i in os.listdir('path_to_dir_with_files'):
        files.append(i)

    for f in files:
        with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
            # Remove "someword=", anything after a comma, and "#" comments
            word = re.sub(r'someword=|,.*|#.*', '', a.read())
            # Collapse runs of newlines, then trim leading/trailing whitespace
            word = re.sub(r'\n+', '\n', word).strip()
            pars.append(word)

    for k in pars:
        print(k)
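
    To see why the second substitution is needed, here is a small sketch that applies both regexes to an in-memory string instead of a file. The sample text is made up for illustration; only the two patterns come from the code above:

```python
import re

# Made-up file content: a value with trailing text, a comment line,
# an empty line, another value, and extra newlines at the end.
sample = "someword=alpha, trailing\n# comment line\n\nsomeword=beta\n\n\n"

# First pass: strip the prefix, everything after a comma, and comments.
# Removing a whole comment line leaves its newline behind.
stripped = re.sub(r'someword=|,.*|#.*', '', sample)

# Second pass: collapse newline runs, then trim the ends.
cleaned = re.sub(r'\n+', '\n', stripped).strip()
```

    After the first pass, stripped contains runs of newlines where the comment and empty lines used to be; the second pass reduces cleaned to one value per line.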
    
  • 2021-02-14 12:35

    Without changing your code much, one easy way would just be to check if the line is empty before you print it, e.g.:

    import os
    import re

    files = []
    pars = []

    for i in os.listdir('path_to_dir_with_files'):
        files.append(i)

    for f in files:
        # os.path.join adds the path separator between directory and file name
        with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
            pars.append(re.sub(r'someword=|,.*|#.*', '', a.read()))

    for k in pars:
        # Skip entries that contain only whitespace
        if k.strip():
            print(k)
    

    *** EDIT: Since each element in pars is actually the entire content of a file (not just a line), you need to go through and replace any repeated newlines, which is easiest to do with re:

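    import os
    import re

    files = []
    pars = []

    for i in os.listdir('path_to_dir_with_files'):
        files.append(i)

    for f in files:
        # os.path.join adds the path separator between directory and file name
        with open(os.path.join('path_to_dir_with_files', f), 'r') as a:
            pars.append(re.sub(r'someword=|,.*|#.*', '', a.read()))

    for k in pars:
        # Collapse runs of newlines inside each file's content
        k = re.sub(r"\n+", "\n", k)
        # Skip entries that contain only whitespace
        if k.strip():
            print(k)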
    

    Note that this doesn't handle the case where one file ends with a newline and the next one begins with one: print adds its own newline after each element, so a blank line can still appear at the boundary between files. If that case matters to you, either add extra logic to deal with it or change the way you read the data in.
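
    One possible way to handle that boundary case, sketched under assumptions: trim the newlines at both ends of each file's cleaned content, drop files that end up empty, and join the rest with exactly one newline. The clean helper and the file_contents list are hypothetical stand-ins for the result of a.read() on each file:

```python
import re


def clean(text):
    """Apply the two substitutions, then trim newlines at both ends.

    clean is a hypothetical helper name used for illustration.
    """
    text = re.sub(r'someword=|,.*|#.*', '', text)
    return re.sub(r'\n+', '\n', text).strip('\n')


# Made-up stand-ins for the contents of three files.
file_contents = ["someword=one\n\n", "\n# only a comment\n", "\nsomeword=two\n"]

cleaned = [clean(text) for text in file_contents]

# Drop files that are empty after cleaning, then join with one newline each,
# so no blank line can appear where one file ends and the next begins.
output = "\n".join(part for part in cleaned if part)
print(output)
```

    Joining once and printing once avoids relying on print to supply the separator between files.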
