Split diary file into multiple files using Python


I keep a diary file of tech notes. Each entry is timestamped like so:

# Monday 02012-05-07 at 01:45:20 PM

This is a sample note

Lorem ipsum dolor sit amet,


                      
3 answers

攒了一身酷, 2021-01-06 17:32

Here's the general ;-) approach:

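# your_regexp: a compiled pattern that matches a timestamp header line
# write_one: your function that writes one entry (a list of lines) to a file
with open("diaryfile", "r") as f:
    body = []
    for line in f:
        if your_regexp.match(line):
            if body:
                write_one(body)  # dump the previous entry
            body = []
        body.append(line)
    if body:
        write_one(body)  # dump the last entry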


In short, you just keep appending all lines to a list (body).  When you find a magical line, you call write_one() to dump what you have so far, and clear the list.  The last chunk of the file is a special case, because you're not going to find your magical regexp again.  So you again dump what you have after the loop.
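
To make the loop concrete, here is a minimal self-contained sketch that collects the entries in a list instead of writing them out. The names `split_entries` and `HEADER_RE` are made up for illustration, and the pattern is only my guess at the timestamp format shown in the question, so adjust it to your real headers.

```python
import re

# Assumed pattern for header lines such as
# "# Monday 02012-05-07 at 01:45:20 PM"; adjust it to your own format.
HEADER_RE = re.compile(r"# \w+ \d{5}-\d{2}-\d{2} at \d{2}:\d{2}:\d{2} [AP]M\s*$")


def split_entries(lines, header_re=HEADER_RE):
    """Group lines into entries, starting a new entry at each header line."""
    entries = []
    body = []
    for line in lines:
        if header_re.match(line):
            if body:
                entries.append(body)  # dump what we have so far
            body = []
        body.append(line)
    if body:
        entries.append(body)  # the last chunk has no header after it
    return entries


sample = [
    "# Monday 02012-05-07 at 01:45:20 PM\n",
    "\n",
    "This is a sample note\n",
    "# Wednesday 02012-06-06 at 03:44:11 PM\n",
    "\n",
    "Here is another one\n",
]
entries = split_entries(sample)
print(len(entries))  # 2
```

Note that any lines before the first header end up as an entry of their own, so you may want to skip or special-case that first chunk.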

You can make any transformations you like in your write_one() function.  For example, sounds like you want to remove the leading "# " from the input timestamp lines.  That's fine - just do, e.g.,

body[0] = body[0][2:]


in write_one.  All the lines can be written out in one gulp via, e.g.,

with open(file_name_extracted_from_body_goes_here, "w") as f:
    f.writelines(body)


You probably want to check that the file doesn't exist first!  If it's anything like my diary, the first line of many entries will be "Rotten day." ;-)
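
One possible write_one along those lines, as a sketch only. It assumes the file name comes from the first non-empty line after the timestamp, which is my reading of the question; the `out_dir` parameter, the ".txt" suffix and the "untitled" fallback are illustrative choices. Opening with mode "x" makes Python raise FileExistsError rather than silently overwrite an earlier "Rotten day."

```python
import os
import tempfile


def write_one(body, out_dir="."):
    """Write one entry to a file named after its first non-empty text line."""
    body = list(body)          # work on a copy; leave the caller's list alone
    if body[0].startswith("# "):
        body[0] = body[0][2:]  # drop the leading "# " from the timestamp
    # Assumption: the title is the first non-blank line after the timestamp.
    title = next((ln.strip() for ln in body[1:] if ln.strip()), "untitled")
    path = os.path.join(out_dir, title + ".txt")
    # Mode "x" refuses to overwrite: it raises FileExistsError if path exists.
    with open(path, "x") as f:
        f.writelines(body)
    return path


entry = ["# Monday 02012-05-07 at 01:45:20 PM\n", "\n", "This is a sample note\n"]
with tempfile.TemporaryDirectory() as tmp:
    path = write_one(entry, tmp)
    print(os.path.basename(path))  # This is a sample note.txt
```

Titles containing characters that are not valid in file names (such as "/" or ":") would need sanitizing first; the sketch does not handle that.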
                                                                        
                                                        
            
            
              
                
轮回少年, 2021-01-06 17:40

You set the "batch-file" tag in your question, so I wrote a Batch file (.bat) solution. Here it is:

@echo off
setlocal EnableDelayedExpansion

set daysOfWeek=/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday/Sunday/

for /F "delims=" %%a in (input.txt) do (
   if not defined timeStamp (
      set timeStamp=%%a
   ) else if not defined fileName (
      set fileName=%%a
      (
      echo !timeStamp!
      echo/
      echo !fileName!
      echo/
      ) > "!fileName!.txt"
   ) else (
      for /F "tokens=2" %%b in ("%%a") do if "!daysOfWeek:/%%b/=!" equ "%daysOfWeek%" (
         echo %%a>> "!fileName!.txt"
      ) else (
         set timeStamp=%%a
         set "fileName="
      )
   )
)


For example:

C:\Users\Antonio\Documents\test
>dir /B
input.txt
test.bat

C:\Users\Antonio\Documents\test
>test

C:\Users\Antonio\Documents\test
>dir /B
Here is another one.txt
input.txt
test.bat
This is a sample note.txt

C:\Users\Antonio\Documents\test
>type "Here is another one.txt"
# Wednesday 02012-06-06 at 03:44:11 PM

Here is another one

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.

C:\Users\Antonio\Documents\test
>type "This is a sample note.txt"
# Monday 02012-05-07 at 01:45:20 PM

This is a sample note

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

                                                                        
                                                        
            
            
              
                
不知归路, 2021-01-06 17:43

It really doesn't require as much regex as you would think.

First, load the file and split it into lines:

fl = 'file.txt'
with open(fl,'r') as f:
    lines = f.readlines()


Now loop through the lines. Compare each line with the regex you provided; if it matches, a new dated entry starts there.

Then grab the next non-empty line after it and use that as the name of the file.

Keep writing lines to that file until you hit another match for your regex, which marks the start of the next file. Here is the logic loop:

import re

for line in lines:
    # your_regex: the timestamp pattern from your question
    m = re.match(your_regex, line)
    new_file = bool(m)
    # now you know when a new entry starts, so you can easily do the rest
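
Here is one way "the rest" might look, as a hedged sketch rather than a finished solution. `HEADER_RE` is an assumed stand-in for your regex and `group_by_title` is a made-up helper name; it groups the lines per entry, keyed by the first non-empty line after each timestamp, without touching the disk.

```python
import re

# Assumed pattern for the timestamp lines; swap in the regex from the question.
HEADER_RE = re.compile(r"# \w+ \d+-\d{2}-\d{2} at \d{2}:\d{2}:\d{2} [AP]M")


def group_by_title(lines):
    """Return {title: [lines]}; title is the first non-empty line after a header."""
    files = {}
    header = None    # header line still waiting for its title
    pending = []     # lines seen since that header, before the title is known
    current = None   # list receiving the lines of the current entry
    for line in lines:
        if HEADER_RE.match(line):
            header, pending, current = line, [line], None
        elif header is not None:
            pending.append(line)
            if line.strip():
                # First non-empty line after the header: use it as the name.
                current = files.setdefault(line.strip(), [])
                current.extend(pending)
                header = None
        elif current is not None:
            current.append(line)
    return files


lines = [
    "# Monday 02012-05-07 at 01:45:20 PM\n",
    "\n",
    "This is a sample note\n",
    "Lorem ipsum dolor sit amet,\n",
    "# Wednesday 02012-06-06 at 03:44:11 PM\n",
    "\n",
    "Here is another one\n",
]
files = group_by_title(lines)
print(list(files))  # ['This is a sample note', 'Here is another one']
```

Each value can then be written to its own file with writelines. In this sketch, lines before the first timestamp are ignored, a timestamp with no text after it is dropped, and entries that share a title are merged into one list; change those choices to suit your diary.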


Let me know if you need any more of the logic broken down. Hopefully this was helpful.
                                                                        
                                                        
            
            
              
                