Python for loop with if/else and append function

后端未结

关注

 3  757

On the basis of list as below, I have to create a DataFrame with \"state\" and \"region\" columns:

Original data:

 Alabama[edit]
 Auburn (Auburn Universi


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  甜味超标        
                
              
                            
                2021-01-21 18:55
              
            
            
                                                                       
Shortest version I could think of: 

import pandas as pd

lst = list()

with open('university_towns.txt', 'r', newline='\n') as infile:
    for line in infile.readlines():
        if '[edit]' in line:
            state = line.split('[')[0]
        else:
            lst.append([state, line.split(' ')[0]])

df = pd.DataFrame(lst, columns=['State', 'RegionName'])
print(df)


Produces on my machine (Python 3.6):

      State    RegionName
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  心在旅途        
                
              
                            
                2021-01-21 18:56
              
            
            
                                                                       
You can find an example of cleaning this dataset in the tutorial Pythonic Data Cleaning With NumPy and Pandas.

Option 1: Do String Processing in "Pure Python"

You can use a greedy for-loop over the lines of the file and load in O(n) time:

import pandas as pd

university_towns = []

with open('input/university_towns.txt') as file:
    for line in file:
        edit_pos = line.find('[edit]')
        if edit_pos != -1:
            # Remember this `state` until the next is found
            state = line[:edit_pos]
        else:
            # Otherwise, we have a city; keep `state` as last-seen
            parens = line.find(' (')
            town = line[:parens] if parens != -1 else line
            university_towns.append((state, town))

towns_df = pd.DataFrame(university_towns,
                        columns=['State', 'RegionName'])


Option 2: Do String Processing via Pandas API

Alternatively, you can do the string processing with Pandas' .str accessor:

import re

import pandas as pd

university_towns = []

with open('input/university_towns.txt') as file:
    for line in file:
        if '[edit]' in line:
            # Remember this `state` until the next is found
            state = line
        else:
            # Otherwise, we have a city; keep `state` as last-seen
            university_towns.append((state, line))

towns_df = pd.DataFrame(university_towns,
                        columns=['State', 'RegionName'])

towns_df['State'] = towns_df.State.str.replace(r'\[edit\]\n', '')
towns_df['RegionName'] = towns_df.RegionName\
    .str.strip()\
    .str.replace(r' \(.*', '')\
    .str.replace(r'\[.*', '')


Output:

>>> towns_df.head()
     State    RegionName
0  Alabama        Auburn
1  Alabama      Florence
2  Alabama  Jacksonville
3  Alabama    Livingston
4  Alabama    Montevallo

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  失恋的感觉        
                
              
                            
                2021-01-21 18:56
              
            
            
                                                                       
if I uderstand your question and desired output correct, you could do something like this:

univeristylist = []
with open('university_towns.txt', 'r') as file:
    for line in file:
        if '[edit]' in line:
            state = row
        else:
            universitylist.append([state, row])

df = pd.DataFrame(universitylist, columns=['State', 'RegionName'])


If you don't want the '[edit]' and '[1]' part etc, then you could change the code to:

univeristylist = []
with open('university_towns.txt', 'r') as file:
    for line in file:
        if '[edit]' in line:
            state = row.split(' [')[0]
        else:
            universitylist.append([state, row.split(' [')[0]])

df = pd.DataFrame(columns=['State', 'RegionName'])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复