Python for loop with if/else and append function

别那么骄傲 2021-01-21 18:26

Based on the list below, I have to create a DataFrame with "state" and "region" columns:

Original data:

 Alabama[edit]
 Auburn (Auburn Universi         


        
3 Answers
  • 2021-01-21 18:55

    Shortest version I could think of:

    import pandas as pd
    
    lst = list()
    
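    with open('university_towns.txt', 'r', newline='\n') as infile:
        for line in infile:
            if '[edit]' in line:
                # State header: remember it until the next one appears
                state = line.split('[')[0]
            else:
                # Town line: keep the text before ' (' so multi-word names survive
                lst.append([state, line.split(' (')[0].strip()])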
    
    df = pd.DataFrame(lst, columns=['State', 'RegionName'])
    print(df)
    

    Produces on my machine (Python 3.6):

          State    RegionName
    0   Alabama        Auburn
    1   Alabama      Florence
    2   Alabama  Jacksonville
    3   Alabama    Livingston
    4   Alabama    Montevallo
    5   Alabama          Troy
    6   Alabama    Tuscaloosa
    7   Alabama      Tuskegee
    8    Alaska     Fairbanks
    9   Arizona     Flagstaff
    10  Arizona         Tempe
    
  • 2021-01-21 18:56

    You can find an example of cleaning this dataset in the tutorial Pythonic Data Cleaning With NumPy and Pandas.

    Option 1: Do String Processing in "Pure Python"

    You can use a single for-loop over the lines of the file, which loads the data in O(n) time:

    import pandas as pd
    
    university_towns = []
    
    with open('input/university_towns.txt') as file:
        for line in file:
            edit_pos = line.find('[edit]')
            if edit_pos != -1:
                # Remember this `state` until the next is found
                state = line[:edit_pos]
            else:
                # Otherwise, we have a city; keep `state` as last-seen
                parens = line.find(' (')
                # Strip the trailing newline when the line has no parenthesis
                town = line[:parens] if parens != -1 else line.strip()
                university_towns.append((state, town))
    
    towns_df = pd.DataFrame(university_towns,
                            columns=['State', 'RegionName'])
    

    Option 2: Do String Processing via Pandas API

    Alternatively, you can do the string processing with Pandas' .str accessor:

    import pandas as pd
    
    university_towns = []
    
    with open('input/university_towns.txt') as file:
        for line in file:
            if '[edit]' in line:
                # Remember this `state` until the next is found
                state = line
            else:
                # Otherwise, we have a city; keep `state` as last-seen
                university_towns.append((state, line))
    
    towns_df = pd.DataFrame(university_towns,
                            columns=['State', 'RegionName'])
    
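    # regex=True is needed on pandas 2.0+, where str.replace matches literally by default
    towns_df['State'] = towns_df.State.str.replace(r'\[edit\]\n', '', regex=True)
    towns_df['RegionName'] = towns_df.RegionName\
        .str.strip()\
        .str.replace(r' \(.*', '', regex=True)\
        .str.replace(r'\[.*', '', regex=True)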
    

    Output:

    >>> towns_df.head()
         State    RegionName
    0  Alabama        Auburn
    1  Alabama      Florence
    2  Alabama  Jacksonville
    3  Alabama    Livingston
    4  Alabama    Montevallo
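
Option 2 still pairs states and towns in a Python loop. As a rough sketch, the "remember the last-seen state" step can also be done in pandas by masking the state rows and forward-filling them. The sample lines below are made up to mimic the shape of the file (an assumption, not the real data), and the variable names are only illustrative.

```python
import io

import pandas as pd

# Hypothetical sample shaped like university_towns.txt (not the real file)
sample = io.StringIO(
    "Alabama[edit]\n"
    "Auburn (Auburn University)[1]\n"
    "Florence (University of North Alabama)\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)[2]\n"
)

# One raw line per row
raw = pd.Series(sample.read().splitlines(), name="raw")

# True on state header rows, False on town rows
is_state = raw.str.contains("[edit]", regex=False)

# Keep text only on state rows, drop the marker, then carry each state forward
state = raw.where(is_state).str.replace("[edit]", "", regex=False).ffill()

# Town name is everything before the first " (" or "["
town = raw.str.replace(r"\s*[\(\[].*", "", regex=True).str.strip()

# Keep only the town rows and renumber them from zero
towns_df = pd.DataFrame(
    {"State": state[~is_state], "RegionName": town[~is_state]}
).reset_index(drop=True)

print(towns_df)
```

Whether this is clearer than the loop is a matter of taste; it mainly helps when the raw lines are already in a Series.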
    
  • 2021-01-21 18:56

    If I understand your question and desired output correctly, you could do something like this:

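    import pandas as pd

    universitylist = []
    with open('university_towns.txt', 'r') as file:
        for line in file:
            if '[edit]' in line:
                # State header: remember it until the next one appears
                state = line
            else:
                # Town line: pair it with the last-seen state
                universitylist.append([state, line])

    df = pd.DataFrame(universitylist, columns=['State', 'RegionName'])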
    

    If you don't want the '[edit]' and '[1]' parts, etc., you could change the code to:

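    universitylist = []
    with open('university_towns.txt', 'r') as file:
        for line in file:
            if '[edit]' in line:
                # Drop '[edit]': keep only the text before the first '['
                state = line.split('[')[0].strip()
            else:
                # Drop markers like '[1]' and the trailing newline
                universitylist.append([state, line.split('[')[0].strip()])

    df = pd.DataFrame(universitylist, columns=['State', 'RegionName'])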
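
If splitting on a fixed string feels fragile, one possible variation is to remove every bracketed marker with a regular expression, with or without a space in front of it. This is only a sketch: `parse_towns` and `BRACKETED` are hypothetical names, and the sample lines are invented to resemble the file.

```python
import io
import re

# Matches bracketed markers such as "[edit]" or "[1]", plus any space before them
BRACKETED = re.compile(r"\s*\[[^\]]*\]")


def parse_towns(lines):
    """Return [state, town] pairs from an iterable of raw lines."""
    pairs = []
    state = None
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        cleaned = BRACKETED.sub("", text)
        if "[edit]" in text:
            # State header: remember it until the next one appears
            state = cleaned
        else:
            # Town line: pair it with the last-seen state
            pairs.append([state, cleaned])
    return pairs


# Hypothetical sample shaped like university_towns.txt (not the real file)
sample = io.StringIO(
    "Alabama[edit]\n"
    "Auburn (Auburn University)[1]\n"
    "\n"
    "Alaska [edit]\n"
    "Fairbanks (University of Alaska Fairbanks) [2]\n"
)

pairs = parse_towns(sample)
print(pairs)
```

The resulting list can be passed to `pd.DataFrame(pairs, columns=['State', 'RegionName'])` exactly as in the code above.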
    