Efficient way to assign values from another column pandas df

慢半拍i 2021-01-13 07:58

I'm trying to create a more efficient script that creates a new column based on values in another column. The script below performs this but I can only select

4 answers
  • 2021-01-13 08:20

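    You can bucket the unique days per location and then number the resulting combinations:

    import numpy as np
    import pandas as pd

    def f(x):
        # unique days of this location, in order of first appearance
        u = x['Day'].unique()
        # map every run of three unique days to 1, 2, 3, ...
        d = dict(zip(u, np.arange(len(u)) // 3 + 1))
        x['new'] = x['Day'].map(d)
        return x

    # group_keys=False keeps the original row index on pandas >= 2.0
    df = df.groupby('Location', sort=False, group_keys=False).apply(f)
    # combine the per-location bucket with the Location name
    s = df['new'].astype(str) + df['Location']
    # number the combinations in order of first appearance
    df['new'] = pd.Series(pd.factorize(s)[0] + 1, index=df.index).map(str).radd('C')
    print(df)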
          Day Location new
    0     Mon     Home  C1
    1    Tues     Home  C1
    2     Wed     Away  C2
    3     Wed     Home  C1
    4   Thurs     Away  C2
    5   Thurs     Home  C3
    6     Fri     Home  C3
    7     Mon     Home  C1
    8     Sat     Home  C3
    9     Fri     Away  C2
    10    Sun     Home  C4
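
The mapping step in `f` relies on integer division: numbering the unique days 0, 1, 2, ... and floor-dividing by 3 puts every run of three consecutive unique values into the same bucket. A minimal sketch of just that step, using a made-up series of days rather than the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample, not the data from the question
days = pd.Series(['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Fri', 'Mon', 'Sat', 'Sun'])

# Series.unique() keeps the order of first appearance
u = days.unique()

# positions 0,1,2 -> bucket 1; 3,4,5 -> bucket 2; 6 -> bucket 3
buckets = np.arange(len(u)) // 3 + 1
mapping = dict(zip(u, buckets.tolist()))

labels = days.map(mapping)
print(mapping)
print(labels.tolist())
```

Repeated days ('Wed', 'Mon') reuse the bucket of their first occurrence, which is why the buckets are built from the unique values and not from the rows.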
    
  • 2021-01-13 08:25

    Not as pretty, but much faster than the groupby/apply method...

    import numpy as np
    import pandas as pd

    def get_ordered_unique(a):
        u, idx = np.unique(a, return_index=True)
        # get ordered unique values
        return a[np.sort(idx)]
    
    # split ordered unique value array into arrays of size 3
    def find_ugrps(a):
        ord_u = get_ordered_unique(a)
    
        if ord_u.size > 3:
            split_idxs = [i for i in range(1, ord_u.size) if i % 3 == 0]
            u_grps = np.split(ord_u, split_idxs)
        else:
            u_grps = [ord_u]
    
        return u_grps
    
    locs = pd.factorize(df.Location)[0] + 1
    days = pd.factorize(df.Day)[0] + 1
    
    assign = np.zeros(days.size, dtype=int)
    unique_locs = get_ordered_unique(locs)
    
    i = 0
    for loc in unique_locs:
        i += 1
        loc_idxs = np.where(locs == loc)[0]
        # find the ordered unique day values for each loc val slice
        these_unique_days = get_ordered_unique(days[loc_idxs])
        # split into ordered groups of three
        these_3day_grps = find_ugrps(these_unique_days)
        # assign integer for days found within each group
        for ugrp in these_3day_grps:
            day_idxs = np.where(np.isin(days[loc_idxs], ugrp))[0]
            np.put(assign, loc_idxs[day_idxs], i)
            i += 1
    
    # set proper ordering within assign array using factorize
    df['Assign'] = (pd.factorize(assign)[0] + 1)
    df['Assign'] = 'C' + df['Assign'].astype(str)
    
    print(df)
    
          Day Location Assign
    0     Mon     Home     C1
    1    Tues     Home     C1
    2     Wed     Away     C2
    3     Wed     Home     C1
    4   Thurs     Away     C2
    5   Thurs     Home     C3
    6     Fri     Home     C3
    7     Mon     Home     C1
    8     Sat     Home     C3
    9     Fri     Away     C2
    10    Sun     Home     C4
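
`np.unique` returns its values sorted, which is why `get_ordered_unique` asks for `return_index=True` and re-sorts by first position. A small sketch on a made-up array (the values are only an illustration):

```python
import numpy as np

a = np.array([3, 1, 3, 2, 1, 5])  # hypothetical input

u, idx = np.unique(a, return_index=True)
# u   holds the sorted unique values:            [1 2 3 5]
# idx holds each value's first position in `a`:  [1 3 0 5]

# sorting the first positions restores order of first appearance
ordered = a[np.sort(idx)]
print(ordered)  # [3 1 2 5]
```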
    
  • 2021-01-13 08:32
    import numpy as np
    import pandas as pd

    # Given DataFrame
    df = pd.DataFrame({
        'Day' : ['Mon','Tues','Mon','Wed','Thurs','Fri','Mon','Sat','Sun','Tues'],
        'Location' : ['Home','Home','Away','Home','Home','Home','Home','Home','Home','Away'],
    })
    Unique_group = ['Mon','Tues','Wed']
    df['Group'] = df['Day'].apply(lambda x: 1 if x in Unique_group else 2)
    df['Assign'] = np.zeros(len(df))
    # Dictionary mapping the numeric codes to the output labels
    vals = dict([(i,'C'+str(i)) for i in range(len(df))])
    

    Loop over the DataFrame row by row, checking the 'Assign' values already written for earlier rows to decide the new value:

    for i in range(1,len(df)+1,1):
        # Slicing the Dataframe line by line
        df1 = df[:i]
        # Incorporating the conditions of Group and Location
        df1 = df1[(df1.Location == df1.Location.loc[i-1]) & (df1.Group == df1.Group.loc[i-1]) ]
        # Writing the 'Assign' value for the first line of sliced df
        if len(df1)==1:
            df.loc[i-1,'Assign'] = df[:i].Assign.max()+1
        # Reuse the current label while fewer than 3 rows of this Location/Group carry it
        elif (df1.Assign.value_counts()[df1.Assign.max()] <3):
            df.loc[i-1,'Assign'] = df1.Assign.max()
        # Writing 'Assign' value for new group
        else:
            df.loc[i-1,'Assign'] = df[:i]['Assign'].max()+1
    df.Assign = df.Assign.map(vals)
    

    Out:

         Day Location  Group Assign
    0    Mon     Home      1     C1
    1   Tues     Home      1     C1
    2    Mon     Away      1     C2
    3    Wed     Home      1     C1
    4  Thurs     Home      2     C3
    5    Fri     Home      2     C3
    6    Mon     Home      1     C4
    7    Sat     Home      2     C3
    8    Sun     Home      2     C5
    9   Tues     Away      1     C2
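
The row-by-row loop re-slices the frame on every iteration. As a hedged sketch, the same labels for this sample can be produced without a loop: `cumcount` gives each row's position within its Location/Group, floor division by 3 chunks those positions, and `pd.factorize` numbers the chunks in order of first appearance. The `chunk` and `key` names are introduced here for illustration only, and equivalence is checked on this sample, not proven for every input.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Mon', 'Wed', 'Thurs', 'Fri', 'Mon', 'Sat', 'Sun', 'Tues'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Home', 'Home', 'Home', 'Home', 'Home', 'Away'],
})
unique_group = ['Mon', 'Tues', 'Wed']
df['Group'] = df['Day'].isin(unique_group).map({True: 1, False: 2})

# position of each row within its (Location, Group), chunked into threes
chunk = df.groupby(['Location', 'Group']).cumcount() // 3

# one string key per (Location, Group, chunk); factorize numbers the keys
# in order of first appearance
key = df['Location'] + '|' + df['Group'].astype(str) + '|' + chunk.astype(str)
df['Assign'] = ['C' + str(code + 1) for code in pd.factorize(key)[0]]
print(df)
```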
    
  • 2021-01-13 08:35

    On the second attempt this works.

    It was quite hard to understand the question.

    I was sure that this should be done with pandas groupby() and DataFrame merging. If you check the history of this reply, you can see how I changed the answer to replace slower pure-Python code with faster pandas code.

    The code below first counts the unique values per location and then uses a helper data frame to create the final value.

    I recommend pasting this code into a Jupyter notebook and examining the intermediate steps.

    import pandas as pd
    import numpy as np
    
    d = ({
        'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],                 
        'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],        
        })
    
    df = pd.DataFrame(data=d)
    
    # including the example result
    df["example"] = pd.Series(["C" + str(e) for e in [1, 1, 2, 1, 2, 3, 3, 1, 3, 2, 4]])
    
    # this groups days per location
    s_grouped = df.groupby(["Location"])["Day"].unique()
    
    # This is the 3 unique indicator per location
    df["Pre-Assign"] = df.apply(
        lambda x: 1 + list(s_grouped[x["Location"]]).index(x["Day"]) // 3, axis=1
    )
    
    # Now we want these unique per combination
    df_pre = df[["Location", "Pre-Assign"]].drop_duplicates().reset_index(drop=True)
    df_pre["Assign"] = 'C' + (df_pre.index + 1).astype(str)
    
    # result
    df.merge(df_pre, on=["Location", "Pre-Assign"], how="left")
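
The row-wise `apply` that builds "Pre-Assign" searches a list for every row. A possible variation, sketched here under the assumption that the goal is the same "three unique days per location" bucketing, computes the bucket once per unique Location/Day pair with `cumcount` and merges it back. The `uniq`, `out`, and `key` names are hypothetical helpers, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Thurs', 'Fri', 'Mon', 'Sat', 'Fri', 'Sun'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home', 'Home', 'Home', 'Home', 'Away', 'Home'],
})

# one row per unique (Location, Day), in order of first appearance
uniq = df[['Location', 'Day']].drop_duplicates().copy()
# number the unique days within each location and bucket them in threes
uniq['Pre-Assign'] = uniq.groupby('Location').cumcount() // 3 + 1

# a left merge keeps the row order of df
out = df.merge(uniq, on=['Location', 'Day'], how='left')

# number each (Location, Pre-Assign) combination by first appearance
key = out['Location'] + '_' + out['Pre-Assign'].astype(str)
out['Assign'] = ['C' + str(code + 1) for code in pd.factorize(key)[0]]
print(out)
```

On this sample the "Assign" column matches the "example" column constructed in the answer.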
    

