Populate column in data frame based on a range found in another dataframe

生来不讨喜 2021-01-25 18:07

I'm attempting to populate a column in a data frame based on whether the index value of that record falls within a range defined by two columns in another data frame.

d
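
The sample data in the question is cut off at this point. Judging only by the outputs printed in the answers, the two frames were presumably along the lines of the sketch below. The names `df1`, `df2`, `START`, `STOP` and `CLASS` come from the answers; the helper `lookup_class` is hypothetical and only spells out the intended rule (inclusive ranges, no match means no class).

```python
import pandas as pd

# Reconstructed from the outputs shown in the answers; not the asker's verbatim data.
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})


def lookup_class(idx, ranges):
    """Hypothetical helper: CLASS of the inclusive [START, STOP] range containing idx, else None."""
    hits = ranges[(ranges["START"] <= idx) & (idx <= ranges["STOP"])]
    return hits["CLASS"].iloc[0] if len(hits) else None


# Plain-Python reference result: one lookup per index value of df1.
expected = [lookup_class(i, df2) for i in df1.index]
```

This row-by-row version scans `df2` once per index value, so it is only meant as a readable baseline for checking the faster approaches.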

3 Answers
  • 2021-01-25 18:19
    import pandas as pd
    import numpy as np
    
    # Here is your existing dataframe
    df_existing = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
    
    # Create a new empty dataframe with specific column names and data types
    df_new = pd.DataFrame(index=None)
    columns = ['field01','field02','field03','field04']
    dtypes = [str,int,int,int]
    for c,d in zip(columns, dtypes):
        df_new[c] = pd.Series(dtype=d)
    
    # Set the index on the new dataframe to same as existing 
    df_new['new_index'] = df_existing.index
    df_new.set_index('new_index', inplace=True)
    
    # Fill the new dataframe with specific fields from the existing dataframe
    df_new[['field02','field03']] = df_existing[['B','C']]
    print(df_new)
    
  • 2021-01-25 18:36

    Alternative solution:


    classdict = df2.set_index("CLASS").to_dict("index")
    
    rangedict = {}
    
    for key,value in classdict.items():
    
        # get all items in range and assign value (the key)
        for item in range(value["START"], value["STOP"] + 1):
            rangedict[item] = key
    

    The resulting rangedict:

    {2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}
    

    Now map the index through the dict and, optionally, format the result:

    df1['CLASS'] = df1.index.to_series().map(rangedict)
    df1.applymap("{0:.0f}".format)
    

    Output:

    a   CLASS
    0   4   nan
    1   45  nan
    2   7   1
    3   5   1
    4   48  nan
    5   44  2
    6   22  2
    7   89  2
    8   45  3
    9   44  nan
    10  23  nan
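
The dictionary approach above can be condensed into a self-contained sketch. The frames are reconstructed from the outputs in this thread, so treat them as assumptions. Casting to the nullable `Int64` dtype is a swapped-in alternative to the string formatting step: it keeps the classes as integers while still allowing missing values.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this thread.
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# One dict entry per integer covered by an inclusive [START, STOP] range.
rangedict = {
    i: row.CLASS
    for row in df2.itertuples(index=False)
    for i in range(row.START, row.STOP + 1)
}

# Index values missing from the dict map to NaN; Int64 stores them as <NA>.
df1["CLASS"] = df1.index.to_series().map(rangedict).astype("Int64")
print(df1)
```

The dict holds one key per integer in every range, so this suits integer indexes with fairly narrow ranges; wide ranges or non-integer values are better served by an interval-based lookup.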
    
  • 2021-01-25 18:39

    You can use IntervalIndex (requires pandas v0.20.0 or later).

    First construct the index:

    df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')
    
    df2
    Out: 
            START  STOP  CLASS
    [2, 3]      2     3      1
    [5, 7]      5     7      2
    [8, 8]      8     8      3
    

    Now if you index into the second DataFrame, it will look up the value in the intervals. For example,

    df2.loc[6]
    Out: 
    START    5
    STOP     7
    CLASS    2
    Name: [5, 7], dtype: int64
    

    returns the row for the second class. I don't know whether it can be used with merge or merge_asof, but as an alternative you can use map:

    df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])
    

    Note that I first converted the index to a Series to be able to use the Series.map method. This results in

    df1
    Out: 
         a  CLASS
    0    4    NaN
    1   45    NaN
    2    7    1.0
    3    5    1.0
    4   48    NaN
    5   44    2.0
    6   22    2.0
    7   89    2.0
    8   45    3.0
    9   44    NaN
    10  23    NaN
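
The `map` call above works because the `IntervalIndex` resolves each integer to the interval that contains it. A minimal sketch that makes this step explicit with `IntervalIndex.get_indexer`, assuming non-overlapping intervals and sample frames reconstructed from the outputs in this thread:

```python
import numpy as np
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this thread.
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# Inclusive intervals, one per row of df2 (assumed not to overlap).
intervals = pd.IntervalIndex.from_arrays(df2["START"], df2["STOP"], closed="both")

# Position of the containing interval for each index value, -1 if there is none.
pos = intervals.get_indexer(df1.index)

# Take the class at each position and blank out the unmatched rows.
classes = df2["CLASS"].to_numpy()
df1["CLASS"] = np.where(pos >= 0, classes[pos], np.nan)
print(df1)
```

As in the output above, the column comes out as float because NaN marks the rows that fall in no interval.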
    