How to deal with this logic in pandas

灰色年华 2021-01-23 04:16

I have a DataFrame like the following:

   coutry      flag
0  China       red
1  Russia      green
2  China       yellow
3  Britain     yellow
4  Russia      green


        
1 Answer
  • 2021-01-23 04:46

    You can use pd.factorize, which numbers each distinct value in order of first appearance starting from 0, and add 1:

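    import pandas as pd

    df = pd.DataFrame({'coutry': ['China', 'Russia', 'China', 'Britain', 'Russia'],
                       'flag': ['red', 'green', 'yellow', 'yellow', 'green']})

    # factorize returns (codes, uniques); codes start at 0, so add 1
    df['coutry'] = pd.factorize(df.coutry)[0] + 1
    df['flag'] = pd.factorize(df.flag)[0] + 1
    print (df)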
       coutry  flag
    0       1     1
    1       2     2
    2       1     3
    3       3     3
    4       2     2
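    The same encoding can also be applied to every column in one call. A minimal sketch, assuming every column of df should be encoded; DataFrame.apply passes each column to the lambda as a Series, and the result name encoded is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'coutry': ['China', 'Russia', 'China', 'Britain', 'Russia'],
                   'flag': ['red', 'green', 'yellow', 'yellow', 'green']})

# encode each column independently: first distinct value -> 1, next -> 2, ...
encoded = df.apply(lambda s: pd.factorize(s)[0] + 1)
print(encoded)
```

    Each column gets its own numbering, so code 1 means China in coutry but red in flag.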
    

    Then, if you need to save memory, you can convert the columns to the category dtype with pd.Categorical:

    df['coutry'] = pd.Categorical(pd.factorize(df.coutry)[0] + 1)
    df['flag'] =  pd.Categorical(pd.factorize(df.flag)[0] + 1)
    print (df)
      coutry flag
    0      1    1
    1      2    2
    2      1    3
    3      3    3
    4      2    2
    print (df.dtypes)
    coutry    category
    flag      category
    dtype: object
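    Note that pd.Categorical called directly on the strings sorts the categories alphabetically, so its codes differ from the factorize numbering. A minimal sketch, using a standalone Series s built from the question's coutry values, showing that passing categories=pd.unique(s) (order of first appearance) reproduces the factorize codes:

```python
import pandas as pd

s = pd.Series(['China', 'Russia', 'China', 'Britain', 'Russia'])

# default: categories are sorted -> ['Britain', 'China', 'Russia']
sorted_codes = pd.Categorical(s).codes + 1

# categories in order of first appearance -> ['China', 'Russia', 'Britain']
cat = pd.Categorical(s, categories=pd.unique(s))
appearance_codes = cat.codes + 1

print(sorted_codes.tolist())      # [2, 3, 2, 1, 3]
print(appearance_codes.tolist())  # [1, 2, 1, 3, 2], same as pd.factorize(s)[0] + 1
```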
    

    # 1000 times larger df; 'flag' is left as plain integers to compare memory usage
    df = pd.concat([df]*1000).reset_index(drop=True)
    df['coutry'] = pd.Categorical(pd.factorize(df.coutry)[0] + 1)
    df['flag'] = pd.factorize(df.flag)[0] + 1
    print (df)
         coutry  flag
    0         1     1
    1         2     2
    2         1     3
    3         3     3
    4         2     2
    5         1     1
    6         2     2
    ...
    ...
    
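    # category: 5000 one-byte codes plus the 3 stored categories
    print (df['coutry'].nbytes)
    5024
    
    # plain integers: 4 bytes per row here (int32 codes); int64 codes would give 40000
    print (df['flag'].nbytes)
    20000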
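    nbytes does not include the Python string objects held by an object column. To compare against the original strings as well, a minimal sketch using Series.memory_usage(deep=True); the frame is rebuilt here from the question's data, the variable names are illustrative, and exact byte counts depend on the platform and pandas version:

```python
import pandas as pd

df = pd.DataFrame({'coutry': ['China', 'Russia', 'China', 'Britain', 'Russia'],
                   'flag': ['red', 'green', 'yellow', 'yellow', 'green']})
df = pd.concat([df] * 1000).reset_index(drop=True)  # 5000 rows

as_object = df['coutry'].astype(object)               # one Python str per row
as_category = df['coutry'].astype('category')         # small codes + 3 labels
as_int = pd.Series(pd.factorize(df['coutry'])[0] + 1)  # one integer per row

sizes = {}
for name, s in [('object', as_object), ('category', as_category), ('int', as_int)]:
    # deep=True also counts the string objects; index=False skips the index
    sizes[name] = s.memory_usage(index=False, deep=True)
print(sizes)
```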
    

    If you need to convert back, you can map the codes to the original values with dictionaries (built here from the original string columns):

    # b[0] holds the codes of the unique values, b[1] the matching labels
    b = [list(x) for x in pd.factorize(df.coutry.drop_duplicates())]
    d1 = dict(zip(b[0], b[1]))
    print (d1)
    {0: 'China', 1: 'Russia', 2: 'Britain'}
    
    b = [list(x) for x in pd.factorize(df.flag.drop_duplicates())]
    d2 = dict(zip(b[0], b[1]))
    print (d2)
    {0: 'red', 1: 'green', 2: 'yellow'}
    
    
    # encode without +1 so that the codes match the 0-based dictionary keys
    df['coutry'] = pd.Categorical(pd.factorize(df.coutry)[0])
    df['flag'] = pd.Categorical(pd.factorize(df.flag)[0])
    print (df)
       coutry  flag
    0       0     0
    1       1     1
    2       0     2
    3       2     2
    4       1     1
    
    df['coutry'] = df.coutry.map(d1)
    df['flag'] = df.flag.map(d2)
    print (df)
        coutry    flag
    0    China     red
    1   Russia   green
    2    China  yellow
    3  Britain  yellow
    4   Russia   green
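    Because pd.factorize already returns the labels (uniques) in code order, the dictionaries can also be built without drop_duplicates, and keeping uniques lets you decode by position instead of with map. A minimal round-trip sketch; the names encoded, labels and decoded are illustrative, and the frame is rebuilt from the question's data:

```python
import pandas as pd

df = pd.DataFrame({'coutry': ['China', 'Russia', 'China', 'Britain', 'Russia'],
                   'flag': ['red', 'green', 'yellow', 'yellow', 'green']})

encoded = pd.DataFrame(index=df.index)
labels = {}
for col in df.columns:
    codes, uniques = pd.factorize(df[col])
    encoded[col] = codes + 1   # 1-based codes, as in the answer
    labels[col] = uniques      # uniques[i] is the label that received code i

# the same 0-based code -> label dictionary as d1 above
d1 = dict(enumerate(labels['coutry']))
print(d1)  # {0: 'China', 1: 'Russia', 2: 'Britain'}

decoded = pd.DataFrame(index=encoded.index)
for col in encoded.columns:
    # code c (1-based) maps back to uniques[c - 1]
    decoded[col] = labels[col].take(encoded[col].to_numpy() - 1)
print(decoded)
```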
    