How do I map df column values to hex color in one go?

别那么骄傲 2021-01-21 08:54

I have a pandas dataframe with two columns. The values in one of the columns need to be mapped to hex colors. Another graphing process takes over from there.

This is what

1 Answer
  • 2021-01-21 09:14

    You may use matplotlib.colors.to_hex() to convert a color to hexadecimal representation.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors
    
    # Create dataframe (stored as float so the NaN row below keeps the dtype)
    df = pd.DataFrame(np.random.randint(0,21,size=(7, 2)).astype(float), columns=['some_value', 'another_value'])
    # Add a NaN row to mimic real-world data
    df.iloc[-1] = np.nan 
    
    # Map values to colors in hex
    # Taken from here 
    norm = mcolors.Normalize(vmin=0, vmax=21, clip=True)
    mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
    
    # Convert each value to RGBA via the colormap, then to a '#rrggbb' string
    df['some_value_color'] = df['some_value'].apply(lambda x: mcolors.to_hex(mapper.to_rgba(x)))
    df
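
To make the two steps in the snippet above more concrete, here is a minimal sketch that converts single values. The helper name `value_to_hex` and the sample values are illustrative assumptions, not part of the original answer. The NaN case is included because the dataframe contains a NaN row; matplotlib maps it to the colormap's "bad" color, and `to_hex` drops the alpha channel by default.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Same normalization range and colormap as in the snippet above.
norm = mcolors.Normalize(vmin=0, vmax=21, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)

def value_to_hex(x):
    """Map one number to a '#rrggbb' string (hypothetical helper name)."""
    rgba = mapper.to_rgba(x)       # four floats in the range 0..1
    return mcolors.to_hex(rgba)    # alpha is dropped unless keep_alpha=True

# Illustrative sample values, not taken from the question's data.
for x in (0, 10.5, 21, np.nan):
    print(x, value_to_hex(x))
```

The lowest and highest values land on the two ends of the viridis colormap; values outside 0..21 are clipped to those ends because of `clip=True`.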
    


    Efficiency

    The above method is easy to use, but may not be very efficient. In the following, let's compare some alternatives.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors
    
    def create_df(n=10):
        # Create dataframe
        df = pd.DataFrame(np.random.randint(0,21,size=(n, 2)).astype(float), 
                          columns=['some_value', 'another_value'])
        # Add a NaN row to mimic real-world data
        df.iloc[-1] = np.nan
        return df
    

    The following is the solution from above. It applies the conversion to the column element by element, calling a Python function for every value. This is quite inefficient.

    def apply1(df):
        # map values to colors in hex via
        # matplotlib to_hex by pandas apply
        norm = mcolors.Normalize(vmin=np.nanmin(df['some_value'].values), 
                                           vmax=np.nanmax(df['some_value'].values), clip=True)
        mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
    
        df['some_value_color'] = df['some_value'].apply(lambda x: mcolors.to_hex(mapper.to_rgba(x)))
        return df
    

    That's why we might choose to calculate the values into a numpy array first and just assign this array as the newly created column.

    def apply2(df):
        # map values to colors in hex via
        # matplotlib to_hex by assigning numpy array as column
        norm = mcolors.Normalize(vmin=np.nanmin(df['some_value'].values), 
                                           vmax=np.nanmax(df['some_value'].values), clip=True)
        mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
        a = mapper.to_rgba(df['some_value'])
        df['some_value_color'] =  np.apply_along_axis(mcolors.to_hex, 1, a)
        return df
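
As a further variation on the array approach, `to_rgba` accepts `bytes=True` and then returns integer channels in 0..255, which can be formatted directly without calling `to_hex` for every row. The sketch below is an assumption-laden illustration: the function name `apply2_bytes` and the `column` parameter are hypothetical, and no timings were measured for it. Because the byte conversion and `to_hex` may round differently, a channel may differ by one step from the output of `apply2`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def apply2_bytes(df, column='some_value'):
    """Hypothetical variant of apply2 that formats uint8 channels directly."""
    values = df[column].to_numpy(dtype=float)
    norm = mcolors.Normalize(vmin=np.nanmin(values),
                             vmax=np.nanmax(values), clip=True)
    mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
    # bytes=True yields an (n, 4) uint8 array instead of floats in 0..1.
    rgba = mapper.to_rgba(values, bytes=True)
    # tolist() gives plain Python ints; the alpha channel is ignored.
    df[column + '_color'] = ['#%02x%02x%02x' % (r, g, b)
                             for r, g, b, _ in rgba.tolist()]
    return df

demo = pd.DataFrame({'some_value': [0.0, 10.0, 20.0, np.nan]})
print(apply2_bytes(demo))
```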
    

    Finally, we may use a look-up table (LUT) which is created from the matplotlib colormap, and index the LUT by the normalized data. Because this solution needs to create the LUT first, it is rather inefficient for dataframes with fewer entries than the LUT has colors, but it will pay off for large dataframes.

    def apply3(df):
        # map values to colors in hex by
        # creating a hex look-up table and indexing it with the normalized data
        norm = mcolors.Normalize(vmin=np.nanmin(df['some_value'].values), 
                                           vmax=np.nanmax(df['some_value'].values), clip=True)
        lut = plt.cm.viridis(np.linspace(0,1,256))
        lut = np.apply_along_axis(mcolors.to_hex, 1, lut)
        a = (norm(df['some_value'].values)*255).astype(np.int16)
        df['some_value_color'] = lut[a]
        return df
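
One caveat of the LUT approach is that a NaN cannot be cast to an integer index in a well-defined way, so the color a missing value receives is not guaranteed. The following sketch handles NaN explicitly. The function name `apply3_nan_safe`, the `column` parameter and the `nan_color` default are assumptions made for illustration, and the sketch assumes the column contains at least one non-NaN value.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def apply3_nan_safe(df, column='some_value', nan_color='#000000'):
    """Hypothetical LUT variant that assigns nan_color to missing values."""
    values = df[column].to_numpy(dtype=float)
    missing = np.isnan(values)
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    # Hex look-up table with 256 entries sampled from the colormap.
    lut = np.array([mcolors.to_hex(c)
                    for c in plt.cm.viridis(np.linspace(0, 1, 256))])
    # Scale to 0..1 by hand; avoid dividing by zero if all values are equal.
    span = vmax - vmin if vmax > vmin else 1.0
    filled = np.where(missing, vmin, values)   # placeholder for NaN entries
    idx = (np.clip((filled - vmin) / span, 0, 1) * 255).astype(np.intp)
    colors = lut[idx]                          # fancy indexing returns a copy
    colors[missing] = nan_color                # must be a 7-character string
    df[column + '_color'] = colors
    return df

demo = pd.DataFrame({'some_value': [0.0, 10.0, 20.0, np.nan]})
print(apply3_nan_safe(demo))
```

Since the LUT has only 256 entries, the colors of intermediate values may differ slightly from those produced by the per-value conversion.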
    

    Compare the timings

    Let's take a dataframe with 10000 rows.

    df = create_df(10000)

    • Original solution (apply1)

      %timeit apply1(df)
      2.66 s per loop
      
    • Array solution (apply2)

      %timeit apply2(df)
      240 ms per loop
      
    • LUT solution (apply3)

      %timeit apply3(df)
      7.64 ms per loop
      

    In this case the LUT solution is roughly 350 times faster than the original solution (2.66 s vs. 7.64 ms per loop).
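
`%timeit` is an IPython magic and is not available in a plain Python script. A rough way to reproduce such a comparison outside IPython is the standard-library `timeit` module, as sketched below. The helper `best_time` is a hypothetical name, only two of the three functions are repeated to keep the sketch short, and the dataframe is kept small (1000 rows) so that the script finishes quickly; absolute numbers will differ from those quoted above.

```python
import timeit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def create_df(n=10):
    df = pd.DataFrame(np.random.randint(0, 21, size=(n, 2)).astype(float),
                      columns=['some_value', 'another_value'])
    df.iloc[-1] = np.nan
    return df

def make_mapper(df):
    values = df['some_value'].values
    norm = mcolors.Normalize(vmin=np.nanmin(values),
                             vmax=np.nanmax(values), clip=True)
    return plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)

def apply1(df):
    # Element-by-element conversion via pandas apply.
    mapper = make_mapper(df)
    df['some_value_color'] = df['some_value'].apply(
        lambda x: mcolors.to_hex(mapper.to_rgba(x)))
    return df

def apply2(df):
    # Compute all RGBA rows at once, then convert each row to hex.
    rgba = make_mapper(df).to_rgba(df['some_value'])
    df['some_value_color'] = np.apply_along_axis(mcolors.to_hex, 1, rgba)
    return df

def best_time(func, df, repeat=3):
    """Best wall-clock time in seconds of func on a fresh copy of df."""
    # The copy keeps runs independent; its cost is included in the timing.
    timer = timeit.Timer(lambda: func(df.copy()))
    return min(timer.repeat(repeat=repeat, number=1))

df = create_df(1000)  # increase to 10000 to mirror the comparison above
for func in (apply1, apply2):
    print(f"{func.__name__}: {best_time(func, df) * 1000:.1f} ms")
```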
