Python Bokeh: how to make a correlation plot?

慢半拍i 2021-01-21 15:45

How can I make a correlation heatmap in Bokeh?

import pandas as pd
import bokeh.charts

df = pd.util.testing.makeTimeDataFrame(1000)
c = df.corr()

p = bokeh.cha

3 Answers
  • 2021-01-21 16:25

    I put together an interactive correlation plot using the Bokeh library. The code combines several solutions available on SO and other websites; bigreddot's answer explains the underlying pieces in detail. The code for the correlation heatmap is below:

    import pandas as pd
    from bokeh.io import output_file, show
    from bokeh.models import BasicTicker, ColorBar, LinearColorMapper, ColumnDataSource, PrintfTickFormatter
    from bokeh.plotting import figure
    from bokeh.transform import transform
    from bokeh.palettes import Viridis3, Viridis256
    # Read your data into a pandas DataFrame
    data = pd.read_csv("your_path.csv")  # placeholder: replace with the path to your CSV file
    # Now create the correlation matrix using pandas
    df = data.corr()
    
    df.index.name = 'AllColumns1'
    df.columns.name = 'AllColumns2'
    
    # Prepare data.frame in the right format
    df = df.stack().rename("value").reset_index()
    
    # Write the plot to an HTML file
    output_file("CorrelationPlot.html")
    
    # You can use your own palette here
    # colors = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']
    
    # I am using 'Viridis256' to map colors with value, change it with 'colors' if you need some specific colors
    mapper = LinearColorMapper(
        palette=Viridis256, low=df.value.min(), high=df.value.max())
    
    # Define a figure and tools
    TOOLS = "box_select,lasso_select,pan,wheel_zoom,box_zoom,reset,help"
    p = figure(
        tools=TOOLS,
        plot_width=1200,   # in Bokeh 3.0 and later, use width= and height= instead
        plot_height=1000,
        title="Correlation plot",
        x_range=list(df.AllColumns1.drop_duplicates()),
        y_range=list(df.AllColumns2.drop_duplicates()),
        toolbar_location="right",
        x_axis_location="below")
    
    # Create rectangle for heatmap
    p.rect(
        x="AllColumns1",
        y="AllColumns2",
        width=1,
        height=1,
        source=ColumnDataSource(df),
        line_color=None,
        fill_color=transform('value', mapper))
    
    # Add legend
    color_bar = ColorBar(
        color_mapper=mapper,
        location=(0, 0),
        ticker=BasicTicker(desired_num_ticks=10))
    
    p.add_layout(color_bar, 'right')
    
    show(p)
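
The reshaping step is the part of this answer that is easiest to get wrong, so here is a minimal sketch of it in isolation. The small DataFrame and its column names `a`, `b`, `c` are made up for illustration; only the `stack().rename("value").reset_index()` chain comes from the answer.

```python
import pandas as pd

# Small illustrative frame; the column names and values are made up.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],
    "c": [4.0, 3.0, 2.0, 1.0],
})

corr = data.corr()
corr.index.name = "AllColumns1"
corr.columns.name = "AllColumns2"

# stack() moves the column labels into a second index level, giving one
# row per (AllColumns1, AllColumns2) pair; reset_index() then turns both
# levels into ordinary columns that a glyph can reference by name.
long_form = corr.stack().rename("value").reset_index()

print(long_form.head())
```

Each row of `long_form` then describes one cell of the heatmap, which is the shape that `p.rect` expects when it is given column names for `x`, `y`, and the color field.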
    

    References:

    [1] https://docs.bokeh.org/en/latest/docs/user_guide.html

    [2] Bokeh heatmap from Pandas confusion matrix

  • 2021-01-21 16:31

    I can provide baseline code that does what you are asking, using a combination of the answers above and some extra pre-processing.

    Let's assume you have a dataframe df already loaded (in this case the UCI Adult Data) and the correlation coefficients calculated (p_corr).

    import bisect
    #
    from math import pi
    from numpy import arange
    from itertools import chain
    from collections import OrderedDict
    #
    from bokeh.palettes import RdBu as colors  # just make sure to import a palette that centers on white (-ish)
    from bokeh.models import ColorBar, LinearColorMapper
    from bokeh.plotting import figure, show  # needed for figure() and show() below
    
    colors = list(reversed(colors[9]))  # we want an odd number to ensure 0 correlation is a distinct color
    labels = df.columns
    nlabels = len(labels)
    
    def get_bounds(n):
        """Gets bounds for quads with n features"""
        # Use the argument n rather than the global nlabels
        bottom = list(chain.from_iterable([[ii]*n for ii in range(n)]))
        top = list(chain.from_iterable([[ii+1]*n for ii in range(n)]))
        left = list(chain.from_iterable([list(range(n)) for ii in range(n)]))
        right = list(chain.from_iterable([list(range(1,n+1)) for ii in range(n)]))
        return top, bottom, left, right
    
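    def get_colors(corr_array, colors):
        """Aligns color values from palette with the correlation coefficient values"""
        ccorr = arange(-1, 1, 1/(len(colors)/2))  # lower edge of each color bin
        color = []
        for value in corr_array:
            ind = bisect.bisect_left(ccorr, value)
            # Clamp the index: without this, a value of exactly -1 gives ind == 0
            # and colors[ind-1] wraps around to the last color
            color.append(colors[min(max(ind-1, 0), len(colors)-1)])
        return color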
    
    p = figure(plot_width=600, plot_height=600,
               x_range=(0,nlabels), y_range=(0,nlabels),
               title="Correlation Coefficient Heatmap (lighter is worse)",
               toolbar_location=None, tools='')
    
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    p.xaxis.major_label_orientation = pi/4
    p.yaxis.major_label_orientation = pi/4
    
    top, bottom, left, right = get_bounds(nlabels)  # creates squares for plot
    color_list = get_colors(p_corr.values.flatten(), colors)
    
    p.quad(top=top, bottom=bottom, left=left,
           right=right, line_color='white',
           color=color_list)
    
    # Set ticks with labels
    ticks = [tick+0.5 for tick in list(range(nlabels))]
    tick_dict = OrderedDict([[tick, labels[ii]] for ii, tick in enumerate(ticks)])
    # Create the correct number of ticks for each axis 
    p.xaxis.ticker = ticks
    p.yaxis.ticker = ticks
    # Override the labels 
    p.xaxis.major_label_overrides = tick_dict
    p.yaxis.major_label_overrides = tick_dict
    
    # Setup color bar
    mapper = LinearColorMapper(palette=colors, low=-1, high=1)
    color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
    p.add_layout(color_bar, 'right')
    
    show(p)
    

    This will result in the following plot if the categories are integer encoded (this is a horrible data example):
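
As a separate sketch of the binning step performed by `get_colors`: the helper below maps a correlation value to a palette index using only the standard library. The name `palette_index` and its parameters are hypothetical, not part of the answer or of Bokeh, and it assumes equal-width bins over [-1, 1].

```python
from bisect import bisect_right


def palette_index(value, n_colors, low=-1.0, high=1.0):
    """Return the index of the equal-width palette bin that `value` falls into."""
    width = (high - low) / n_colors
    # Interior bin edges: low + width, low + 2*width, ..., high - width
    edges = [low + width * i for i in range(1, n_colors)]
    # bisect_right counts how many edges are <= value, which is the bin index;
    # values below `low` land in bin 0 and values above `high` in the last bin.
    return bisect_right(edges, value)


# With 9 colors, zero correlation lands in the middle bin (index 4).
print([palette_index(v, 9) for v in (-1.0, -0.5, 0.0, 0.5, 1.0)])  # [0, 2, 4, 6, 8]
```

With an odd number of colors the middle bin straddles zero, which is why the answer picks a 9-color diverging palette. A color would then be looked up as `colors[palette_index(value, len(colors))]`.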

  • 2021-01-21 16:46

    In modern Bokeh you should use the bokeh.plotting interface. You can see an example of a categorical heatmap generated using this interface in the gallery:

    http://docs.bokeh.org/en/latest/docs/gallery/categorical.html


    Regarding a legend, for a colormap like this you actually will want a discrete ColorBar instead of a Legend. This is a new feature that will be present in the upcoming 0.12.2 release later this week (today's date: 2016-08-28). These new colorbar annotations can be located outside the main plot area.

    There is also an example in the GitHub repo:

    https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/color_data_map.py

    Note that the last example also uses another new feature to do the colormapping in the browser, instead of having to precompute the colors in Python. All together it looks like:

    # create a color mapper with your palette - can be any list of colors
    mapper = LinearColorMapper(palette=Viridis3, low=0, high=100)
    
    p = figure(toolbar_location=None, tools='', title=title)
    p.circle(
        x='x', y='y', source=source,
    
        # use the mapper to colormap according to the 'z' column (in the browser)
        fill_color={'field': 'z', 'transform': mapper},  
    )
    
    # create a ColorBar and add it to the side of the plot
    color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
    p.add_layout(color_bar, 'right')
    

    There are more sophisticated options too, e.g. if you want to control the ticking on the colorbar more carefully you could add a custom ticker or tick formatter just like on a normal Axis, to achieve things like:
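
A minimal sketch of one such customization, assuming a correlation range of -1 to 1 and the `Viridis256` palette (both choices are illustrative, as are the tick positions and the format string). It sets a fixed ticker and a printf-style formatter on the `ColorBar`:

```python
from bokeh.models import ColorBar, FixedTicker, LinearColorMapper, PrintfTickFormatter
from bokeh.palettes import Viridis256

# Map values in [-1, 1] onto the palette (the range is an assumption for correlations).
mapper = LinearColorMapper(palette=Viridis256, low=-1, high=1)

color_bar = ColorBar(
    color_mapper=mapper,
    location=(0, 0),
    # Place ticks only at these values instead of letting Bokeh choose them.
    ticker=FixedTicker(ticks=[-1.0, -0.5, 0.0, 0.5, 1.0]),
    # Render each tick label with one decimal place.
    formatter=PrintfTickFormatter(format="%.1f"),
)
```

The resulting `color_bar` is attached to a figure the same way as before, with `p.add_layout(color_bar, 'right')`.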

    It's not clear what your actual requirements are, so I just mention this in case it is useful to know.


    Finally, Bokeh is a large project, and finding the best way to do something often involves asking for more information and context and, in general, having a discussion. That kind of collaborative help seems to be frowned upon at SO (such replies are "not real answers"), so I'd encourage you to also check out the project Discourse for help anytime.
