How can I make a correlation heatmap in Bokeh?
import pandas as pd
import bokeh.charts
df = pd.util.testing.makeTimeDataFrame(1000)
c = df.corr()
p = bokeh.cha
I tried to create an interactive correlation plot using the Bokeh library. The code is a combination of different solutions available on SO and other websites. In the solution above, bigreddot has explained things in detail. The code for the correlation heatmap is below:
import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import BasicTicker, ColorBar, LinearColorMapper, ColumnDataSource, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.transform import transform
from bokeh.palettes import Viridis3, Viridis256
# Read your data into a pandas DataFrame
data = pd.read_csv("path/to/your_data.csv")  # placeholder: replace with the path to your own file
# Now create the correlation matrix using pandas
df = data.corr()
df.index.name = 'AllColumns1'
df.columns.name = 'AllColumns2'
# Prepare data.frame in the right format
df = df.stack().rename("value").reset_index()
# here the plot :
output_file("CorrelationPlot.html")
# You can use your own palette here
# colors = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']
# I am using 'Viridis256' to map colors with value, change it with 'colors' if you need some specific colors
mapper = LinearColorMapper(
    palette=Viridis256, low=df.value.min(), high=df.value.max())
# Define a figure and tools
TOOLS = "box_select,lasso_select,pan,wheel_zoom,box_zoom,reset,help"
# Note: plot_width/plot_height were removed in Bokeh 3.0; use width/height there
p = figure(
    tools=TOOLS,
    plot_width=1200,
    plot_height=1000,
    title="Correlation plot",
    x_range=list(df.AllColumns1.drop_duplicates()),
    y_range=list(df.AllColumns2.drop_duplicates()),
    toolbar_location="right",
    x_axis_location="below")
# Create rectangle for heatmap
p.rect(
    x="AllColumns1",
    y="AllColumns2",
    width=1,
    height=1,
    source=ColumnDataSource(df),
    line_color=None,
    fill_color=transform('value', mapper))
# Add legend
color_bar = ColorBar(
    color_mapper=mapper,
    location=(0, 0),
    ticker=BasicTicker(desired_num_ticks=10))
p.add_layout(color_bar, 'right')
show(p)
References:
[1] https://docs.bokeh.org/en/latest/docs/user_guide.html
[2] Bokeh heatmap from Pandas confusion matrix
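For anyone unsure what the `stack().rename("value").reset_index()` step produces: it turns the square correlation matrix into one row per (column, column) pair, which is the long, column-oriented shape that `ColumnDataSource` accepts. Below is a minimal plain-Python sketch of the same reshaping on a made-up 2x2 matrix. The names `corr` and `to_long` are purely illustrative and are not part of pandas or Bokeh.

```python
# Hypothetical correlation matrix as nested dicts: corr[row][col] -> coefficient.
corr = {
    "a": {"a": 1.0, "b": 0.3},
    "b": {"a": 0.3, "b": 1.0},
}

def to_long(matrix):
    """Flatten a square matrix into column-oriented long format."""
    long_format = {"AllColumns1": [], "AllColumns2": [], "value": []}
    for row_name, row in matrix.items():
        for col_name, coefficient in row.items():
            long_format["AllColumns1"].append(row_name)
            long_format["AllColumns2"].append(col_name)
            long_format["value"].append(coefficient)
    return long_format

long_corr = to_long(corr)
print(long_corr)
```

Each position across the three lists describes one heatmap cell, which is why the `rect` glyph can refer to the columns by name.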
I think I can provide some baseline code to help do what you are asking, using a combination of the answers above and some extra pre-processing.
Let's assume you have a dataframe df already loaded (in this case the UCI Adult Data) and the correlation coefficients calculated (p_corr).
import bisect
#
from math import pi
from numpy import arange
from itertools import chain
from collections import OrderedDict
#
from bokeh.palettes import RdBu as colors # just make sure to import a palette that centers on white (-ish)
from bokeh.models import ColorBar, LinearColorMapper
from bokeh.plotting import figure, show # needed for figure() and show() below
colors = list(reversed(colors[9])) # we want an odd number to ensure 0 correlation is a distinct color
labels = df.columns
nlabels = len(labels)
def get_bounds(n):
    """Gets bounds for quads with n features"""
    bottom = list(chain.from_iterable([[ii]*n for ii in range(n)]))
    top = list(chain.from_iterable([[ii+1]*n for ii in range(n)]))
    left = list(chain.from_iterable([list(range(n)) for ii in range(n)]))
    right = list(chain.from_iterable([list(range(1,n+1)) for ii in range(n)]))
    return top, bottom, left, right
def get_colors(corr_array, colors):
    """Aligns color values from palette with the correlation coefficient values"""
    ccorr = arange(-1, 1, 1/(len(colors)/2))
    color = []
    for value in corr_array:
        ind = bisect.bisect_left(ccorr, value)
        # clamp the index so a coefficient of exactly -1 does not wrap around to colors[-1]
        color.append(colors[min(max(ind-1, 0), len(colors)-1)])
    return color
p = figure(plot_width=600, plot_height=600,
           x_range=(0,nlabels), y_range=(0,nlabels),
           title="Correlation Coefficient Heatmap (lighter is worse)",
           toolbar_location=None, tools='')
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.major_label_orientation = pi/4
p.yaxis.major_label_orientation = pi/4
top, bottom, left, right = get_bounds(nlabels) # creates squares for plot
color_list = get_colors(p_corr.values.flatten(), colors)
p.quad(top=top, bottom=bottom, left=left,
       right=right, line_color='white',
       color=color_list)
# Set ticks with labels
ticks = [tick+0.5 for tick in list(range(nlabels))]
tick_dict = OrderedDict([[tick, labels[ii]] for ii, tick in enumerate(ticks)])
# Create the correct number of ticks for each axis
p.xaxis.ticker = ticks
p.yaxis.ticker = ticks
# Override the labels
p.xaxis.major_label_overrides = tick_dict
p.yaxis.major_label_overrides = tick_dict
# Setup color bar
mapper = LinearColorMapper(palette=colors, low=-1, high=1)
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')
show(p)
This will result in the following plot if the categories are integer encoded (this is a horrible data example):
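One thing that is easy to miss in the code above: the quads have to be generated in the same order as `p_corr.values.flatten()`, which walks the matrix row by row. Here is a minimal sketch of that ordering using `itertools.product`. The helper name `quad_bounds` is hypothetical, and this is just another way of writing `get_bounds`, not a different technique.

```python
from itertools import product

def quad_bounds(n):
    """Return (top, bottom, left, right) for an n x n grid of unit squares,
    in row-major order: row index maps to y, column index maps to x."""
    cells = list(product(range(n), repeat=2))  # (row, col) pairs, row-major
    bottom = [row for row, _ in cells]
    top = [row + 1 for row, _ in cells]
    left = [col for _, col in cells]
    right = [col + 1 for _, col in cells]
    return top, bottom, left, right

top, bottom, left, right = quad_bounds(2)
print(bottom, left)
```

Because the column index changes fastest, the k-th quad lines up with the k-th element of the flattened correlation array.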
In modern Bokeh you should use the bokeh.plotting interface. You can see an example of a categorical heatmap generated using this interface in the gallery:
http://docs.bokeh.org/en/latest/docs/gallery/categorical.html
Regarding a legend, for a colormap like this you actually will want a discrete ColorBar instead of a Legend. This is a new feature that will be present in the upcoming 0.12.2 release later this week (today's date: 2016-08-28). These new colorbar annotations can be located outside the main plot area.
There is also an example in the GitHub repo:
https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/color_data_map.py
Note that the last example also uses another new feature to do the colormapping in the browser, instead of having to precompute the colors in Python. All together it looks like:
# create a color mapper with your palette - can be any list of colors
mapper = LinearColorMapper(palette=Viridis3, low=0, high=100)
p = figure(toolbar_location=None, tools='', title=title)
p.circle(
    x='x', y='y', source=source,
    # use the mapper to colormap according to the 'z' column (in the browser)
    fill_color={'field': 'z', 'transform': mapper},
)
# create a ColorBar and add it to the side of the plot
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')
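To make the idea of a linear color mapper a bit more concrete, here is a rough pure-Python sketch of mapping a value in `[low, high]` onto a palette. This only illustrates the concept under my own assumptions (equal-width bins, out-of-range values clamped to the end colors). It is not Bokeh's actual implementation, and `linear_color` is a made-up name. The palette is an arbitrary five-color list.

```python
def linear_color(value, palette, low, high):
    """Pick the palette entry whose equal-width bin contains value.

    Assumption for this sketch: values outside [low, high] are clamped
    to the first or last color.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = (value - low) / (high - low)
    index = int(fraction * len(palette))
    index = max(0, min(index, len(palette) - 1))
    return palette[index]

palette = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']
print(linear_color(0.0, palette, low=-1, high=1))
```

With an odd number of colors and a symmetric range such as -1 to 1, a value of 0 lands in the middle bin, which is handy for correlation coefficients.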
There are more sophisticated options too, e.g. if you want to control the ticking on the colorbar more carefully you could add a custom ticker or tick formatter just like on a normal Axis, to achieve things like:
It's not clear what your actual requirements are, so I just mention this in case it is useful to know.
Finally, Bokeh is a large project, and finding the best way to do something often involves asking for more information and context and, in general, having a discussion. That kind of collaborative help seems to be frowned upon at SO (they are "not real answers"), so I'd encourage you to also check out the project Discourse for help anytime.