Holoviews tap stream of correlation heatmap and regression plot

无人及你 2021-01-24 11:34

I want to make a correlation heatmap for a DataFrame and a regression plot for each pair of the variables. I have tried to read all the docs and am still having a very hard time

2 Answers
  • 2021-01-24 12:09

    Two common problems in modeling are collinearity and nonlinearity. Collinearity can be visualized with a correlation heatmap, but the heatmap becomes hard to explore with a large number of variables/features. In the following application, you can hover over a cell to check the correlation coefficient between any two variables. When you tap a cell, the scatter plot is updated with a second-degree fitted curve to reveal any nonlinearity between the two variables.

    With the help of @doopler, I changed the code a little bit and share it here:

    import numpy as np
    import pandas as pd
    import holoviews as hv
    hv.extension('bokeh')
    
    # generate random data
    df = pd.DataFrame(data={'col_1': np.random.normal(5, 2, 100)})
    
    df['col_2'] = df.col_1 + np.random.gamma(5, 2, 100)
    df['col_3'] = df.col_1*2 + np.random.normal(0, 10, 100)
    df['col_4'] = df.col_1**2 + np.random.normal(0, 10, 100)
    df['col_5'] = np.sin(df.col_1)
    df['col_6'] = np.cos(df.col_1)
    
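    corr = df.corr().abs()
    # mask the upper triangle (diagonal included) of the heatmap; DataFrame.mask
    # avoids writing into corr.values, which may be read-only in newer pandas
    corr = corr.mask(np.triu(np.ones(corr.shape, dtype=bool)))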
    
    heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
                .opts(tools=['hover'],  height=400, width=400, fontsize=9,
                      toolbar='above', colorbar=False, cmap='Blues',
                      invert_yaxis=True, xrotation=90, xlabel='', ylabel='',
                      title='Correlation Coefficient Heatmap (absolute value)')
    
    # define tap stream with heatmap as source
    tap_xy = hv.streams.Tap(source=heatmap, x='col_1', y='col_4')
    
    # calculate correlation plot based on tap
    def tap_corrplot(x, y):
        # drop missing values if there are any
        df_notnull = df[[x, y]].dropna(how='any')
    
        # fit a 2nd degree line/curve
        m1, m2, b = np.polyfit(df_notnull[x], df_notnull[y], deg=2)
        # generate data to plot fitted line/curve
        x_curve = np.linspace(df[x].min(), df[x].max())
        y_curve = m1*x_curve**2 + m2*x_curve + b
    
        curve = hv.Curve((x_curve, y_curve), x, y)\
                  .opts(color='#fc4f30', framewise=True)
    
        scatter = hv.Scatter((df[x], df[y]), x, y)\
                    .opts(height=400, width=400, fontsize=9, size=5,
                          alpha=0.2, ylim=(df[y].min(), df[y].max()),
                          color='#30a2da', framewise=True,
                          title='Correlation Plot (2nd degree fit)')
    
        return curve * scatter
    
    # map tap in heatmap with correlation plot
    tap_dmap = hv.DynamicMap(tap_corrplot, streams=[tap_xy])
    
    layout = heatmap + tap_dmap
    layout
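
The callback above unpacks the result of `np.polyfit` as `m1, m2, b`, which relies on NumPy returning the coefficients from the highest power down. A minimal sketch to check that ordering on noise-free data (the quadratic and the variable names here are made up for illustration, not part of the answer) and to show that `np.polyval` gives the same curve as the hand-written expression:

```python
import numpy as np

# Noise-free quadratic so the recovered coefficients are easy to check
x = np.linspace(-2.0, 2.0, 21)
y = 2.0 * x**2 - 3.0 * x + 1.0

# np.polyfit returns coefficients ordered from the highest power down
coeffs = np.polyfit(x, y, deg=2)
m1, m2, b = coeffs

# Evaluate the fitted polynomial two ways: by hand and with np.polyval
manual = m1 * x**2 + m2 * x + b
via_polyval = np.polyval(coeffs, x)
```

Using `np.polyval(coeffs, x_curve)` instead of the explicit formula would let the degree change without rewriting the expression.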
    
    

    In case you need to run it as a Bokeh application:

    from bokeh.server.server import Server
    
    renderer = hv.renderer('bokeh')
    app = renderer.app(layout)
    server = Server({'/': app}, port=0)
    
    server.start()
    server.show('/')
    

    The code works well with Jupyter Lab. If you use Jupyter Notebook, check this link.

  • 2021-01-24 12:23

    I adapted the relevant part of the docs (http://holoviews.org/reference/streams/bokeh/Tap.html) to your code. Maybe this clears up your confusion.

    import pandas as pd
    import numpy as np
    import holoviews as hv
    from holoviews import opts
    hv.extension('bokeh', width=90)
    
    import seaborn as sns
    
    # Declare dataset
    df = sns.load_dataset('tips')
    df = df[['total_bill', 'tip', 'size']]
    
    # Declare HeatMap
    corr = df.corr()
    heatmap = hv.HeatMap((corr.columns, corr.index, corr))
    
    # Declare Tap stream with heatmap as source and initial values
    posxy = hv.streams.Tap(source=heatmap, x='total_bill', y='tip')
    
    # Define function to compute the regression plot based on tap location
    def tap_regression(x, y):
        # fit a straight line to the tapped pair of columns
        m, b = np.polyfit(df[x], df[y], deg=1)
        # evaluate the line over the range of the tapped x column
        x_data = np.linspace(df[x].min(), df[x].max())
        y_data = m*x_data + b
        return hv.Curve((x_data, y_data), x, y) * hv.Scatter((df[x], df[y]), x, y)
    
    
    tap_dmap = hv.DynamicMap(tap_regression, streams=[posxy])
    
    (heatmap + tap_dmap).opts(
        opts.Scatter(height=400, width=400, color='red', ylim=(0, 100), framewise=True),
        opts.HeatMap(tools=['tap', 'hover'], height=400, width=400, toolbar='above'),
        opts.Curve(framewise=True)
    )
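
Both callbacks index the DataFrame with the tapped pair of column names. If a tap ever delivers the same name for `x` and `y` (a diagonal cell), `df[[x, y]]` would contain a duplicated column and the fit would likely fail. I have not checked how every HoloViews version reports such taps, so the following is only a sketch of one way to guard against it. `fit_pair` and the `demo` frame are hypothetical names for illustration, and the helper uses only NumPy and pandas so it can be called from inside either callback:

```python
import numpy as np
import pandas as pd

def fit_pair(df, x, y, deg=1, num=50):
    """Hypothetical helper: return (x_curve, y_curve) for a polynomial fit of y on x."""
    if x == y:
        # Diagonal cell: a variable plotted against itself is the identity line
        xs = df[x].dropna()
        x_curve = np.linspace(xs.min(), xs.max(), num)
        return x_curve, x_curve.copy()
    # Drop rows where either column is missing before fitting
    pair = df[[x, y]].dropna(how='any')
    coeffs = np.polyfit(pair[x], pair[y], deg=deg)
    x_curve = np.linspace(pair[x].min(), pair[x].max(), num)
    return x_curve, np.polyval(coeffs, x_curve)

# Small made-up frame: b = 2*a + 1 wherever a is present
demo = pd.DataFrame({'a': [0.0, 1.0, 2.0, 3.0, np.nan],
                     'b': [1.0, 3.0, 5.0, 7.0, 9.0]})
x_line, y_line = fit_pair(demo, 'a', 'b')
x_diag, y_diag = fit_pair(demo, 'a', 'a')
```

Inside a callback, the returned arrays can be passed straight to `hv.Curve((x_curve, y_curve), x, y)`.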
    