Is it possible to plot a pandas DataFrame that includes missing data as a heatmap, with a special color for the missing values?

Submitted by 不想你离开。 on 2019-12-11 05:58:54

Question


I was wondering whether I can plot all columns of a pandas DataFrame in one window as heatmaps, using a self-made 24x20 square matrix model that maps every 480 values of each column (i.e. one cycle) into the matrix, for all cycles. The challenge is that I want to show missing data in a special color that lies outside the color range of the colormap cmap='coolwarm'.
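The `mkdf`/`print_df` helpers that place one cycle into the 24x20 matrix are not shown in the post. As a minimal sketch, assuming a plain row-major fill (the actual self-made layout may well differ), the hypothetical helpers below cut a column into 480-value cycles and reshape each cycle into a 24x20 DataFrame, leaving nan/inf untouched so they can be colored separately later:

```python
import numpy as np
import pandas as pd

ROWS, COLS = 24, 20            # 24 * 20 = 480 values per cycle
CYCLE_LEN = ROWS * COLS


def to_matrix(values):
    """Hypothetical stand-in for mkdf/print_df: fill one cycle row by row.

    Assumes a plain row-major layout; the original self-made mapping may differ.
    NaN and inf are kept as-is so they can be colored separately later.
    """
    values = np.asarray(values, dtype=float)
    if values.size != CYCLE_LEN:
        raise ValueError(f"expected {CYCLE_LEN} values, got {values.size}")
    return pd.DataFrame(values.reshape(ROWS, COLS))


def iter_cycles(series):
    """Yield (cycle_number, 24x20 DataFrame) for every complete cycle of a column."""
    n_cycles = len(series) // CYCLE_LEN
    for cycle in range(n_cycles):
        start = cycle * CYCLE_LEN
        yield cycle, to_matrix(series.iloc[start:start + CYCLE_LEN].values)


# Illustrative usage with made-up data: 1000 values -> 2 complete cycles.
series = pd.Series(np.arange(1000, dtype=float))
matrices = list(iter_cycles(series))
```

Values after the last complete cycle (40 of them in this example) are ignored, just as `int(len(main_data)/480)` ignores them in the script.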

I already tried df = df.replace([np.inf, -np.inf], np.nan) to make sure that all inf values are converted to nan, and then df = df.replace(0, np.nan) before sns.heatmap(df, vmin=-1, vmax=+1, cmap='coolwarm'). After these steps I can recognize missing values by their white color, since with cmap='coolwarm' on the interval [vmin=-1, vmax=+1] the nan/inf cells are left white. However, this has two problems:

First, if your dataset contains 0, it is also shown in white like missing data, so you can't distinguish inf/nan from 0 in the columns. Second, you can't even differentiate between nan and inf values!
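Both problems come from folding inf and 0 into NaN before plotting. A sketch of keeping the three cases apart is to build one boolean mask per case instead of replacing values; the small DataFrame here is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up example data containing real zeros, NaN, +inf and -inf.
df = pd.DataFrame({
    "A": [2.29, 0.0, np.nan, 2.18],
    "B": [-2.69, np.inf, -4.38, 0.0],
    "C": [-344.05, -335.94, -np.inf, np.nan],
})

# One boolean mask per case, instead of replacing everything with NaN.
nan_mask = df.isna()        # True only where the value is NaN
inf_mask = np.isinf(df)     # True where the value is +inf or -inf
zero_mask = df.eq(0)        # genuine zeros, which remain ordinary data
```

By default `DataFrame.isna()` does not treat inf as missing, so the NaN and inf masks never overlap, and the zeros are never hidden.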

I also tried passing mask=df.isnull() to sns.heatmap(), which hides the cells whose mask value is True, but according to this answer (GH375) it again covers the 0 values as well. I'm not sure whether the answer by @Scotty1- here, which adds markers and interpolates the values with newdf = newdf.interpolate(), is the right solution for my case. Is it a good idea to filter the missing data by subsetting:

import math
nan_rows = df[df['a'].apply(lambda x: math.isnan(x))]
inf_rows = df[df['a'] == float('inf')]
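Chaining those two filters on the same `df` would always return an empty frame, since no value is both NaN and inf; they only make sense as separate selections. A vectorized sketch of the same selections (toy data, column name `a` as in the snippet) could look like this. Note that subsetting drops rows, which would break the fixed 480-values-per-cycle layout, so for plotting a mask is usually the better fit:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan, np.inf, 0.0, -np.inf]})

nan_rows = df[df["a"].isna()]        # same rows as apply(math.isnan)
inf_rows = df[np.isinf(df["a"])]     # catches +inf and -inf; == float('inf') misses -inf
clean = df[np.isfinite(df["a"])]     # rows usable by the colormap; 0 is kept
```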

My script is below. However, the for-loop doesn't produce the proper output: in each cycle it plots each column 3 times in one window. For example, it first draws A on the left, then draws A again under the titles B and C in the middle and on the right. Then it draws B 3 times instead of once, when B should only be in the middle, and finally it draws C 3 times instead of once: besides the right side, it also puts C in the middle and on the left!

import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt

#mkdf, print_df and normalize are helper functions that are not shown in this post
#extract the parameters and put them in lists based on id_set
df = pd.read_csv(r'D:\SOF.TXT', header=None)
id_set = df[df.index % 4 == 0].astype('int').values
a = df[df.index % 4 == 1].values
b = df[df.index % 4 == 2].values
c = df[df.index % 4 == 3].values
data = {'A': a[:,0], 'B': b[:,0], 'C': c[:,0] }
#main_data contains all the data
main_data = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])  



#the following loop creates all plots; change the number of cycles here
cycles = len(main_data) // 480
print(cycles)
for i in main_data:
    os.makedirs(i, exist_ok=True)   # one output folder per column
    min_val = main_data[i].min()
    min_nor = -1
    max_val = main_data[i].max()
    max_nor = 1
    for cycle in range(1):             #iterate through all cycles: replace range(1) with range(cycles)
        count =  '{:04}'.format(cycle)
        j = cycle * 480
        ordered_data = mkdf(main_data.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        #Save .csv files containing the matrix of each parameter, named by cycle
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)            
        if 'C' in i:
            min_nor = -40
            max_nor = 150
            #Applying normalization for C to [-40,+150]
            new_value = normalize(main_data.iloc[j:j+480][i].values, min_val, max_val, -40, 150)
            n_cbar_kws = {"ticks":[-40,-20,0,25,50,75,100,125,150]}
        else:
            #Applying normalization for A,B to [-1,+1]
            new_value = normalize(main_data.iloc[j:j+480][i].values, min_val, max_val, -1, 1)
            n_cbar_kws = {"ticks":[-1.0,-0.75,-0.50,-0.25,0.00,0.25,0.50,0.75,1.0]}    
        Sections = mkdf(new_value)
        df = print_df(Sections)
        #Plotting parameters by using HeatMap
        plt.figure()
        sns.heatmap(df, vmin=min_nor, vmax=max_nor, cmap ='coolwarm', cbar_kws=n_cbar_kws)                             
        plt.title(i, fontsize=12, color='black', loc='left', style='italic')
        plt.axis('off')
        #Save .PNG images containing the HeatMap plot of each parameter, named by cycle
        plt.savefig(f'{i}/{i}{count}.png')  



    #plotting all columns ['A','B','C'] in-one-window side by side


    fig, axes = plt.subplots(nrows=1, ncols=3 , figsize=(20,10))
    plt.subplot(131)
    sns.heatmap(df, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
    fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
    plt.title('A', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')

    plt.subplot(132)
    sns.heatmap(df, vmin=-1, vmax=1, cmap ="coolwarm", cbar=True , cbar_kws={"ticks":[-1.0,-0.75,-0.5,-0.25,0.00,0.25,0.5,0.75,1.0]})
    fig.axes[-1].set_ylabel('[MPa]', size=20) #cbar_kws={'label': 'Celsius'}
    #sns.despine(left=True)
    plt.title('B', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')

    plt.subplot(133)
    sns.heatmap(df, vmin=-40, vmax=150, cmap ="coolwarm" , cbar=True , cbar_kws={"ticks":[-40,150,-20,0,25,50,75,100,125]}) 
    fig.axes[-1].set_ylabel('[°C]', size=20) #cbar_kws={'label': 'Celsius'}
    #sns.despine(left=True)
    plt.title('C', fontsize=12, color='black', loc='left', style='italic')
    plt.axis('off')


    plt.suptitle(f'Analysis of data in cycle Nr.: {count}', color='yellow', backgroundcolor='black', fontsize=48, fontweight='bold')
    plt.subplots_adjust(top=0.7, bottom=0.3, left=0.05, right=0.95, hspace=0.2, wspace=0.2)
    #plt.subplot_tool()
    plt.savefig(f'{i}/{i}{i}{count}.png') 
    plt.show()
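The repeated panels seem to come from two things in the script above: the "all columns in one window" block is indented under `for i in main_data`, so it runs once per column, and all three `sns.heatmap` calls receive the same `df`, i.e. the matrix of whichever column was processed last. A minimal sketch of one way around both, assuming each column's matrix for a cycle is first collected in a dict (the names `PANELS`, `plot_cycle` and `matrices` are hypothetical), is to draw one figure per cycle and pass each panel its own `ax=`:

```python
import matplotlib
matplotlib.use("Agg")                      # off-screen rendering; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Per-column settings taken from the script (ranges, ticks, units).
PANELS = {
    "A": dict(vmin=-1, vmax=1, unit="[MPa]",
              ticks=[-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]),
    "B": dict(vmin=-1, vmax=1, unit="[MPa]",
              ticks=[-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]),
    "C": dict(vmin=-40, vmax=150, unit="[°C]",
              ticks=[-40, -20, 0, 25, 50, 75, 100, 125, 150]),
}


def plot_cycle(matrices, count):
    """Draw A, B, C side by side for one cycle; matrices maps column name -> 24x20 data."""
    fig, axes = plt.subplots(nrows=1, ncols=len(PANELS), figsize=(20, 10))
    for ax, (name, cfg) in zip(axes, PANELS.items()):
        # Each panel gets its own data and its own Axes object.
        sns.heatmap(matrices[name], ax=ax, vmin=cfg["vmin"], vmax=cfg["vmax"],
                    cmap="coolwarm", xticklabels=False, yticklabels=False,
                    cbar_kws={"ticks": cfg["ticks"], "label": cfg["unit"]})
        ax.set_title(name, fontsize=12, loc="left", style="italic")
    fig.suptitle(f"Analysis of data in cycle Nr.: {count}", fontsize=24, fontweight="bold")
    return fig


# Illustrative usage with random data in each panel's range.
rng = np.random.default_rng(0)
matrices = {name: pd.DataFrame(rng.uniform(cfg["vmin"], cfg["vmax"], (24, 20)))
            for name, cfg in PANELS.items()}
fig = plot_cycle(matrices, "0000")
fig.savefig("ABC0000.png")
plt.close(fig)
```

In the script, this would mean swapping the loop order (cycles outer, columns inner), storing each normalized matrix with something like `matrices[i] = df` inside the inner loop, and calling `plot_cycle` once per cycle after it.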

My DataFrame looks like the following:

          A          B            C
0      2.291171  -2.689658  -344.047912
10     2.176816  -4.381186  -335.936524
20     2.291171  -2.589725  -342.544885
30     2.176597  -6.360999     0.000000
40     2.577268  -1.993412  -344.326376
50     9.844076  -2.690917  -346.125859
60     2.061782  -2.889378  -346.375655
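Values like these are rescaled per column before plotting, but the `normalize` helper called in the loop is not shown in the post. Assuming it is a plain linear min-max rescaling (a guess based on its arguments), a sketch could look like the one below. One point worth checking in the script: `main_data[i].min()` and `.max()` skip NaN but not inf, so a single inf makes `max_val` infinite, and with a linear rescaling like this one every finite value then collapses onto the lower bound. The hypothetical `finite_range` takes the range over finite values only:

```python
import numpy as np
import pandas as pd


def finite_range(series):
    """Min and max over finite values only (pandas .min()/.max() skip NaN but not inf)."""
    finite = series[np.isfinite(series)]
    return finite.min(), finite.max()


def normalize(values, min_val, max_val, new_min, new_max):
    """Hypothetical linear rescaling from [min_val, max_val] to [new_min, new_max].

    NaN stays NaN and +/-inf stays +/-inf, so missing cells remain
    identifiable after scaling.
    """
    values = np.asarray(values, dtype=float)
    if max_val == min_val:
        # Constant data: map finite values to the midpoint, keep NaN/inf as they are.
        return np.where(np.isfinite(values), (new_min + new_max) / 2, values)
    scale = (new_max - new_min) / (max_val - min_val)
    return new_min + (values - min_val) * scale


# Illustrative usage with made-up values.
s = pd.Series([0.0, 5.0, 10.0, np.nan, np.inf])
lo, hi = finite_range(s)
scaled = normalize(s.values, lo, hi, -1, 1)
```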

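Below is an overview of a sample from my .TXT dataset file. Each record takes four lines: the id followed by the values of A, B and C. If you want to test with missing values, change the last 3 values at the end of the text file to nan/inf, save it and run it again; the left column below is an original record and the right column the modified one.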

7590                  7590
0                     nan
7.19025828418         nan
-1738.000075          inf
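Because each record spans four consecutive lines (id, A, B, C), the file can also be read by reshaping the single column into rows of four. pandas' default parser reads the literal strings nan and inf as floating-point NaN and infinity, so the edited record loads without extra handling. A sketch on an in-memory sample (the second record and its id 7600 are made up for illustration):

```python
import io

import numpy as np
import pandas as pd

# Two records in the file's layout: id, A, B, C on consecutive lines.
# The second record (id made up) uses the nan/inf edits suggested above.
sample = "7590\n0\n7.19025828418\n-1738.000075\n7600\nnan\nnan\ninf\n"

raw = pd.read_csv(io.StringIO(sample), header=None)[0].to_numpy()  # "nan"/"inf" become floats
records = raw.reshape(-1, 4)                                       # one row per record
main_data = pd.DataFrame(records[:, 1:], columns=["A", "B", "C"],
                         index=records[:, 0].astype(int))
```

`reshape(-1, 4)` raises an error if the number of lines is not a multiple of four, which is a cheap sanity check on the file.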

I'd like to visualise a large pandas DataFrame with 3 columns (columns=['A','B','C']) as heatmaps in one window. The DataFrame contains two kinds of values: missing entries (nan or inf) and ordinary floats. I want the heatmap to show the missing cells inside the square matrix model in fixed colors, e.g. nan in black and inf in silver or gray, and the rest of the DataFrame as a normal heatmap, with the floats on the cmap='coolwarm' scale.
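One common pattern for this (not taken from the post) is to draw two heatmaps on the same axes: first the coolwarm heatmap with every non-finite cell masked, then an overlay that codes NaN as 0 and inf as 1, uses a two-color `ListedColormap`, and is masked everywhere the data is finite. Zeros are never masked, so they keep their coolwarm color. A minimal sketch with a hypothetical helper `heatmap_with_missing` and made-up data:

```python
import matplotlib
matplotlib.use("Agg")                      # off-screen rendering; omit in an interactive session
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch
import numpy as np
import pandas as pd
import seaborn as sns


def heatmap_with_missing(data, ax, vmin=-1, vmax=1, nan_color="black", inf_color="silver"):
    """Hypothetical helper: coolwarm heatmap for finite values, fixed colors for nan and inf."""
    data = data.astype(float)
    finite = np.isfinite(data)

    # Layer 1: normal heatmap; every non-finite cell is masked, i.e. left undrawn.
    sns.heatmap(data, mask=~finite, vmin=vmin, vmax=vmax, cmap="coolwarm",
                xticklabels=False, yticklabels=False, ax=ax)

    # Layer 2: code NaN as 0 and +/-inf as 1, and draw only the non-finite cells.
    codes = pd.DataFrame(np.where(np.isinf(data), 1.0, 0.0),
                         index=data.index, columns=data.columns)
    sns.heatmap(codes, mask=finite, vmin=0, vmax=1,
                cmap=ListedColormap([nan_color, inf_color]), cbar=False,
                xticklabels=False, yticklabels=False, ax=ax)

    # Legend entries for the two fixed colors (they are not part of the colorbar).
    ax.legend(handles=[Patch(color=nan_color, label="nan"),
                       Patch(color=inf_color, label="inf")],
              loc="lower left", bbox_to_anchor=(0.0, 1.02), ncol=2)
    return ax


# Illustrative data only: one 24x20 cycle with a real zero, a NaN and two infs.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(-1, 1, size=(24, 20)))
df.iloc[0, 0] = 0.0        # real zero: keeps its coolwarm color
df.iloc[2, 3] = np.nan     # drawn black
df.iloc[5, 7] = np.inf     # drawn silver
df.iloc[9, 1] = -np.inf    # also silver with this two-color scheme

fig, ax = plt.subplots(figsize=(6, 6))
heatmap_with_missing(df, ax)
fig.savefig("heatmap_missing.png")
```

To give -inf its own color, the overlay could use three codes (e.g. 0 for NaN, 1 for +inf, 2 for -inf) with a three-color `ListedColormap` and `vmin=0, vmax=2`. If a single color for all missing values is enough, a lighter option is to mask `~np.isfinite(df)` and call `ax.set_facecolor(...)`, since masked cells let the axes background show through; note, though, that `plt.axis('off')` as used in the script also stops that background from being drawn, whereas the overlay above paints the cells explicitly.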

Here is an image of the desired output when there is no nan/inf in the dataset:

I look forward to hearing from anyone who has dealt with this issue.

Source: https://stackoverflow.com/questions/54166009/is-it-possible-to-get-plot-from-panda-dataframe-includes-missing-data-by-heatmap
