How to plot parallel coordinates on a pandas DataFrame with some columns containing strings?

小鲜肉 2021-01-13 01:49

I would like to plot parallel coordinates for a pandas DataFrame containing columns with numbers and other columns containing strings as values.

2 answers
  •  傲寒 (original poster)
    2021-01-13 02:53

    Based on @Diziet's answer, the following code produces the desired graph under Python 2.5:

    import pandas as pd
    import matplotlib.pyplot as plt
    # Current pandas location; older releases exposed this as pandas.tools.plotting
    from pandas.plotting import parallel_coordinates

    def encode_regime(regime):
        """Map a regime label to a number; unknown labels become None (NaN)."""
        if regime == "N":
            return 0
        elif regime == "N-1":
            return 1
        return None

    df2 = pd.DataFrame(
        [["line 1", 20, 30, 100, "N"],
         ["line 2", 10, 40, 90, "N"],
         ["line 3", 10, 35, 120, "N-1"]],
        columns=["element", "var 1", "var 2", "var 3", "regime"])
    # Scale the encoded regime to the largest numeric value so it shares the y-range
    df2["regime_encoded"] = df2["regime"].apply(encode_regime) * df2[["var 1", "var 2", "var 3"]].max().max()

    parallel_coordinates(df2[["element", "var 1", "var 2", "var 3", "regime_encoded"]], "element")
    ax = plt.gca()
    # Label each distinct regime on the fourth axis (x = 3);
    # .loc replaces .ix, which has been removed from pandas
    for i, (label, val) in df2.loc[:, ["regime", "regime_encoded"]].drop_duplicates().iterrows():
        ax.annotate(label, xy=(3, val), ha="left", va="center")

    plt.show()
    

    This produces the following graph:
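    If the string column has more than the two values hard-coded above, the encoding could be derived from the data instead. The following is only a sketch under that assumption, not part of the original answer: `encode_string_column` is a hypothetical helper that uses `pd.factorize` to number the distinct strings in order of first appearance and spreads them evenly between 0 and the largest numeric value.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates


def encode_string_column(series, scale):
    """Hypothetical helper: place each distinct string evenly in [0, scale].

    Returns the encoded Series and a dict mapping label -> y position.
    """
    _, labels = pd.factorize(series)          # distinct values, first-seen order
    step = scale / max(len(labels) - 1, 1)    # guard against a single category
    label_positions = {label: i * step for i, label in enumerate(labels)}
    # Missing values are not in the dict, so they map to NaN
    return series.map(label_positions), label_positions


df = pd.DataFrame(
    [["line 1", 20, 30, 100, "N"],
     ["line 2", 10, 40, 90, "N"],
     ["line 3", 10, 35, 120, "N-1"]],
    columns=["element", "var 1", "var 2", "var 3", "regime"],
)
numeric_cols = ["var 1", "var 2", "var 3"]
scale = df[numeric_cols].max().max()

df["regime_encoded"], label_positions = encode_string_column(df["regime"], scale)

ax = parallel_coordinates(
    df[["element"] + numeric_cols + ["regime_encoded"]], "element"
)
# The encoded column is the last axis, at x index len(numeric_cols)
for label, y in label_positions.items():
    ax.annotate(label, xy=(len(numeric_cols), y), ha="left", va="center")
```

    Calling `plt.show()` afterwards displays the figure. For the sample data this gives the same positions as the hand-written mapping ("N" at 0, "N-1" at the maximum), but the order of categories follows their first appearance in the column, which may not be the order you want on the axis.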
