I would like to plot parallel coordinates for a pandas
DataFrame containing columns with numbers and other columns containing strings as values.
Based on @Diziet's answer, the following code produces the desired graph under Python 2.5:
import pandas as pd
import matplotlib.pyplot as plt

try:
    # pandas >= 0.20
    from pandas.plotting import parallel_coordinates
except ImportError:
    # older pandas releases
    from pandas.tools.plotting import parallel_coordinates


def encode_regime(regime):
    """Map a regime label to a number; unknown labels become None (NaN)."""
    if regime == "N":
        return 0
    elif regime == "N-1":
        return 1
    return None


df2 = pd.DataFrame(
    [["line 1", 20, 30, 100, "N"],
     ["line 2", 10, 40, 90, "N"],
     ["line 3", 10, 35, 120, "N-1"]],
    columns=["element", "var 1", "var 2", "var 3", "regime"])

# Scale the encoded regime to the largest numeric value so it spans the same axis range.
df2["regime_encoded"] = (df2["regime"].apply(encode_regime)
                         * df2[["var 1", "var 2", "var 3"]].max().max())

parallel_coordinates(
    df2[["element", "var 1", "var 2", "var 3", "regime_encoded"]], "element")

# Label each distinct regime on the last axis (x = 3, the fourth plotted column).
ax = plt.gca()
for _, (label, val) in df2[["regime", "regime_encoded"]].drop_duplicates().iterrows():
    ax.annotate(label, xy=(3, val), ha="left", va="center")

plt.show()
This produces the following graph:
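As a side note, the hard-coded "N" / "N-1" mapping could be generalized to any number of string labels. The following is only a minimal sketch, assuming Python 3.7+ (for ordered dicts) and evenly spaced label positions; `build_encoding` and `axis_max` are hypothetical names introduced here and are not part of pandas or of the answer above.

```python
def build_encoding(labels, axis_max):
    """Map each distinct label to an evenly spaced height in [0, axis_max].

    Hypothetical helper: labels keep their first-seen order, the first one
    lands at 0 and the last one at axis_max.
    """
    # dict.fromkeys removes duplicates while preserving first-seen order.
    distinct = list(dict.fromkeys(labels))
    if len(distinct) <= 1:
        # A single label (or none) has nothing to be spaced against.
        return {label: 0.0 for label in distinct}
    step = float(axis_max) / (len(distinct) - 1)
    return {label: i * step for i, label in enumerate(distinct)}


# Same data as in the example: two regimes, largest numeric value 120.
encoding = build_encoding(["N", "N", "N-1"], axis_max=120)
print(encoding)
```

With pandas, the resulting dict could then be applied with `df2["regime"].map(encoding)` in place of the `apply(...) * max` step, and the same dict can drive the annotation loop.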