Multiple inputs multivariate data visualisation

淺唱寂寞╮ submitted on 2019-12-02 08:36:54

UPDATE:

with different colors:

colors = dict(low='DarkBlue', high='red', part='yellow', medium='DarkGreen')

fig, ax = plt.subplots()

for grp, vals in df.groupby('col4'):
    color = colors[grp]
    vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax,
                                       s=120, label=grp, color=color)

PS: make sure that every group in col4 has an entry in the colors dictionary; otherwise colors[grp] raises a KeyError.
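If you would rather not list every group up front, one option is a fallback color. The following is only a sketch: the sample data, the "unknown" group and the DEFAULT_COLOR name are assumptions for illustration, and it calls ax.scatter directly instead of the pandas plotting wrapper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the column names follow the answer above.
df = pd.DataFrame({
    "col2": [1, 2, 3, 4, 5],
    "col3": [5, 3, 4, 1, 2],
    "col4": ["low", "high", "part", "medium", "unknown"],
})

colors = dict(low="DarkBlue", high="red", part="yellow", medium="DarkGreen")
DEFAULT_COLOR = "gray"  # assumption: fallback for groups missing from `colors`

fig, ax = plt.subplots()
used = {}  # records which color each group ended up with
for grp, vals in df.groupby("col4"):
    # dict.get avoids the KeyError that colors[grp] would raise for "unknown"
    color = colors.get(grp, DEFAULT_COLOR)
    used[grp] = color
    ax.scatter(vals["col2"], vals["col3"], s=120, label=grp, color=color)
ax.legend()
```

Groups without an entry are still drawn, just in gray, so a missing key shows up in the legend rather than as an exception.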

OLD answer:

Assuming that you've concatenated/merged/joined your files into a single DataFrame, we can do the following:

fig, ax = plt.subplots()
for grp, vals in df.groupby('col4'):
    vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax, label=grp)

PS as a homework - you can play with colors ;)
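The single DataFrame assumed in this old answer could be built roughly as follows. This is a sketch only: the in-memory buffers stand in for the real input files, and the comma separator and column names are assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the input files; real code would pass file paths.
sources = [
    io.StringIO("a,1,10,high\nb,2,20,high\n"),
    io.StringIO("c,3,30,low\n"),
]

names = ["col1", "col2", "col3", "col4"]  # assumed column layout
frames = [pd.read_csv(src, header=None, names=names) for src in sources]

# ignore_index=True renumbers the rows so the index has no duplicates
df = pd.concat(frames, ignore_index=True)
```

After this, df.groupby('col4') yields one group per category, regardless of which file a row came from.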

Consider plotting a pivot_table of a pandas DataFrame that concatenates the .txt files. The code below runs two pivots, one grouped by Type and one grouped by Class2. Gaps in the charts are due to NaN values in the pivoted data:

import pandas as pd
import numpy as np
from matplotlib import rc, pyplot as plt
import seaborn

# IMPORT .TXT DATA
df = pd.concat([pd.read_table('TweetCricScore1.txt', header=None, sep='\\s+'),
                pd.read_table('TweetCricScore2.txt', header=None, sep='\\s+'),
                pd.read_table('TweetCricScore3.txt', header=None, sep='\\s+'),
                pd.read_table('TweetCricScore4.txt', header=None, sep='\\s+')])    
df.columns = ['Class1', 'Class2', 'Score', 'Type']

# PLOT SETTINGS
font = {'family' : 'arial', 'weight' : 'bold', 'size'   : 10}    
rc('font', **font); rc("figure", facecolor="white"); rc('axes', edgecolor='darkgray')

seaborn.set()      # FOR MODERN COLOR DESIGN

def runplot(pvtdf):
    pvtdf.plot(kind='bar', edgecolor='w',figsize=(10,5), width=0.9, fontsize = 10)    
    locs, labels = plt.xticks()
    plt.title('Tweet Cric Score', weight='bold', size=14)
    plt.legend(loc=1, prop={'size':10}, shadow=True)
    plt.xlabel('Classification', weight='bold', size=12)
    plt.ylabel('Score', weight='bold', size=12)
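    # Booleans instead of 'off': recent matplotlib no longer accepts the strings
    plt.tick_params(axis='x', bottom=False, top=False)
    plt.tick_params(axis='y', left=False, right=False)
    plt.ylim([0,100])
    plt.grid(False)    # positional flag; the b= keyword was dropped in recent matplotlib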
    plt.setp(labels, rotation=45, rotation_mode="anchor", ha="right")
    plt.tight_layout()

# PIVOT DATA
sumtable = df.pivot_table(values='Score', index=['Class2'],
                          columns=['Type'], aggfunc='sum')
runplot(sumtable)
sumtable = df.pivot_table(values='Score', index=['Type'],
                          columns=['Class2'], aggfunc='sum')
runplot(sumtable)
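To see where the gaps come from, here is a small sketch with made-up data (the values are hypothetical; only the column names match the code above). A Class2/Type combination with no rows becomes NaN in the pivot, and fill_value=0 is one way to replace it if an empty bar should read as zero.

```python
import pandas as pd

# Hypothetical data: there is no row for Class2 == "B" with Type == "y"
df = pd.DataFrame({
    "Class2": ["A", "A", "B"],
    "Type": ["x", "y", "x"],
    "Score": [10, 20, 30],
})

# The missing (B, y) combination shows up as NaN, which the bar chart leaves blank
pivot = df.pivot_table(values="Score", index="Class2", columns="Type", aggfunc="sum")

# fill_value replaces those missing cells with 0
filled = df.pivot_table(values="Score", index="Class2", columns="Type",
                        aggfunc="sum", fill_value=0)
```

Whether zero is the right stand-in depends on the data; a gap and a score of 0 do not mean the same thing.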

First off, there are a couple of errors in your plotting code, and one looks like just a typo, judging by the error you included. After changing the column names you call plt.df1(...). This should be plt.scatter(...), and from the error it looks like that is what you actually called. The problem the error is alerting you to is that you pass x='col2', so matplotlib treats the string 'col2' itself as the value to plot. I realize you are trying to feed in col2 from df1, but that is not what the call does. To do that, call plt.scatter(df1.col2, df1.col3, ...), where df1.col2 and df1.col3 are Series holding your x and y values respectively. Fixing this will give you the following output (I used input4 as it has the most data points):
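For reference, a minimal sketch of the corrected call, using a tiny made-up df1 rather than the real input files. It also shows matplotlib's data= keyword, which does let you refer to columns by name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df1; the real frame comes from the input file.
df1 = pd.DataFrame({"col2": [1, 2, 3], "col3": [4, 5, 6]})

fig, ax = plt.subplots()

# Option 1: pass the Series themselves as x and y
pts = ax.scatter(df1.col2, df1.col3, label="Series")

# Option 2: pass column names and tell matplotlib where to look them up
pts_named = ax.scatter("col2", "col3", data=df1, label="data=")
```

Both calls plot the same three points; the second only works because data=df1 is supplied.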

As far as plotting several categories onto one chart you have several options. You could change the plotting code to something like:

fig, ax = plt.subplots()
ax.plot(df1.col2, df1.col3, 'bo', label='Highly')
ax.plot(df2.col2, df2.col3, 'go', label='Moderately')
ax.legend()
ax.set_xlabel('Freq (x)')
ax.set_ylabel('Freq (y)')
plt.show()

However, this is rather clunky. It would be better to have all of the data in one DataFrame and add a column named label that holds the label you want for each row, based on how you categorize the data. That way you could then use something like:

fig, ax = plt.subplots()
for name, group in df.groupby('label'):
    # groupby yields (key, sub-frame) pairs; linestyle='' draws markers only
    ax.plot(group.x, group.y, marker='o', linestyle='', label=name)
ax.legend()
plt.show()
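As a sketch of how the label column might be filled in: the data and the threshold of 50 are invented for illustration, so substitute whatever rule you actually use to categorize the rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; x and y match the attribute names used in the loop above.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 40, 70, 90]})

# Assumed rule: y above 50 counts as 'Highly', everything else as 'Moderately'
df["label"] = df["y"].apply(lambda v: "Highly" if v > 50 else "Moderately")

fig, ax = plt.subplots()
for name, group in df.groupby("label"):
    # One plot call per category, so each gets its own color and legend entry
    ax.plot(group.x, group.y, marker="o", linestyle="", label=name)
ax.legend()
```

Adding a third category then only means changing the rule that fills label; the plotting loop stays the same.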

I tried @MaxU's solution, which is great, but I ran into a few errors. While patching them I came across an alternative, Bokeh, which looks similar to Seaborn. I am sharing the code as an alternative for beginners' reference.

Code:

import pandas as pd
# Note: bokeh.charts was removed from later Bokeh releases (it moved to the
# separate bkcharts package), so this import needs an older Bokeh install.
from bokeh.charts import Scatter, output_file, show

df = pd.read_csv('input.csv', header = None)

df.columns = ['col1','col2','col3','col4']

scatter = Scatter( df, x='col2', y='col3', color='col4', marker='col4', title='plot', legend=True)

output_file('output.html', title='output')

show(scatter)

Output: (the plot image is not reproduced here)
