Adding a scatter of points to a boxplot using matplotlib

Anonymous (unverified), submitted on 2019-12-03 01:23:02

Question:

I have seen this wonderful boxplot in this article (Fig.2).

As you can see, this is a boxplot on which a scatter of black points is superimposed: x indexes the black points (in a random order), and y is the variable of interest. I would like to do something similar using Matplotlib, but I have no idea where to start. So far, the boxplots I have found online are way less cool and look like this:

Documentation of matplotlib: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot

Ways to colorize boxplots: https://github.com/jbmouret/matplotlib_for_papers#colored-boxes

Answer 1:

What you're looking for is a way to add jitter to the x-axis.

Something like this taken from here:

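    import numpy as np
    import matplotlib.pyplot as plt

    # titanic is a pandas DataFrame with 'age' and 'pclass' columns
    bp = titanic.boxplot(column='age', by='pclass', grid=False)
    for i in [1, 2, 3]:
        y = titanic.age[titanic.pclass == i].dropna()
        # Add some random "jitter" to the x-axis
        x = np.random.normal(i, 0.04, size=len(y))
        plt.plot(x, y, 'r.', alpha=0.2)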

Quoting the link:

One way to add additional information to a boxplot is to overlay the actual data; this is generally most suitable with small- or moderate-sized data series. When data are dense, a couple of tricks used above help the visualization:

  1. reducing the alpha level to make the points partially transparent
  2. adding random "jitter" along the x-axis to avoid overstriking

The code looks like this:

    import pylab as P
    import numpy as np

    # Define data
    # Define numBoxes

    P.figure()

    bp = P.boxplot(data)

    for i in range(numBoxes):
        y = data[i]
        x = np.random.normal(1+i, 0.04, size=len(y))
        P.plot(x, y, 'r.', alpha=0.2)

    P.show()
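The snippet above leaves data and numBoxes undefined. Here is a minimal self-contained sketch of the same idea, assuming three made-up groups of normally distributed values (the synthetic data, the seed and the variable names are my own assumptions, not part of the quoted code). It uses matplotlib.pyplot rather than pylab:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data (an assumption for illustration): three groups of
# normally distributed values with different means.
rng = np.random.default_rng(0)
data = [rng.normal(loc=mu, scale=1.0, size=50) for mu in (0.0, 1.0, 2.0)]

fig, ax = plt.subplots()
# By default the boxes are drawn at x = 1, 2, ..., len(data).
ax.boxplot(data)

jittered = []
for i, y in enumerate(data):
    # Centre the points on box i+1 and spread them slightly along x.
    x = rng.normal(i + 1, 0.04, size=len(y))
    jittered.append(x)
    # Low alpha keeps overlapping points readable.
    ax.plot(x, y, 'r.', alpha=0.2)

# Call plt.show() to display the figure interactively.
```

Passing showfliers=False to boxplot hides the outlier markers, which the overlaid points would otherwise duplicate.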


Answer 2:

Expanding on Kyrubas's solution and using only matplotlib for the plotting part (sometimes I have difficulty formatting pandas plots with matplotlib).

    from matplotlib import cm
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    # initialize dataframe
    n = 200
    ngroup = 3
    # np.floor works on the whole array; a bare map() is a lazy iterator in Python 3
    df = pd.DataFrame({'data': np.random.rand(n),
                       'group': np.floor(np.random.rand(n) * ngroup)})

    group = 'group'
    column = 'data'
    grouped = df.groupby(group)

    names, vals, xs = [], [], []

    for i, (name, subdf) in enumerate(grouped):
        names.append(name)
        vals.append(subdf[column].tolist())
        # jittered x positions centred on box i+1
        xs.append(np.random.normal(i+1, 0.04, subdf.shape[0]))

    plt.boxplot(vals, labels=names)  # renamed to tick_labels in Matplotlib 3.9+
    ngroup = len(vals)
    clevels = np.linspace(0., 1., ngroup)

    for x, val, clevel in zip(xs, vals, clevels):
        # color= takes a single RGBA tuple unambiguously; c= may read it as data
        plt.scatter(x, val, color=cm.prism(clevel), alpha=0.4)

    plt.show()
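If you need this in more than one place, the boxplot and the jittered scatter can be wrapped in a small helper. This is only a sketch: the function name boxplot_with_points, its parameters and the sample values are hypothetical, and it takes a plain dict of label to values instead of a grouped DataFrame.

```python
import numpy as np
import matplotlib.pyplot as plt


def boxplot_with_points(ax, groups, jitter=0.04, seed=None):
    """Draw one box per group and overlay the jittered raw values.

    `groups` maps a label to a sequence of numbers. The function name and
    its parameters are hypothetical, chosen for this sketch.
    Returns the list of jittered x arrays, one per group.
    """
    rng = np.random.default_rng(seed)
    names = list(groups)
    vals = [np.asarray(groups[name], dtype=float) for name in names]

    ax.boxplot(vals)
    # Boxes sit at x = 1..len(vals); label those ticks with the group names.
    ax.set_xticks(list(range(1, len(vals) + 1)))
    ax.set_xticklabels(names)

    xs = []
    for i, val in enumerate(vals):
        # Normal jitter around the box centre, as in the answers above.
        x = rng.normal(i + 1, jitter, size=len(val))
        ax.scatter(x, val, alpha=0.4)
        xs.append(x)
    return xs


fig, ax = plt.subplots()
# Made-up sample values, for illustration only.
xs = boxplot_with_points(ax, {"a": [1, 2, 3, 4], "b": [2, 3, 5, 8, 13]}, seed=1)
```

With a DataFrame you could build the dict from a groupby, for example {name: sub[column] for name, sub in df.groupby(group)}.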


