Speeding up matplotlib scatter plots

I'm trying to make an interactive program which primarily uses matplotlib to make scatter plots of rather a lot of points (10k-100k or so). Right now it works, but changes take too long to render. Small numbers of points are ok, but once the number rises things get frustrating in a hurry. So, I'm working on ways to speed up scatter, but I'm not having much luck

There's the obvious way to do thing (the way it's implemented now) (I realize the plot redraws without updating. I didn't want to alter the fps result with large calls to random).

import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import time


X = np.random.randn(10000)  #x pos
Y = np.random.randn(10000)  #y pos
C = np.random.random(10000) #will be color
S = (1+np.random.randn(10000)**2)*3 #size

#build the colors from a color map
colors = mpl.cm.jet(C)
#there are easier ways to do static alpha, but this allows 
#per point alpha later on.
colors[:,3] = 0.1

fig, ax = plt.subplots()

fig.show()
background = fig.canvas.copy_from_bbox(ax.bbox)

#this makes the base collection
coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None',marker='D')

fig.canvas.draw()

sTime = time.time()
for i in range(10):
    print i
    #don't change anything, but redraw the plot
    ax.cla()
    coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None',marker='D')
    fig.canvas.draw()
print '%2.1f FPS'%( (time.time()-sTime)/10 )

Which gives a speedy 0.7 fps

Alternatively, I can edit the collection returned by scatter. For that, I can change color and position, but don't know how to change the size of each point. That would I think look something like this

import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import time


X = np.random.randn(10000)  #x pos
Y = np.random.randn(10000)  #y pos
C = np.random.random(10000) #will be color
S = (1+np.random.randn(10000)**2)*3 #size

#build the colors from a color map
colors = mpl.cm.jet(C)
#there are easier ways to do static alpha, but this allows 
#per point alpha later on.
colors[:,3] = 0.1

fig, ax = plt.subplots()

fig.show()
background = fig.canvas.copy_from_bbox(ax.bbox)

#this makes the base collection
coll = ax.scatter(X,Y,facecolor=colors, s=S, edgecolor='None', marker='D')

fig.canvas.draw()

sTime = time.time()
for i in range(10):
    print i
    #don't change anything, but redraw the plot
    coll.set_facecolors(colors)
    coll.set_offsets( np.array([X,Y]).T )
    #for starters lets not change anything!
    fig.canvas.restore_region(background)
    ax.draw_artist(coll)
    fig.canvas.blit(ax.bbox)
print '%2.1f FPS'%( (time.time()-sTime)/10 )

This results in a slower 0.7 fps. I wanted to try using CircleCollection or RegularPolygonCollection, as this would allow me to change the sizes easily, and I don't care about changing the marker. But, I can't get either to draw so I have no idea if they'd be faster. So, at this point I'm looking for ideas.

I've been through this a few times trying to speed up scatter plots with large numbers of points, variously trying:

Different marker types
Limiting colours
Cutting down the dataset
Using a heatmap / grid instead of a scatter plot

And none of these things worked. Matplotlib is just not very performant when it comes to scatter plots. My only recommendation is to use a different plotting library, though I haven't personally found one that was suitable. I know this doesn't help much, but it may save you some hours of fruitless tinkering.

We are actively working on performance for large matplotlib scatter plots. I'd encourage you to get involved in the conversation (http://matplotlib.1069221.n5.nabble.com/mpl-1-2-1-Speedup-code-by-removing-startswith-calls-and-some-for-loops-td41767.html) and, even better, test out the pull request that has been submitted to make life much better for a similar case (https://github.com/matplotlib/matplotlib/pull/2156).

HTH

来源：https://stackoverflow.com/questions/18179928/speeding-up-matplotlib-scatter-plots

标签

python

matplotlib

scatter