catplot(kind="count") is significantly slower than countplot()

旧巷老猫 submitted on 2021-01-29 04:35:00

Question


I am working on a fairly large dataset (~40m rows). I have found that if I call sns.countplot() directly then my visualisation plots really quickly:

%%time 
ax = sns.countplot(x="age_band",data=acme)

However, if I produce the same visualisation using catplot(kind="count"), execution slows down dramatically:

%%time
g = sns.catplot(x="age_band",data=acme,kind="count")

Is there a reason for such a large performance difference? Is catplot() doing some sort of conversion on my data before it can plot it?

If there is a known reason for this, does it extend to all figure-level functions versus axis-level functions, e.g. is sns.scatterplot() faster than sns.relplot(kind="scatter"), etc.?

My preference would be to use catplot() as I like its flexibility and easy plotting on a FacetGrid but if it is going to take so much longer to achieve the same plot then I will just use the axis level functions directly.


Answer 1:


There is a lot of overhead in catplot, or for that matter in FacetGrid, whose purpose is to ensure that the categories are synchronized across the grid. Consider, for example, a variable plotted along the columns of the grid for which not every age group occurs. You would still need to show the non-occurring age group and reserve its color. Hence, two countplots next to each other do not necessarily make up one catplot.
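The synchronization idea can be sketched with pandas alone. This is only an illustration of the concept, not seaborn's actual implementation; the toy DataFrame and the column names "region" and "age_band" are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy data: the "40-59" and "60+" bands never occur in region "B".
df = pd.DataFrame({
    "region":   ["A", "A", "A", "B", "B"],
    "age_band": ["18-39", "40-59", "60+", "18-39", "18-39"],
})

# Fix the category set (and order) once, for the whole grid.
all_bands = sorted(df["age_band"].unique())

# Count per facet, then reindex so every facet carries every band,
# including bands with zero rows in that facet.
per_facet = {
    region: sub["age_band"].value_counts().reindex(all_bands, fill_value=0)
    for region, sub in df.groupby("region")
}

for region, counts in per_facet.items():
    print(region, counts.to_dict())
```

Counting each facet independently would give region "B" a single bar; reindexing against the shared category list is what keeps bar positions (and therefore colors) aligned between facets.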

However, if you are only interested in a single countplot, a catplot is clearly overkill. On the other hand, even a single countplot is overkill compared to a barplot of the counts. That is,

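import numpy as np
import matplotlib.pyplot as plt
# Count the rows per category once, then draw plain bars, one color per bar.
counts = df["Category"].value_counts().sort_index()
colors = plt.cm.tab10(np.arange(len(counts)))
ax = counts.plot.bar(color=colors)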

will be twice as fast as

ax = sns.countplot(x="Category", data=df)
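To check the difference on your own machine, a rough timing sketch along these lines may help. The synthetic DataFrame, its size, and the helper names `timed`, `pandas_bar` and `seaborn_count` are assumptions for illustration; actual timings depend on the data and on the seaborn and pandas versions installed.

```python
import time

import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the real data: a single categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Category": rng.choice(list("ABCDE"), size=200_000)})


def timed(plot_func):
    """Call plot_func(ax) on a fresh figure and return the elapsed seconds."""
    fig, ax = plt.subplots()
    start = time.perf_counter()
    plot_func(ax)
    elapsed = time.perf_counter() - start
    plt.close(fig)  # free the figure so repeated runs do not accumulate
    return elapsed


def pandas_bar(ax):
    # Aggregate first, then plot only one bar per category.
    counts = df["Category"].value_counts().sort_index()
    counts.plot.bar(ax=ax)


def seaborn_count(ax):
    # Let seaborn do both the counting and the plotting.
    sns.countplot(x="Category", data=df, ax=ax)


t_pandas = timed(pandas_bar)
t_seaborn = timed(seaborn_count)
print(f"value_counts + bar: {t_pandas:.3f} s")
print(f"sns.countplot:      {t_seaborn:.3f} s")
```

A single run is noisy, so it is worth repeating each measurement a few times before drawing conclusions.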


Source: https://stackoverflow.com/questions/57990852/catplotkind-count-is-significantly-slower-than-countplot
