How to plot a kernel density plot of dates in Pandas?

末鹿安然 submitted on 2019-12-03 06:27:39
Jianxun Li

Inspired by @JohnE's answer, an alternative way to convert dates to numeric values is to use .toordinal().

import datetime  # needed for datetime.datetime.fromordinal() when relabelling the ticks

import pandas as pd
import numpy as np

# simulate some artificial data
# ===============================
np.random.seed(0)
dates = pd.date_range('2010-01-01', periods=31, freq='D')
df = pd.DataFrame(np.random.choice(dates,100), columns=['dates'])
# use toordinal() to get datenum
df['ordinal'] = [x.toordinal() for x in df.dates]

print(df)

        dates  ordinal
0  2010-01-13   733785
1  2010-01-16   733788
2  2010-01-22   733794
3  2010-01-01   733773
4  2010-01-04   733776
5  2010-01-28   733800
6  2010-01-04   733776
7  2010-01-08   733780
8  2010-01-10   733782
9  2010-01-20   733792
..        ...      ...
90 2010-01-19   733791
91 2010-01-28   733800
92 2010-01-01   733773
93 2010-01-15   733787
94 2010-01-04   733776
95 2010-01-22   733794
96 2010-01-13   733785
97 2010-01-26   733798
98 2010-01-11   733783
99 2010-01-21   733793

[100 rows x 2 columns]    

# plot non-parametric kde on numeric datenum
ax = df['ordinal'].plot(kind='kde')
# rename the xticks with labels
x_ticks = ax.get_xticks()
ax.set_xticks(x_ticks[::2])
xlabels = [datetime.datetime.fromordinal(int(x)).strftime('%Y-%m-%d') for x in x_ticks[::2]]
ax.set_xticklabels(xlabels)
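
The tick positions returned by get_xticks() are floats, so the labelling step above truncates them with int() before converting back to a date. As a minimal sketch of that conversion on its own, the helper below (ordinal_to_label is a hypothetical name, not part of pandas or matplotlib) rounds to the nearest whole day instead of truncating. It assumes the values are valid ordinals, that is, at least 1.

```python
import datetime

def ordinal_to_label(value, fmt='%Y-%m-%d'):
    """Format a (possibly fractional) proleptic Gregorian ordinal as a date string."""
    # Tick positions are floats; round to the nearest whole day before converting.
    return datetime.date.fromordinal(int(round(value))).strftime(fmt)

# Ordinals taken from the printed DataFrame above; the last one is fractional.
labels = [ordinal_to_label(x) for x in (733773.0, 733785.0, 733800.4)]
print(labels)  # ['2010-01-01', '2010-01-13', '2010-01-28']
```

The same helper could probably be wrapped in matplotlib.ticker.FuncFormatter and passed to ax.xaxis.set_major_formatter(), so the labels follow the ticks if the axis limits change, but that variation is untested here.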

I imagine there is a better, more automatic way to do this, but if not, this ought to be a decent workaround. First, let's set up some sample data:

np.random.seed(479)
start_date = '2011-1-1'
df = pd.DataFrame({ 'date':np.random.choice( 
                    pd.date_range(start_date, periods=365*5, freq='D'), 50) })

df['rel'] = df['date'] - pd.to_datetime(start_date)
# whole days as integers; astype('timedelta64[D]') is rejected by newer pandas versions
df['rel'] = df['rel'].dt.days

        date   rel
0 2014-06-06  1252
1 2011-10-26   298
2 2013-08-24   966
3 2014-09-25  1363
4 2011-12-23   356

As you can see, 'rel' is just the number of days since the starting day. It's essentially an integer, so all you really need to do is normalize it with respect to the starting date.

df['year_as_float'] = pd.to_datetime(start_date).year + df.rel / 365.

        date   rel  year_as_float
0 2014-06-06  1252    2014.430137
1 2011-10-26   298    2011.816438
2 2013-08-24   966    2013.646575
3 2014-09-25  1363    2014.734247
4 2011-12-23   356    2011.975342

You'd need to adjust that slightly for a start date other than Jan 1. It also ignores leap years, which isn't a practical issue if you're just producing a KDE plot over 5 years, but it could matter depending on what else you want to do with the values.
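
If the leap-year and start-date caveats do matter, one possible variation is to compute the fraction within each date's own calendar year rather than dividing a day count by 365. The sketch below is not from the original answer: year_fraction is a hypothetical helper name, and it assumes the input is a Series of datetimes.

```python
import pandas as pd

def year_fraction(dates):
    """Convert a datetime Series to fractional years, using each year's true length."""
    dates = pd.to_datetime(dates)
    # 366 days in leap years, 365 otherwise
    days_in_year = 365 + dates.dt.is_leap_year.astype(int)
    # dayofyear is 1-based, so Jan 1 maps to exactly the start of the year
    return dates.dt.year + (dates.dt.dayofyear - 1) / days_in_year

sample = pd.Series(pd.to_datetime(['2011-01-01', '2012-07-02', '2014-06-06']))
result = year_fraction(sample)
print(result)
```

Because it works from each date directly, this does not depend on start_date at all. The result can be plotted with .plot(kind='kde') in the same way as year_as_float.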

Here's the plot:

df['year_as_float'].plot(kind='kde')
