Timeserie datetick problems when using pandas.DataFrame.plot method

Posted by 折月煮酒 on 2021-02-07 09:31:34

Question


I just discovered something really strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)

fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()

xt = axe.get_xticks()

When I try to format my xticklabels I get strange behaviour, so I inspected the objects to understand it and found the following:

  • t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DatetimeIndex is what I expect;
  • xt = [175320, 175488]: the xticks are integers, but they are not equal to a number of days since the epoch (I have no idea what they represent);
  • xt[-1] - xt[0] = 168: they look more like indices, since the span is consistent with len(x) = 169 samples.

This explains why I cannot format my axis using:

axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))

The first raises an error saying there are too many ticks to generate. The second shows my first tick as Fri 00:00 when it should be Mon 00:00 (in fact matplotlib assumes the first tick to be 0481-01-03 00:00; this is where my bug is).

It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix this issue.

If I run instead:

fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()

xt = axe.get_xticks()

Everything works as expected, but I lose all the cool features of the pandas.DataFrame.plot method, such as curve labeling. Here xt = [726468. 726475.].

How can I properly format my ticks using the pandas.DataFrame.plot method instead of axe.plot while avoiding this issue?

Update

The problem seems to be about the origin and scale (units) of the underlying numbers used to represent dates. In any case I cannot control it, even by forcing it to the correct type:

t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')

There is a discrepancy between the matplotlib and pandas representations, and I could not find any documentation of this problem.
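
One plausible reading of the two tick values, offered as a sketch rather than a confirmed diagnosis: pandas appears to convert an hourly DatetimeIndex to periods when plotting, so its ticks would count hours since 1970-01-01, while matplotlib releases of that era counted days with 0001-01-01 as day 1. The arithmetic below uses only the standard library and reproduces both numbers under that assumption; the variable names are illustrative.

```python
from datetime import date, datetime

start = datetime(1990, 1, 1)   # first timestamp of the index in the question
epoch = datetime(1970, 1, 1)   # Unix epoch, the assumed origin of the pandas ticks

# Whole hours elapsed since the Unix epoch.
# Under the assumption above, this is the first tick from DataFrame.plot.
hours_since_epoch = int((start - epoch).total_seconds() // 3600)

# Proleptic Gregorian ordinal: days counted with 0001-01-01 as day 1.
# Under the assumption above, this is the first tick from axe.plot.
day_ordinal = date(1990, 1, 1).toordinal()

print(hours_since_epoch)  # 175320
print(day_ordinal)        # 726468
```

If this reading is right, a matplotlib.dates formatter applied to the pandas axis would interpret 175320 as a day count, which is consistent with the tick landing in the year 0481.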


Answer 1:


Is this what you are going for? Note that I shortened the date_range to make the labels easier to see.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import matplotlib.dates as dates

t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)

# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index

# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)

# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()

plt.show()
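
Another route, not taken from the answer above, is the x_compat option of DataFrame.plot, which the pandas documentation describes as suppressing its tick resolution adjustment. The sketch below assumes that with x_compat=True the x axis stays in matplotlib date units, so the locator and formatter from the question can be applied directly. It keeps the question's variable names and builds the hourly range from a Timedelta only to avoid depending on a particular frequency alias.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Same hourly week of data as in the question
t = pd.date_range('1990-01-01', '1990-01-08', freq=pd.Timedelta(hours=1))
x = pd.DataFrame(np.random.rand(len(t)), index=t)

fig, axe = plt.subplots()

# x_compat=True asks pandas not to apply its own time-series tick handling
x.plot(ax=axe, x_compat=True)

# The matplotlib.dates locator and formatter from the question
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))

# Rotate and align the date labels
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure. The pandas conveniences such as the legend and column labels are kept, since the plot still goes through DataFrame.plot.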



Source: https://stackoverflow.com/questions/43891136/timeserie-datetick-problems-when-using-pandas-dataframe-plot-method
