Inconsistent internal representation of dates in matplotlib/pandas

守給你的承諾、 submitted on 2019-12-11 17:28:24

Question


import pandas as pd

index = pd.to_datetime(['2016-05-01', '2016-11-01', '2017-05-02'])
data = pd.DataFrame({'a': [1, 2, 3],
                     'b': [4, 5, 6]}, index=index)
ax = data.plot()
print(ax.get_xlim())

# Out: (736066.7, 736469.3)

Now change the last date from 2017-05-02 to 2017-05-01:

index = pd.to_datetime(['2016-05-01', '2016-11-01', '2017-05-01'])
data = pd.DataFrame({'a': [1, 2, 3],
                     'b': [4, 5, 6]}, index=index)
ax = data.plot()
print(ax.get_xlim())

# Out: (184.8, 189.2)

The first example seems consistent with the matplotlib docs:

Matplotlib represents dates using floating point numbers specifying the number of days since 0001-01-01 UTC, plus 1

Why does the second example return something seemingly completely different? I'm using pandas version 0.22.0 and matplotlib version 2.2.2.


Answer 1:


Pandas uses different units to represent dates and times on the axes, depending on the range and spacing of the dates/times being plotted. As a result, different locators are used.

In the first case,

print(ax.xaxis.get_major_locator())
# Out: pandas.plotting._converter.PandasAutoDateLocator

In the second case,

print(ax.xaxis.get_major_locator())
# Out: pandas.plotting._converter.TimeSeries_DateLocator

You can force pandas to always use the PandasAutoDateLocator with the x_compat argument:

data.plot(x_compat=True)

This ensures that you always get the same datetime representation, consistent with the matplotlib.dates convention.
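As a rough check of that convention, the limits from the first example in the question can be reproduced with the standard library alone. This is only a sketch: it assumes the date definition quoted in the question (days since 0001-01-01, plus 1, which is what date.toordinal() returns) and a 5% padding on each side of the data range, which is an assumption about the default axis margins rather than something stated in this thread. The helper names are made up for illustration.

```python
from datetime import date


def legacy_mpl_datenum(d):
    """Days since 0001-01-01 plus 1, per the convention quoted in the question.

    date.toordinal() already counts 0001-01-01 as day 1.
    """
    return float(d.toordinal())


def padded_limits(lo, hi, margin=0.05):
    """Widen [lo, hi] by `margin` times its span on each side (assumed default)."""
    pad = (hi - lo) * margin
    return lo - pad, hi + pad


first = legacy_mpl_datenum(date(2016, 5, 1))
last = legacy_mpl_datenum(date(2017, 5, 2))

# Should land close to the (736066.7, 736469.3) reported in the question.
print(padded_limits(first, last))
```

With these assumptions the padded limits agree with the first output in the question to one decimal place, which suggests that the first plot really does use day-based date numbers.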

The drawback of x_compat=True is that it removes the nice quarterly ticking that pandas produces by default and replaces it with the standard matplotlib ticking.

On the other hand, it lets you use the very customizable matplotlib.dates tickers and formatters. For example, to get quarterly ticks and labels:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import pandas as pd

index = pd.to_datetime(['2016-05-01', '2016-11-01', '2017-05-01'])
data = pd.DataFrame({'a': [1, 2, 3],
                     'b': [4, 5, 6]}, index=index)
ax = data.plot(x_compat=True)

# Quarterly ticks
ax.xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))

# Formatting:
def func(x,pos):
    q = (mdates.num2date(x).month-1)//3+1
    tx = "Q{}".format(q)
    if q == 1:
        tx += "\n{}".format(mdates.num2date(x).year)
    return tx
ax.xaxis.set_major_formatter(mticker.FuncFormatter(func))
plt.setp(ax.get_xticklabels(), rotation=0, ha="center")

plt.show()




Answer 2:


In the second example, if you look at the plot, the x-axis is expressed in quarters rather than in dates.

The dates in this case are exactly six months, and therefore two quarters, apart, which is presumably why you're seeing this behavior. While I can't find it in the docs, the numbers given by xlim in this case are consistent with being the number of quarters since the Unix Epoch (Jan. 1, 1970).
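The arithmetic behind that observation can be sketched as follows. The function name is hypothetical, and the 5% padding on each side is an assumption about the default axis margins, not something confirmed in the docs.

```python
def quarters_since_epoch(year, month):
    """Whole quarters elapsed since 1970 Q1 (which counts as quarter 0)."""
    return (year - 1970) * 4 + (month - 1) // 3


first_q = quarters_since_epoch(2016, 5)  # 2016 Q2
last_q = quarters_since_epoch(2017, 5)   # 2017 Q2

# Assumed padding: 5% of the data span on each side.
pad = 0.05 * (last_q - first_q)

# Should land close to the (184.8, 189.2) reported in the question.
print(first_q - pad, last_q + pad)
```

The two end points come out as quarters 185 and 189, and padding that span of 4 by 5% on each side gives limits that match the second output in the question.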



Source: https://stackoverflow.com/questions/50988126/inconsistent-internal-representation-of-dates-in-matplotlib-pandas
