The DataFrame has timestamped data, and I want to visually compare how the data evolves over the course of each day. If I group by day and plot the graphs, they are displaced horizontally in time because their dates differ. I want a date-agnostic plot of the day-wise trends on a time-only axis. To that end, I have resorted to shifting the data back by an appropriate number of days, as demonstrated in the following code:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
index1 = pd.date_range('20141201', freq='H', periods=2)
index2 = pd.date_range('20141210', freq='2H', periods=4)
index3 = pd.date_range('20141220', freq='3H', periods=5)
index = index1.append([index2, index3])
df = pd.DataFrame(list(range(1, len(index)+1)), index=index, columns=['a'])
gbyday = df.groupby(df.index.day)
first_day = gbyday.keys.min() # convert all data to this day
plt.figure()
ax = plt.gca()
for n, g in gbyday:
    g.shift(-(n-first_day+1), 'D').plot(ax=ax, style='o-', label=str(n))
plt.show()
resulting in the following plot
Question: Is this the pandas way of doing it? In other words, how can I achieve this more elegantly?
You can select the hour attribute of the index after grouping like this:
fig, ax = plt.subplots()
for label, s in gbyday:
    ax.plot(s.index.hour, s, 'o-', label=label)
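One caveat of the hour attribute is that it is an integer, so any minutes or seconds in the timestamps are dropped and readings taken within the same hour share one x position. The sketch below illustrates this with two made-up timestamps (they are not from the question) and one possible way to keep the fraction using the minute and second attributes of the index:

```python
import pandas as pd

# Hypothetical timestamps on two different dates, both within the 09:00 hour.
idx = pd.to_datetime(['2014-12-01 09:15', '2014-12-10 09:45'])

# The hour attribute keeps only the whole hour.
whole_hours = idx.hour

# Adding the minute and second parts back gives fractional hours of the day.
fractional_hours = idx.hour + idx.minute / 60 + idx.second / 3600

print(list(whole_hours))
print(list(fractional_hours))
```

Passing fractional_hours instead of s.index.hour as the x values would then separate the two readings on the time axis.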
This answer may be a little late, but it is here in case anyone is still looking for one.
This solution also works when the data spans different months (the code in the original question groups by day of month, so the same day number in different months ends up in one group) and keeps fractional hours.
import pandas as pd
import matplotlib.pyplot as plt
index0 = pd.date_range('20141101', freq='H', periods=2)
index1 = pd.date_range('20141201', freq='H', periods=2)
index2 = pd.date_range('20141210', freq='2H', periods=4)
index3 = pd.date_range('20141220', freq='3H', periods=5)
index = index1.append([index2, index3, index0])
df = pd.DataFrame(list(range(1, len(index)+1)), index=index, columns=['a'])
df['time_hours'] = (df.index - df.index.normalize()) / pd.Timedelta(hours=1)
fig, ax = plt.subplots()
for n, g in df.groupby(df.index.normalize()):
    ax.plot(g['time_hours'], g['a'], label=n, marker='o')
ax.legend(loc='best')
plt.show()
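As a possible variation on the same idea, the fractional hours and the calendar date can be used to reshape the data into a wide table, with one column per date and the time of day as the index. This is only a sketch: the helper name to_time_of_day_table and the sample data are made up for illustration, and pivot raises an error if two rows share the same date and time of day.

```python
import pandas as pd

# Hypothetical sample data: two readings on each of two dates in different months.
index = pd.to_datetime([
    '2014-11-01 00:00', '2014-11-01 01:30',
    '2014-12-01 00:00', '2014-12-01 01:30',
])
df = pd.DataFrame({'a': [1, 2, 3, 4]}, index=index)


def to_time_of_day_table(frame, column):
    """Return a table with fractional hours as rows and calendar dates as columns."""
    # Time elapsed since midnight, expressed in hours (same idea as time_hours above).
    hours = (frame.index - frame.index.normalize()) / pd.Timedelta(hours=1)
    tmp = pd.DataFrame({
        'time_hours': hours.to_numpy(),
        'date': frame.index.date,
        'value': frame[column].to_numpy(),
    })
    # One row per time of day, one column per date.
    return tmp.pivot(index='time_hours', columns='date', values='value')


wide = to_time_of_day_table(df, 'a')
print(wide)
```

Calling wide.plot(marker='o') should then draw one series per date on a shared time-of-day axis. When the dates are sampled at different times, the table contains NaN values, and lines are not drawn across those gaps, so the explicit loop above may be the better fit for irregular data.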
Source: https://stackoverflow.com/questions/27603593/plotting-data-for-different-days-on-a-single-hhmmss-axis