Question
Here's a quick problem that I, at first, dismissed as easy. An hour in, and I'm not so sure!
So, I have a list of Python datetime objects, and I want to graph them. The x-values are the year and month, and the y-values are the number of date objects in this list that fall in that month.
Perhaps an example will demonstrate this better (dd/mm/yyyy):
[28/02/2018, 01/03/2018, 16/03/2018, 17/05/2018]
-> ([02/2018, 03/2018, 04/2018, 05/2018], [1, 2, 0, 1])
My first attempt tried to simply group by month and year, along the lines of:
import itertools
group = itertools.groupby(dates, lambda date: date.strftime("%b/%Y"))
graph = zip(*[(k, len(list(v))) for k, v in group]) # format the data for graphing
As you've probably noticed though, this will group only by dates that are already present in the list. In my example above, the fact that none of the dates occurred in April would have been overlooked.
Next, I tried finding the starting and ending dates, and looping over the months between them:
import datetime
data = [[], []]
for year in range(min_date.year, max_date.year):
    for month in range(min_date.month, max_date.month):
        k = datetime.datetime(year=year, month=month, day=1).strftime("%b/%Y")
        v = sum([1 for date in dates if date.strftime("%b/%Y") == k])
        data[0].append(k)
        data[1].append(v)
Of course, this only works if min_date.month is smaller than max_date.month, which is not necessarily the case if they span multiple years. Also, it's pretty ugly.
Is there an elegant way of doing this?
Thanks in advance
EDIT: To be clear, the dates are datetime objects, not strings. They look like strings here for the sake of being readable.
Answer 1:
I suggest using pandas:
import pandas as pd
dates = ['28/02/2018', '01/03/2018', '16/03/2018', '17/05/2018']
s = pd.to_datetime(pd.Series(dates), format='%d/%m/%Y')
s.index = s.dt.to_period('M')
s = s.groupby(level=0).size()
s = s.reindex(pd.period_range(s.index.min(), s.index.max(), freq='M'), fill_value=0)
print(s)
2018-02 1
2018-03 2
2018-04 0
2018-05 1
Freq: M, dtype: int64
s.plot.bar()
Explanation:
- First create a Series from the list of dates and convert it with to_datetime.
- Create a PeriodIndex with Series.dt.to_period.
- Group by the index (level=0) and get the counts with GroupBy.size.
- Add the missing periods with Series.reindex, using a PeriodIndex built from the min and max values of the index.
- Finally plot, e.g. as bars with Series.plot.bar.
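Since the dates in the question are datetime objects rather than strings, the same steps can be sketched starting from such a list. This is only a minimal sketch: the helper name monthly_counts and the "%m/%Y" label format are assumptions made for illustration, chosen to match the (labels, counts) shape shown in the question.

```python
import datetime

import pandas as pd


def monthly_counts(dates):
    """Return (labels, counts) for every month from the earliest to the latest date.

    Hypothetical helper, not part of pandas. Assumes `dates` is non-empty.
    """
    s = pd.Series(pd.to_datetime(dates))
    # Index each date by its monthly period, e.g. 2018-03
    s.index = s.dt.to_period("M")
    counts = s.groupby(level=0).size()
    # Reindex over the full month range so that empty months show up as 0
    full_range = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
    counts = counts.reindex(full_range, fill_value=0)
    labels = [period.strftime("%m/%Y") for period in counts.index]
    return labels, counts.tolist()


dates = [
    datetime.datetime(2018, 2, 28),
    datetime.datetime(2018, 3, 1),
    datetime.datetime(2018, 3, 16),
    datetime.datetime(2018, 5, 17),
]
labels, counts = monthly_counts(dates)
print(labels)  # ['02/2018', '03/2018', '04/2018', '05/2018']
print(counts)  # [1, 2, 0, 1]
```

The two lists can then be passed to whatever plotting call is being used, with labels on the x-axis and counts on the y-axis.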
Answer 2:
Using Counter:
import random
import collections
dates = list()
for y in range(2015,2019):
    for m in range(1,13):
        for i in range(random.randint(1,4)):
            dates.append("{}/{}".format(m,y))
print(dates)
counter = collections.Counter(dates)
print(counter)
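For your problem with dates that have no occurrences, you can use the subtract method of Counter. Generate a list covering the whole range of dates, with each date appearing exactly once, and subtract it from the counter. This gives every month in the range a key, but it also lowers every count by one (months with no occurrences end up at -1), so add one back afterwards, like so:
tmp_date_list = ["{}/{}".format(m,y) for y in range(2015,2019) for m in range(1,13)]
counter.subtract(tmp_date_list)
counts = {date: n + 1 for date, n in counter.items()}  # undo the offset introduced by subtract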
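For datetime objects as in the question, a standard-library variant might look like the sketch below. It is not the method from the answer above: it swaps the string keys for a running month number (year * 12 + month - 1), so that ranges crossing a year boundary need no special handling, and it relies on Counter returning 0 for missing keys. The names monthly_counts and month_index are made up for illustration, and the input list is assumed to be non-empty.

```python
import collections
import datetime


def month_index(d):
    """Map a date to a running month number, so December 2017 + 1 is January 2018."""
    return d.year * 12 + (d.month - 1)


def monthly_counts(dates):
    """Return (labels, counts) for every month between the earliest and latest date.

    Hypothetical helper. Assumes `dates` is a non-empty list of datetime objects.
    """
    counter = collections.Counter(month_index(d) for d in dates)
    first, last = min(counter), max(counter)

    labels, counts = [], []
    for idx in range(first, last + 1):
        year, month0 = divmod(idx, 12)
        labels.append("{:02d}/{}".format(month0 + 1, year))
        counts.append(counter[idx])  # a Counter returns 0 for keys it has not seen
    return labels, counts


dates = [
    datetime.datetime(2017, 11, 5),
    datetime.datetime(2018, 1, 20),
    datetime.datetime(2018, 1, 25),
]
labels, counts = monthly_counts(dates)
print(labels)  # ['11/2017', '12/2017', '01/2018']
print(counts)  # [1, 0, 2]
```

Because the loop runs over month numbers rather than over separate year and month ranges, it does not depend on min_date.month being smaller than max_date.month.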
Source: https://stackoverflow.com/questions/49584924/python-group-dates-by-month