Question
Simple question, but I haven't been able to find a simple answer.
I have a list of timestamps, in seconds, at which events occur:
[200.0 420.0 560.0 1100.0 1900.0 2700.0 3400.0 3900.0 4234.2 4800.0 etc..]
I want to count how many events occur each hour (3600 seconds) and create a new list of these counts.
I understand this is called downsampling, but all the information I can find is related to traditional time series.
For the example above the new list would look like:
[7 3 etc..]
Any help would be greatly appreciated.
Answer 1:
all_events = [
    200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0]

def get_events_by_hour(all_events):
    # For each hour of the day, count the events whose timestamp falls in it.
    return [
        len([x for x in all_events if int(x / 3600.0) == hour])
        for hour in range(24)
    ]

print(get_events_by_hour(all_events))
Note that all_events should contain events for one day.
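If the timestamps can span more than one day, a single pass with collections.Counter avoids the fixed 24-hour range. This is only a minimal sketch and not part of the original answer: the function name count_events_per_bin and the bin_seconds parameter are made up for illustration, and it assumes non-negative timestamps.

```python
from collections import Counter

def count_events_per_bin(timestamps, bin_seconds=3600.0):
    """Count events per bin, from bin 0 up to the last non-empty bin."""
    if not timestamps:
        return []
    # Map each timestamp to the index of the bin it falls into.
    counts = Counter(int(t // bin_seconds) for t in timestamps)
    # A Counter returns 0 for missing keys, so empty bins show up as 0.
    return [counts[i] for i in range(max(counts) + 1)]

all_events = [200.0, 420.0, 560.0, 1100.0, 1900.0,
              2700.0, 3400.0, 3900.0, 4234.2, 4800.0]
print(count_events_per_bin(all_events))  # [7, 3]
```

Unlike the list comprehension above, this walks the data once and returns only as many bins as the data needs.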
Answer 2:
Sampling means taking data f_i (samples) at certain discrete times t_i.
The number of samples per unit of time is the sampling rate.
Downsampling is a special case of resampling, which means mapping the sampled data onto a different set of sampling points t_i', here one with a lower sampling rate, which makes the sampling coarser.
Your first list contains the sample points t_i (in seconds) and, indirectly, the number of events n_i, which corresponds to the index i, for example n_i = i + 1.
If you reduce the list at regular intervals, after a period T (in seconds), you are resampling to a new set n_i' at times t_i' = i * T.
I did not write downsampling, because nothing might happen within a period T; in that case you are upsampling, because you end up with more data points than before.
For the calculation, check whether the input list is empty; in that case n' = 0 should go into your output list.
Otherwise you have m entries in your input list, measured over the time T, and you can use the equation below:
n' = m * 3600 / T
This n' goes into your output list; it is scaled to events per hour.
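The scaling n' = m * 3600 / T can be sketched in a few lines of Python. This is an illustration under assumptions, not the answerer's code: it reads "input list" as the events that fall into one window of length T, and the names events_per_hour and window_seconds are invented here.

```python
def events_per_hour(timestamps, window_seconds):
    """Return the event rate (events per hour) for each window of length T."""
    if not timestamps:
        return []
    n_windows = int(max(timestamps) // window_seconds) + 1
    counts = [0] * n_windows
    for t in timestamps:
        # m for each window: number of events that fall into it.
        counts[int(t // window_seconds)] += 1
    # n' = m * 3600 / T; a window with no events gives n' = 0.
    return [m * 3600.0 / window_seconds for m in counts]

events = [200.0, 420.0, 560.0, 1100.0, 1900.0,
          2700.0, 3400.0, 3900.0, 4234.2, 4800.0]
print(events_per_hour(events, 1800.0))  # [8.0, 6.0, 6.0]
print(events_per_hour(events, 3600.0))  # [7.0, 3.0]
```

With T = 3600 the scaling factor is 1, so the result is just the per-hour count the question asks for.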
Answer 3:
The question has the scipy tag, and scipy depends on numpy, so I assume an answer using numpy is acceptable.
To get the hour associated with a timestamp t, you can take the integer part of t/3600. Then, to get the number of events in each hour, you can count the number of occurrences of these integers. The numpy function bincount can do that for you.
Here's a numpy one-liner for the calculation. I put the timestamps in a numpy array t:
In [49]: t = numpy.array([200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0, 8300.0, 8400.0, 9500.0, 10000.0, 14321.0, 15999.0, 16789.0, 17000.0])
In [50]: t
Out[50]:
array([ 200. , 420. , 560. , 1100. , 1900. , 2700. ,
3400. , 3900. , 4234.2, 4800. , 8300. , 8400. ,
9500. , 10000. , 14321. , 15999. , 16789. , 17000. ])
Here's your calculation:
In [51]: numpy.bincount((t/3600).astype(int))
Out[51]: array([7, 3, 4, 1, 3])
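As a possible alternative, numpy.histogram with explicit bin edges gives the same counts and makes the bin boundaries visible. This is a sketch rather than part of the original answer; the names bin_seconds, n_bins and edges are chosen here for illustration, and it assumes non-negative timestamps.

```python
import numpy as np

t = np.array([200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0,
              3400.0, 3900.0, 4234.2, 4800.0, 8300.0, 8400.0,
              9500.0, 10000.0, 14321.0, 15999.0, 16789.0, 17000.0])

bin_seconds = 3600
# Number of hourly bins needed to cover the largest timestamp.
n_bins = int(t.max() // bin_seconds) + 1
# Edges 0, 3600, 7200, ... up to the first edge past the last timestamp.
edges = np.arange(n_bins + 1) * bin_seconds
counts, _ = np.histogram(t, bins=edges)
print(counts)  # [7 3 4 1 3]
```

Changing bin_seconds is enough to resample to a different interval, for example 1800 for half-hour bins.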
Source: https://stackoverflow.com/questions/28430323/how-to-resample-downsample-an-irregular-timestamp-list