There is a hypothetical web server that supports only one very simple API: the count of requests received in the last hour, minute and second. This server is very popular in t
If 100% accuracy is required:
Keep a linked list of all requests and 3 counts: for the last hour, the last minute and the last second.
You will have 2 pointers into the linked list: one for a minute ago and one for a second ago.
An hour ago will be at the end of the list. Whenever the time of the last request is more than an hour before the current time, remove it from the list and decrement the hour count.
The minute and second pointers will point to the first request that occurred after a minute ago and a second ago, respectively. Whenever the time of the request a pointer points to is more than a minute / second before the current time, move that pointer up and decrement the minute / second count.
When a new request comes in, increment all 3 counts and add the request to the front of the linked list.
Requests for the counts would simply involve returning the counts.
All of the above operations are amortised constant time.
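A minimal Java sketch of the exact scheme, under a few assumptions that are not in the answer: timestamps are passed in as epoch milliseconds (never decreasing) rather than read from a clock, the list runs oldest to newest (so pointers move towards newer nodes instead of "up"), and the hour window gets its own start pointer in place of "the end of the list". The class and method names are hypothetical.

```java
/**
 * Exact counts for the last hour, minute and second: one linked list
 * (oldest to newest) plus a "window start" pointer per window.
 * Hypothetical sketch: times are epoch milliseconds and must not decrease.
 */
class ExactRequestCounter {
    private static final long SECOND = 1_000L, MINUTE = 60_000L, HOUR = 3_600_000L;

    private static final class Node {
        final long time;  // request timestamp in milliseconds
        Node next;        // the next (newer) request, or null
        Node(long time) { this.time = time; }
    }

    // Oldest request still inside each window; null when that window is empty.
    private Node hourStart, minuteStart, secondStart;
    private Node newest;  // most recently added request
    private int hourCount, minuteCount, secondCount;

    /** Records one request at nowMillis. */
    void request(long nowMillis) {
        evict(nowMillis);
        Node node = new Node(nowMillis);
        if (newest != null) newest.next = node;
        newest = node;
        // An empty window starts at the new request.
        if (hourStart == null) hourStart = node;
        if (minuteStart == null) minuteStart = node;
        if (secondStart == null) secondStart = node;
        hourCount++;
        minuteCount++;
        secondCount++;
    }

    int lastHour(long nowMillis)   { evict(nowMillis); return hourCount; }
    int lastMinute(long nowMillis) { evict(nowMillis); return minuteCount; }
    int lastSecond(long nowMillis) { evict(nowMillis); return secondCount; }

    /** Moves each pointer past requests that are no longer inside its window. */
    private void evict(long now) {
        while (hourStart != null && hourStart.time <= now - HOUR) {
            hourStart = hourStart.next;
            hourCount--;
        }
        while (minuteStart != null && minuteStart.time <= now - MINUTE) {
            minuteStart = minuteStart.next;
            minuteCount--;
        }
        while (secondStart != null && secondStart.time <= now - SECOND) {
            secondStart = secondStart.next;
            secondCount--;
        }
        if (hourStart == null) newest = null;  // whole list expired; drop the stale tail
    }
}
```

Every call only moves pointers forward, and each node is passed at most once per pointer, so the total work is proportional to the number of requests; that is where the amortised constant time comes from.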
If less than 100% accuracy is acceptable:
The space complexity of the above can be large, depending on how many requests per second you typically get; you can reduce it by sacrificing a little accuracy as follows:
Have a linked-list as above, but only for the last second. Also have the 3 counts.
Then have a circular array of 60 elements holding the counts for each of the last 60 seconds. Whenever a second passes, subtract the oldest element of the array from the minute count and overwrite that element with the count for the second that just ended.
Have a similar circular array for the last 60 minutes.
Loss of accuracy: The minute count can be off by all the requests in a second and the hour count can be off by all the requests in a minute.
Obviously this won't really make sense if you only have one request per second or less. In this case you can keep the last minute in the linked-list and just have a circular array for the last 60 minutes.
There are also other variations on this - the accuracy to space used ratio can be adjusted as required.
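A Java sketch of the reduced-space variant, assuming (hypothetically) epoch-millisecond timestamps that never go backwards and that "a second passes" means crossing a whole-second boundary. The last-second count stays exact via a deque of timestamps; the minute and hour counts cover the current partial second / minute plus the previous 60 completed ones, which is exactly the loss of accuracy described in the answer. Names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: exact last second, 60 per-second and 60 per-minute buckets. */
class ApproxRequestCounter {
    private static final long RESET_GAP_SECONDS = 3660;  // 61 minutes idle: every bucket has expired

    private final ArrayDeque<Long> secondWindow = new ArrayDeque<>();  // exact, oldest first
    private final int[] secondBuckets = new int[60];  // requests per completed second
    private final int[] minuteBuckets = new int[60];  // requests per completed minute
    private long currentSecond;                      // whole second currently being filled
    private int thisSecond, thisMinute;              // requests in the partial second / minute
    private int minuteCount, hourCount;

    ApproxRequestCounter(long startMillis) {
        currentSecond = Math.floorDiv(startMillis, 1000L);
    }

    void request(long nowMillis) {
        advance(nowMillis);
        secondWindow.addLast(nowMillis);
        thisSecond++;
        thisMinute++;
        minuteCount++;
        hourCount++;
    }

    int lastSecond(long nowMillis) { advance(nowMillis); return secondWindow.size(); }
    int lastMinute(long nowMillis) { advance(nowMillis); return minuteCount; }
    int lastHour(long nowMillis)   { advance(nowMillis); return hourCount; }

    private void advance(long nowMillis) {
        long target = Math.floorDiv(nowMillis, 1000L);
        if (target - currentSecond >= RESET_GAP_SECONDS) {
            Arrays.fill(secondBuckets, 0);
            Arrays.fill(minuteBuckets, 0);
            thisSecond = thisMinute = minuteCount = hourCount = 0;
            currentSecond = target;
        }
        while (currentSecond < target) rollSecond();
        // Exact eviction for the one-second window.
        while (!secondWindow.isEmpty() && secondWindow.peekFirst() <= nowMillis - 1000) {
            secondWindow.pollFirst();
        }
    }

    /** Closes the current second (and the current minute on a minute boundary). */
    private void rollSecond() {
        int slot = (int) Math.floorMod(currentSecond, 60L);
        minuteCount -= secondBuckets[slot];  // that slot held the second from a minute ago
        secondBuckets[slot] = thisSecond;   // already included in minuteCount
        thisSecond = 0;
        currentSecond++;
        if (Math.floorMod(currentSecond, 60L) == 0) {
            int mslot = (int) Math.floorMod(Math.floorDiv(currentSecond, 60L) - 1, 60L);
            hourCount -= minuteBuckets[mslot];  // the minute from an hour ago
            minuteBuckets[mslot] = thisMinute;
            thisMinute = 0;
        }
    }
}
```

Memory is now the deque for one second plus 120 integers, regardless of how many requests arrive in the last minute or hour.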
A timer to remove old elements:
If you remove old elements only when new elements come in, it will be amortised constant time (some operations might take longer, but it will average out to constant time).
If you want true constant time, you can additionally have a timer running that removes old elements. Each timer invocation (and of course each insertion and each count check) then takes only constant time, since it removes at most the elements inserted during the fixed interval since the previous timer tick, which is bounded as long as the request rate is bounded.
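A Java sketch of the timer idea, assuming a single sliding window backed by an `ArrayDeque` of timestamps and a `ScheduledExecutorService` as the timer. The injectable clock (`LongSupplier`) and all names are hypothetical; the clock is only there so the class can be tested without waiting.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch: sliding-window counter also pruned by a periodic timer task. */
class TimerEvictedCounter {
    private final long windowMillis;
    private final LongSupplier clock;  // e.g. System::currentTimeMillis
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();  // oldest first

    TimerEvictedCounter(long windowMillis, LongSupplier clock) {
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    synchronized void record() {
        evictExpired();
        timestamps.addLast(clock.getAsLong());
    }

    synchronized int count() {
        evictExpired();            // with the timer running, little is left to remove here
        return timestamps.size();  // ArrayDeque.size() is O(1)
    }

    /** Drops timestamps that have left the window; run by the timer and by record/count. */
    synchronized void evictExpired() {
        long limit = clock.getAsLong() - windowMillis;
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= limit) {
            timestamps.pollFirst();
        }
    }

    /** Schedules evictExpired() every periodMillis on the given scheduler. */
    ScheduledFuture<?> startEvicting(ScheduledExecutorService scheduler, long periodMillis) {
        return scheduler.scheduleAtFixedRate(this::evictExpired, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

In a real server the clock would be `System::currentTimeMillis` and the scheduler something like `Executors.newSingleThreadScheduledExecutor()`. The per-call bound only holds if the request rate is bounded, as the answer notes.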
You can create an array of size 60x60, one entry for each second in the hour, and use it as a circular buffer. Each entry contains the number of requests for a given second. When you move to the next second, clear its entry and start counting (if several seconds pass without a request, clear each skipped entry as well). When you reach the end of the array, you start from 0 again, effectively clearing all counts older than 1 hour.
So all three have O(1) space and time complexity. The only drawback is that it ignores milliseconds, but you can apply the same idea to include milliseconds as well.
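A Java sketch of the 60x60 circular buffer, assuming callers pass whole epoch seconds that never decrease. The names are hypothetical. The last-minute count is `lastSeconds(now, 60)` and the last-hour count `lastSeconds(now, 3600)`; both sum a fixed number of slots.

```java
import java.util.Arrays;

/** Hypothetical sketch: 3600-slot ring of per-second request counts. */
class PerSecondRingCounter {
    private static final int SLOTS = 60 * 60;  // one slot per second in an hour
    private final int[] counts = new int[SLOTS];
    private long lastSecond;                   // most recent second cleared or written

    PerSecondRingCounter(long startSecond) { this.lastSecond = startSecond; }

    private static int slot(long second) { return (int) Math.floorMod(second, (long) SLOTS); }

    /** Moves to `second`, clearing every slot passed over, including skipped seconds. */
    private void advance(long second) {
        if (second - lastSecond >= SLOTS) {
            Arrays.fill(counts, 0);  // idle for an hour or more: every slot is stale
        } else {
            for (long s = lastSecond + 1; s <= second; s++) counts[slot(s)] = 0;
        }
        lastSecond = Math.max(lastSecond, second);
    }

    void request(long second) {
        advance(second);
        counts[slot(second)]++;
    }

    /** Requests in the last n seconds, including the current one (1 <= n <= 3600). */
    int lastSeconds(long now, int n) {
        advance(now);
        int sum = 0;
        for (int i = 0; i < n; i++) sum += counts[slot(now - i)];
        return sum;
    }
}
```

The hour query touches 3600 slots every time; that is constant, but not cheap, which is what the cumulative-sum variant avoids.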
One solution is like this:
1) Use a circular array of length 3600 (60 * 60 seconds in an hour) to hold the data for each second in the last hour. (Strictly, the hour query below also reads the cumulative sum from just before the hour, so the array needs one extra slot: 3601.)
To record the data for a new second, drop the oldest second's data from the circular array by moving the circular array's head pointer.
2) In each element of the circular array, instead of holding the number of requests in that particular second, we record the cumulative number of requests seen so far (a second with no requests simply repeats the previous total). The number of requests for a period can then be calculated as requests_sum.get(current_second) - requests_sum.get(current_second - number_of_seconds_in_this_period).
All of the operations, such as increment(), getCountForLastMinute() and getCountForLastHour(), can be done in O(1) time.
=========================================================================
Here is an example for how this works.
Suppose the request counts for the most recent 3 seconds are:
1st second: 2 requests
2nd second: 4 requests
3rd second: 3 requests
The circular array will look like this:
sum = [2, 6, 9]
where 6 = 4 + 2 and 9 = 2 + 4 + 3
In this case:
1) if you want the last second's request count (the 3rd second's request count), simply calculate sum[2] - sum[1] = 9 - 6 = 3
2) if you want the last two seconds' request count (the 3rd second's request count plus the 2nd second's request count), simply calculate sum[2] - sum[0] = 9 - 2 = 7
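A Java sketch of the cumulative-sum ring, assuming whole epoch seconds that never decrease; names are hypothetical. It uses 3601 slots so that, for an hour-long query, the total from just before the window has not yet been overwritten, and skipped seconds are filled with the running total. `getCountForLastMinute()` would be `lastSeconds(now, 60)` and `getCountForLastHour()` would be `lastSeconds(now, 3600)`.

```java
/** Hypothetical sketch: circular buffer of cumulative request counts, one slot per second. */
class CumulativeRingCounter {
    // 3600 seconds in an hour plus one slot for the total just before the window.
    private static final int SLOTS = 3601;
    private final long[] sums = new long[SLOTS];
    private long current;  // latest second seen
    private long total;    // all requests seen so far

    CumulativeRingCounter(long startSecond) { current = startSecond; }

    private static int slot(long second) { return (int) Math.floorMod(second, (long) SLOTS); }

    /** Fills skipped seconds with the running total (no requests means no change). */
    private void advance(long second) {
        long from = Math.max(current + 1, second - SLOTS + 1);
        for (long s = from; s <= second; s++) sums[slot(s)] = total;
        current = Math.max(current, second);
    }

    void request(long second) {
        advance(second);
        total++;
        sums[slot(current)] = total;
    }

    /** sum[now] - sum[now - n]: requests in the last n seconds, 1 <= n <= 3600. */
    long lastSeconds(long now, int n) {
        advance(now);
        return sums[slot(now)] - sums[slot(now - n)];
    }
}
```

With the example above (2, 4 and 3 requests in seconds 1, 2 and 3), `lastSeconds(3, 1)` is 9 - 6 = 3 and `lastSeconds(3, 2)` is 9 - 2 = 7.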
The following code is in JS. It returns the count in O(1). I wrote this program for an interview where the time window was predefined as 5 minutes, but you can modify the code for seconds, minutes, and so on. Let me know how it goes.
In the clean_hits method, remove each entry outside our time range from the object we created, and subtract its count from totalCount before you delete the entry.
this.hitStore = { "totalCount" : 0};
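The JS program itself is not shown above, so here is a minimal Java sketch of the approach it describes: a store of hit counts keyed by time plus a running totalCount, with a clean-up step that subtracts each stale entry before deleting it. Keying by whole epoch seconds, the `TreeMap`, and the names `HitStore`, `hit`, `getHits` and `cleanHits` are assumptions for illustration, not the original code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: hits per second plus a running total for a 5-minute window. */
class HitStore {
    private static final long WINDOW_SECONDS = 5 * 60;
    private final TreeMap<Long, Integer> hitsBySecond = new TreeMap<>();  // oldest key first
    private int totalCount = 0;

    void hit(long second) {
        cleanHits(second);
        hitsBySecond.merge(second, 1, Integer::sum);
        totalCount++;
    }

    int getHits(long second) {
        cleanHits(second);
        return totalCount;  // O(1) once stale entries are gone
    }

    /** Removes entries outside the window, subtracting each from totalCount first. */
    private void cleanHits(long now) {
        Map.Entry<Long, Integer> oldest;
        while ((oldest = hitsBySecond.firstEntry()) != null && oldest.getKey() <= now - WINDOW_SECONDS) {
            totalCount -= oldest.getValue();
            hitsBySecond.pollFirstEntry();
        }
    }
}
```

Reading the total is O(1); the clean-up is amortised, since each entry is removed at most once.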
To do this for time window of T seconds, have a queue data structure where you queue the timestamps of individual requests as they arrive. When you want to read the number of requests arrived during the most recent window of T seconds, first drop from the "old" end of the queue those timestamps that are older than T seconds, then read the size of the queue. You should also drop elements whenever you add a new request to the queue to keep its size bounded (assuming bounded rate for incoming requests).
This solution works to arbitrary precision, e.g. millisecond accuracy. If you are content with approximate answers, you can, for a time window of T = 3600 (an hour), consolidate requests arriving within the same second into a single queue element, which bounds the queue size by 3600. I think that would be more than fine, but it theoretically loses accuracy. For T = 1, you can do the consolidation at the millisecond level if you want.
In pseudocode:
queue Q

proc requestReceived()
    Q.insertAtFront(now())
    collectGarbage()

proc collectGarbage()
    limit = now() - T
    while (! Q.empty() && Q.lastElement() < limit)
        Q.popLast()

proc count()
    collectGarbage()
    return Q.size()
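A direct Java rendering of that pseudocode, assuming `ArrayDeque` for the queue and that now() is passed in as epoch milliseconds (so T is in milliseconds here) to keep the example deterministic. The class name is hypothetical; the method names follow the pseudocode.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of the queue pseudocode: exact count over the last T milliseconds. */
class QueueCounter {
    private final long windowMillis;                         // T
    private final ArrayDeque<Long> q = new ArrayDeque<>();  // newest at front, oldest at back

    QueueCounter(long windowMillis) { this.windowMillis = windowMillis; }

    void requestReceived(long now) {
        q.addFirst(now);
        collectGarbage(now);
    }

    int count(long now) {
        collectGarbage(now);
        return q.size();
    }

    /** Pops timestamps older than now - T from the old end, as in the pseudocode. */
    private void collectGarbage(long now) {
        long limit = now - windowMillis;
        while (!q.isEmpty() && q.peekLast() < limit) q.pollLast();
    }
}
```

As in the pseudocode, a request exactly T old is still counted (the comparison is strict `<`).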
Here is a generic Java solution that can keep track of the number of events for the last minute.
The reason I used ConcurrentSkipListSet is that it provides expected average O(log N) time for the contains, add and remove operations. You can easily change the code below to make the duration (1 minute by default) configurable.
As suggested in the answers above, it is a good idea to clean up stale entries periodically, using a scheduler for example.
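import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentSkipListSet;
import javax.annotation.PostConstruct;
import lombok.AllArgsConstructor;
import lombok.Builder;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

// CurrentDateTimeProvider is not shown here; it is assumed to have a getNow()
// method returning something with toInstant() (for example java.util.Calendar).
@Scope(value = "prototype")
@Component
@AllArgsConstructor
public class TemporalCounter {

    @Builder
    private static class CumulativeCount implements Comparable<CumulativeCount> {
        private final Instant timestamp;
        private final int cumulatedValue;

        // Break timestamp ties by value: with the timestamp alone, two increments in
        // the same instant compare as equal and the set silently drops the second one.
        @Override
        public int compareTo(CumulativeCount o) {
            int byTime = timestamp.compareTo(o.timestamp);
            return byTime != 0 ? byTime : Integer.compare(cumulatedValue, o.cumulatedValue);
        }
    }

    private final CurrentDateTimeProvider currentDateTimeProvider;
    private final ConcurrentSkipListSet<CumulativeCount> metrics = new ConcurrentSkipListSet<>();

    // Seed the set with a zero baseline so the first difference is well defined.
    @PostConstruct
    public void init() {
        Instant now = currentDateTimeProvider.getNow().toInstant();
        metrics.add(new CumulativeCount(now, 0));
    }

    // synchronized: reading last() and adding the next value must happen atomically,
    // otherwise two concurrent calls can store the same cumulated value.
    public synchronized void increment() {
        Instant now = currentDateTimeProvider.getNow().toInstant();
        int previousCount = metrics.isEmpty() ? 0 : metrics.last().cumulatedValue;
        metrics.add(new CumulativeCount(now, previousCount + 1));
    }

    // Events in the last minute = newest cumulated value minus the baseline entry
    // that cleanup() keeps just before the window.
    public synchronized int getLastCount() {
        cleanup();
        if (metrics.isEmpty()) {
            return 0;
        }
        return metrics.last().cumulatedValue - metrics.first().cumulatedValue;
    }

    // Remove every entry older than the last one before the one-minute window;
    // that entry itself stays as the baseline for getLastCount().
    public synchronized void cleanup() {
        Instant windowStart = currentDateTimeProvider.getNow().toInstant().minus(Duration.ofMinutes(1));
        CumulativeCount c = metrics.lower(CumulativeCount.builder().timestamp(windowStart).build());
        if (c != null) {
            // headSet(c) holds only elements strictly smaller than c; clearing the view
            // avoids the full scan that removeIf would do.
            metrics.headSet(c).clear();
        }
    }

    public synchronized void reset() {
        metrics.clear();
        init();
    }
}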