Information Retrieval: URL hits in a time frame

Submitted by 心已入冬 on 2019-12-11 18:58:46

Question


Algorithm Challenge:

Problem statement: How would you design a logging system for something like Google? You should be able to query for the number of times a URL was opened between two timestamps.

Input: start_time, end_time, URL1
Output: the number of times URL1 was opened between start_time and end_time.

Some specs:
- A database is not an optimal solution.
- A URL might have been opened multiple times at a given timestamp.
- A URL might have been opened a large number of times between the two timestamps.
- start_time and end_time can be a month apart.
- Time can be granular to the second.


Answer 1:


One solution:

Hash of a hash

Key: URL. Value: a hash mapping each timestamp T to CumFreq, the cumulative number of times the URL was opened up to and including T.
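A minimal Python sketch of this hash-of-hashes layout, assuming raw hits arrive as hypothetical `(url, t)` pairs with `t` in integer seconds. The function name `build_cumulative_index` is illustrative, not from the original answer. It first counts opens per second (so several opens in the same second collapse into one entry), then converts each URL's counts into running totals, which is the CumFreq column.

```python
from collections import defaultdict


def build_cumulative_index(hits):
    """Build {url: {timestamp: cumulative_frequency}} from raw (url, t) events.

    Hypothetical helper: `hits` is any iterable of (url, t) pairs, t in integer
    seconds. Events do not need to be sorted.
    """
    # Count opens per (url, second); a URL may be opened many times per second.
    per_second = defaultdict(lambda: defaultdict(int))
    for url, t in hits:
        per_second[url][t] += 1

    index = {}
    for url, counts in per_second.items():
        running = 0
        cumulative = {}
        # Visit timestamps in increasing order and store the running total.
        for t in sorted(counts):
            running += counts[t]
            cumulative[t] = running
        index[url] = cumulative
    return index
```

Because Python dicts preserve insertion order (3.7+), each inner dict iterates in time order, so `list(d)` and `list(d.values())` give the sorted timestamps and their cumulative counts, ready for binary search.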

Example:

Amazon hash:
T           CumFreq
11:00 am    3      (opened 3 times at 11:00 am)
11:15 am    4      (opened 1 time at 11:15 am; CumFreq is 3 + 1 = 4)
11:30 am    11     (opened 7 times at 11:30 am; CumFreq is 4 + 7 = 11)

Input: 11:10 am, 11:37 am, Amazon

The output is obtained by subtraction: take the CumFreq of the last timestamp before 11:10 am, which is 11:00 am (CumFreq 3), and the CumFreq of the last active timestamp at or before 11:37 am, which is 11:30 am (CumFreq 11). Hence the result is 11 - 3 = 8.
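The subtraction step can be sketched as follows, assuming one URL's timestamps are kept in a sorted list `times` with a parallel list `cum_freq` (both hypothetical names, as is the `hms` helper). Finding "the last timestamp before X" is then a binary search with the standard `bisect` module, so each query costs O(log n) in the number of distinct timestamps for that URL, no matter how far apart start_time and end_time are.

```python
from bisect import bisect_left, bisect_right


def hits_between(times, cum_freq, start, end):
    """Count opens with start <= t <= end for a single URL.

    `times` is a sorted list of distinct timestamps; cum_freq[i] is the number
    of opens up to and including times[i].
    """
    if end < start:
        return 0
    # Index of the last timestamp at or before `end` (-1 if none).
    hi = bisect_right(times, end) - 1
    # Index of the last timestamp strictly before `start` (-1 if none).
    lo = bisect_left(times, start) - 1
    upper = cum_freq[hi] if hi >= 0 else 0
    lower = cum_freq[lo] if lo >= 0 else 0
    return upper - lower


def hms(hours, minutes):
    """Hypothetical helper: seconds since midnight."""
    return hours * 3600 + minutes * 60


# Amazon's CumFreq table from the example: 11:00 am -> 3, 11:15 am -> 4, 11:30 am -> 11.
times = [hms(11, 0), hms(11, 15), hms(11, 30)]
cum_freq = [3, 4, 11]
print(hits_between(times, cum_freq, hms(11, 10), hms(11, 37)))  # prints 8
```

Using "strictly before start" for the lower lookup keeps opens that happen exactly at start_time in the count, and `bisect_right` on the upper end keeps opens exactly at end_time.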

Can we do better?



Source: https://stackoverflow.com/questions/14824189/information-retrieval-url-hits-in-a-time-frame
