What's the best way to ping a list of 20 websites every 5 minutes (for example) in order to know if the site responds with HTTP 200 or not?
I really like node.js and would like to tackle this problem and, hopefully soon, share some code on GitHub. Keep in mind that I only have a very basic setup right now, hosted at https://github.com/alfredwesterveld/freakinping
What's the best way to ping a list of 20 websites every 5 minutes (for example) in order to know if the site responds with HTTP 200 or not?
First, I would like to know whether you really want to do a ping (ICMP) or just want to know whether the website returns status code 200 (OK), and measure the time it takes. I believe from the context that you don't really want a ping, just an HTTP request whose time you measure. I ask because (I believe) an ICMP ping can't be done from node.js/ruby/python as a normal user: it needs raw sockets, which require root. For example, I found this ping script in python (I believe I also saw a simple ruby script somewhere, although I am not much of a ruby programmer), but it requires root access. I don't believe there is a ping module out there for node.js yet.
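A minimal sketch of the plain HTTP check described above, using only node's built-in http and https modules. The helper name checkUrl, the shape of the result object and the timeout value are my own assumptions for illustration, not part of any library.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: sends one GET request and reports the status code
// and the elapsed time in milliseconds. No root access is needed.
function checkUrl(url, timeoutMs, callback) {
  const client = url.startsWith('https:') ? https : http;
  const started = Date.now();
  let done = false;

  // Make sure the callback fires exactly once, whatever happens first.
  const finish = (result) => {
    if (done) return;
    done = true;
    callback(Object.assign({ url: url, ms: Date.now() - started }, result));
  };

  const req = client.get(url, (res) => {
    res.resume(); // discard the body, only the status line matters here
    finish({ up: res.statusCode === 200, status: res.statusCode });
  });

  // Give up on servers that accept the connection but never answer.
  req.setTimeout(timeoutMs, () => req.destroy(new Error('timeout')));
  req.on('error', (err) => finish({ up: false, error: err.message }));
}
```

Calling checkUrl('http://example.com/', 10000, console.log) would print one result object for that URL. Redirects are not followed here, so a 301 counts as "not 200"; whether that is what you want depends on the sites you monitor.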
Also, is there a better but no-brainer solution for this? I'm afraid the list can grow to 20000 websites and then there's not enough time to ping them all in the 5 minutes I need to be pinging.
Basically, I'm describing how PingDom, UptimeRobot, and the like work.
What you need to achieve this kind of scale is a message queue such as redis, beanstalkd or gearmand. At the scale of PingDom one worker process is not going to cut it, but in your case (I assume) one worker will do. I think (assume) redis will be the fastest message queue because of the C extension for node.js, but then again I should benchmark it against beanstalkd, another popular message queue (which does not yet have a C extension).
I'm afraid the list can grow to 20000 websites
If you get to that scale you might have to host multiple boxes (a lot of worker threads/processes) to handle the load, but you aren't at that scale yet and node.js is insanely fast. It might even be able to handle that load on a single box, although I don't know for sure (you need to run some benchmarks).
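To get a feeling for that load before benchmarking, here is a rough back-of-the-envelope calculation. The helper names are made up, and the 1.5 second average response time is purely an assumed figure, so treat the result as an order of magnitude only.

```javascript
// Hypothetical helpers for a rough capacity estimate (not from any library).

// Checks that must complete per second to cover all sites in one window.
function requiredThroughput(sites, windowSec) {
  return sites / windowSec;
}

// Requests that must be in flight at once if each check takes
// avgResponseSec on average (sites * latency / window, rounded up).
function requiredConcurrency(sites, windowSec, avgResponseSec) {
  return Math.ceil((sites * avgResponseSec) / windowSec);
}

// 20000 sites every 5 minutes, ASSUMING 1.5 s per check on average.
console.log(requiredThroughput(20000, 300).toFixed(1)); // 66.7
console.log(requiredConcurrency(20000, 300, 1.5)); // 100
```

So under that assumption roughly 67 checks per second and about 100 simultaneous requests would be needed, which is why a single box might still be enough.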
I think this could be achieved pretty easily in node.js (I really like node.js). The way I would do this is to use redis as my datastore because it is INSANE FAST!
PING: 20000 ops 46189.38 ops/sec 1/4/1.082
SET: 20000 ops 41237.11 ops/sec 0/6/1.210
GET: 20000 ops 39682.54 ops/sec 1/7/1.257
INCR: 20000 ops 40080.16 ops/sec 0/8/1.242
LPUSH: 20000 ops 41152.26 ops/sec 0/3/1.212
LRANGE (10 elements): 20000 ops 36563.07 ops/sec 1/8/1.363
LRANGE (100 elements): 20000 ops 21834.06 ops/sec 0/9/2.287
(measured using node_redis with the hiredis C library for node.js). I would add the URLs to redis using sadd.
This could be achieved with barely any effort. I would use setInterval(callback, delay, [arg], [...]) to repeatedly test the response time of the servers. In the callback, get all URLs from redis using smembers, then put all the URLs (messages) on the message queue using rpush.
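The scheduling step could be sketched roughly like this. I am assuming a callback-style redis client (node_redis version 3 and earlier expose smembers and rpush this way; newer versions are promise-based), and the key names 'urls' and 'queue:check' as well as the function names are made up for the example.

```javascript
// Assumption: `client` is a callback-style redis client. It is passed in
// rather than created here so the sketch has no dependencies.
const URL_SET = 'urls';       // hypothetical set filled earlier with sadd
const QUEUE = 'queue:check';  // hypothetical list used as the message queue

// Reads every URL from the set and pushes each one onto the queue.
function enqueueAll(client, done) {
  client.smembers(URL_SET, (err, urls) => {
    if (err) return done(err);
    if (urls.length === 0) return done(null, 0);

    let pending = urls.length;
    let failed = null;
    urls.forEach((url) => {
      client.rpush(QUEUE, url, (pushErr) => {
        if (pushErr) failed = failed || pushErr;
        if (--pending === 0) done(failed, urls.length);
      });
    });
  });
}

// Re-queues the whole set every intervalMs milliseconds.
function startScheduler(client, intervalMs) {
  return setInterval(() => {
    enqueueAll(client, (err, count) => {
      if (err) console.error('enqueue failed:', err.message);
      else console.log('queued ' + count + ' URLs');
    });
  }, intervalMs);
}
```

With such a client, startScheduler(client, 5 * 60 * 1000) would queue every URL once per 5 minutes. Note that setInterval only fires after the first delay has passed, so you may want to call enqueueAll once at startup as well.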
However, what happens when one doesn't answer? What happens to the ones after that?
I might not completely understand this sentence, but here goes. If one fails, it just fails. You could check the response (time) again after 5 seconds or so to see whether it is back online. A precise algorithm for this should be devised. The ones after that should have nothing to do with previous URLs unless they point to the same server. That is also something you should think about, I guess, because you should not hit all the URLs on the same server at the same time but queue them up instead.
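One possible (certainly not the only) way to do the "check again in 5 seconds" part is sketched below. The function confirmDown and its parameters are hypothetical, and `check` stands for whatever function performs the actual HTTP check and calls back with an object containing an `up` flag.

```javascript
// Hypothetical helper: only reports a site as down after the first check
// and `retriesLeft` further checks have all failed.
// Assumption: check(url, cb) calls cb with an object like { up: true }.
function confirmDown(check, url, retriesLeft, delayMs, callback) {
  check(url, (result) => {
    if (result.up) return callback({ url: url, up: true });
    if (retriesLeft <= 0) return callback({ url: url, up: false });

    // Wait a little, then try again with one retry fewer.
    setTimeout(() => {
      confirmDown(check, url, retriesLeft - 1, delayMs, callback);
    }, delayMs);
  });
}
```

For example, confirmDown(check, url, 2, 5000, callback) would try up to three times, 5 seconds apart, before declaring the site down. Because each URL is retried independently, a failing site does not hold up the ones after it.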
From the worker process (for now just one would suffice), fetch a message (URL) from redis using the brpop command, check the response time for that URL, and then fetch the next URL from the list. I would probably do a couple of requests simultaneously to speed up the process.
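A rough sketch of such a worker follows. Again I am assuming a callback-style redis client where brpop(key, timeoutSeconds, callback) yields [key, value], or null when the timeout expires (as in node_redis version 3 and earlier). startWorker, the `limit` parameter and the `check`/`onResult` callbacks are hypothetical names. Since brpop blocks its connection while waiting, the worker should get its own redis connection.

```javascript
// Pulls URLs off the queue and checks them, keeping at most `limit`
// checks in flight at the same time.
// Assumptions: check(url, cb) performs the HTTP check and calls back with
// a result; onResult(result) stores or reports it.
function startWorker(client, queue, limit, check, onResult) {
  let inFlight = 0;     // checks currently running
  let waiting = false;  // true while a brpop call is outstanding
  let stopped = false;

  function pump() {
    if (stopped || waiting || inFlight >= limit) return;
    waiting = true;
    client.brpop(queue, 5, (err, reply) => {
      waiting = false;
      if (err) {
        console.error('brpop failed:', err.message);
        setTimeout(pump, 1000); // back off briefly before retrying
        return;
      }
      if (reply) {
        inFlight += 1;
        check(reply[1], (result) => {
          inFlight -= 1;
          onResult(result);
          pump(); // a slot just became free
        });
      }
      pump(); // fetch the next URL if there is still capacity
    });
  }

  pump();
  // Stops fetching new URLs; a brpop that is already waiting may still
  // deliver one more URL, which will be checked.
  return function stop() { stopped = true; };
}
```

Raising `limit` is the cheap way to speed things up on one box; starting more worker processes that read from the same list is the next step, since each brpop hands a given URL to only one of them.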