How to write an efficient hit counter for websites

怎甘沉沦 提交于 2019-12-05 00:39:31

问题


I want to write a hit counter script to keep track of hits on images on a website and the originating IPs. Impressions are upwards of hundreds of thousands per day, so the counters will be incremented many times a second.

I'm looking for a simple, self-hosted method (php, python scripts, etc.). I was thinking of using MySQL to keep track of this, but I'm guessing there's a more efficient way. What are good methods of keeping counters?


回答1:


A fascinating subject. Incrementing a counter, simple as it may be, just has to be a transaction... meaning, it can lock out the whole DB for longer than makes sense!-) It can easily be the bottleneck for the whole system.

If you need rigorously exact counts but don't need them to be instantly up-to-date, my favorite approach is to append the countable information to a log (switching logs as often as needed for data freshness purposes). Once a log is closed (with thousands of countable events in it), a script can read it and update all that's needed in a single transaction -- maybe not intuitive, but much faster than thousands of single locks.

Then there's extremely-fast counters that are only statistically accurate -- but since you don't say that such imprecision is acceptable, I'm not going to explain them in more depth.




回答2:


You could take your webserver's Access log (Apache: access.log) and evaluate it time and again (cronjob) in case you do not need to have the data at hand at the exact moment in time when someone visits your site.

Usually, the access.log is generated anyway and contains the requested resource as well as time, date and the user's IP. This way you do not have to route all trafic through a php-script. Lean, mean counting machine.




回答3:


There are two really easy ways:

  1. Parse it out of your web logs in batch.
  2. Run the hits through beanstalkd or gearmand and have a worker do the hard stuff in a controlled way.

Option 1 works with off-the-shelf tools. Option 2 requires just a bit of programming, but gives you something closer to realtime updates without causing you to fall over when the traffic spikes (such as you'll find in your direct mysql case).




回答4:


Without a doubt, Redis is perfect for this problem. It requires about a minute to setup and install, supports atomic increments, is incredibly fast, has client libs for python and php (and many other languages), is durable (snapshots, journal, replication).

Store each counter to its own key. Then simply

INCR key



回答5:


If accuracy is important, you can do it slightly slower with MySql... create a HEAP / Memory table to store your counter values. These a in-memory tables that are blazingly fast. You can write the data into a normal table at intervals.

Based on the app engine ideas, you could use memcache as a temporary store for your counter. Incrementing a memcache counter is faster than using the MySql heap tables (I think). Once every five or ten seconds, you could read the memcache counter and write that number into your DB.




回答6:


Not sure if it's up your alley, but AppEngine is a pretty nice platform to build on. Some sample code you can use to build a counter using their DataStore and transactions is described here: http://code.google.com/appengine/docs/python/datastore/transactions.html.




回答7:


You can use Redis - it`s very fast key-value storage with support for atomic increments. If need will arise -- counts data could be splitted between multiple servers easily.




回答8:


I've done something very similar, on a similar scale (multiple servers, hundreds of domains, several thousand hits per hour) and log file analysis was definitely the way to go. (It also checked hit rates, weighted them by file type, and blacklisted IP addresses at the firewall if they were making too many requests; its intended purpose was to auto-block bad bots, not to just be a counter, but counting was an essential piece of it.)

No performance impact on the web server process itself, since it's not doing any additional work there, and you could easily publish periodically-updated hit counts by injecting them into the site's database every minute/5 minutes/100 hits/whatever without having to lock the relevant row/table/database (depending on the locking mechanism in use) on every hit.




回答9:


Well if you happen to go the PHP route you could use an SQLite database, however MySQL is a perfectly reasonable way to store that info and usually (at least from the ones I've seen) is how it is done.

If you didn't want to store IP address and any other info a simple number in a text file could work.



来源:https://stackoverflow.com/questions/1535261/how-to-write-an-efficient-hit-counter-for-websites

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!