I have a database-driven website serving about 50,000 pages.
I want to track each webpage/record hit. I will do this by creating logs and then batch processing them. Is it better to write the logs to a file or to a database?
If you are using either file-based logging or database-based logging, your biggest performance hit will be file/table locking. Basically, if client A and client B connect within a relatively small time frame, client B is stuck waiting for the lock on the hits file/table to be released before continuing.
The problem with a file-based mechanism is that file locking is essential to ensure that your hits don't get corrupted. The only way around that is to implement a queue to do a delayed write to the file.
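For illustration, here is a rough sketch of that delayed-write idea in PHP (untested; the log path and the flush-at-shutdown approach are my assumptions): hits are queued in memory during the request and appended in a single locked write at the end, so each request takes the lock once instead of once per hit.

    <?php
    // Hypothetical in-request queue: hits are buffered here and flushed once.
    $GLOBALS['hit_buffer'] = array();

    function record_hit($pageId)
    {
        // Queue the hit in memory instead of writing to the file immediately.
        $GLOBALS['hit_buffer'][] = date('c') . "\t" . $pageId . "\n";
    }

    function flush_hits()
    {
        if (empty($GLOBALS['hit_buffer'])) {
            return;
        }
        // One locked append per request; LOCK_EX keeps concurrent writers
        // from interleaving their lines.
        file_put_contents('/var/log/myapp/hits.log',
                          implode('', $GLOBALS['hit_buffer']),
                          FILE_APPEND | LOCK_EX);
        $GLOBALS['hit_buffer'] = array();
    }

    register_shutdown_function('flush_hits');

A real queue (e.g. a local message queue) would decouple the write even further, but the principle is the same.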
With database logging, you can at least do the following [MySQL using MyISAM]:
INSERT DELAYED INTO `hits` ...
See "12.2.5.2. INSERT DELAYED Syntax" in the MySQL manual for more information.
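For example (untested sketch; the page_id/ip/hit_time columns and the $pageId variable are placeholders, not from the question), the call from PHP might look like this. Note that DELAYED only applies to MyISAM-style tables, and it was deprecated and later removed in newer MySQL versions.

    <?php
    // Assumes an existing mysqli connection in $db and a MyISAM `hits` table
    // with hypothetical columns (page_id, ip, hit_time).
    $page = (int) $pageId; // whatever identifies the record being viewed
    $ip   = $db->real_escape_string($_SERVER['REMOTE_ADDR']);

    // The client gets control back immediately; MySQL buffers the row
    // and writes it when the table is free.
    $db->query("INSERT DELAYED INTO `hits` (page_id, ip, hit_time)
                VALUES ($page, '$ip', NOW())");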
Writing to a file will be quicker, but writing to the DB will be better.
As others mentioned, it depends on lots of things such as traffic, disk speed, etc. You'll have to test both scenarios.
While testing MySQL, try both MyISAM and InnoDB. In theory, InnoDB will perform better because it has row-level locking.
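If you do benchmark it, something as rough as this is enough for a first comparison (untested; the table names, row count, and connection details are all assumptions, and both tables are presumed to already exist):

    <?php
    // Rough timing sketch: N single-row inserts into each engine, then N file appends.
    $db = new mysqli('localhost', 'user', 'pass', 'test');
    $n  = 10000;

    foreach (array('hits_myisam', 'hits_innodb') as $table) {
        $start = microtime(true);
        for ($i = 0; $i < $n; $i++) {
            $db->query("INSERT INTO `$table` (page_id, hit_time) VALUES ($i, NOW())");
        }
        printf("%s: %.2fs\n", $table, microtime(true) - $start);
    }

    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        file_put_contents('/tmp/hits.log', "$i\t" . date('c') . "\n", FILE_APPEND | LOCK_EX);
    }
    printf("file append: %.2fs\n", microtime(true) - $start);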
I've done something similar. I log each record to a separate file, then I have a batch process that grabs the files, puts them into a tar file and uploads them to the central log server (in my case, S3 :)).
I generate random file names for each log entry. I do this to avoid locking files for rotation. It's really easy to archive/delete this way.
I use JSON as my log format instead of the typical whitespace-delimited log files. This makes it easier to parse and to add fields in the future. It also means it's easier for me to write one entry per file than to append multiple records to a file.
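A minimal sketch of that write path (the directory and field names are just my guesses at what such an entry might contain): each hit becomes its own JSON file with a unique name, so concurrent requests never touch the same file.

    <?php
    function log_hit(array $hit)
    {
        // One JSON document per file; uniqid() gives each entry its own file name,
        // so there is no locking and nothing to rotate.
        $file = '/var/log/myapp/hits/' . uniqid('hit_', true) . '.json';
        file_put_contents($file, json_encode($hit) . "\n");
    }

    // Hypothetical usage:
    log_hit(array(
        'time'    => date('c'),
        'page_id' => 1234,
        'ip'      => $_SERVER['REMOTE_ADDR'],
    ));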
I've also used log4php + syslog-ng to centralize logging in real time. I have log4php log to syslog, which then forwards the logs to my central server. This is really useful on larger clusters. One caveat is that there's a length limit on syslog messages, so you risk longer messages being truncated.
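I won't reproduce the log4php configuration here, but the same idea can be sketched with PHP's built-in syslog functions (the ident and facility below are arbitrary choices); syslog-ng's own configuration handles the forwarding to the central server. Keep the entries short because of the length limit mentioned above.

    <?php
    // Send a hit record to the local syslog; syslog-ng is configured separately
    // to forward this facility to the central log server.
    openlog('myapp-hits', LOG_ODELAY, LOG_LOCAL0);
    syslog(LOG_INFO, json_encode(array(
        'time'    => date('c'),
        'page_id' => 1234,
        'ip'      => $_SERVER['REMOTE_ADDR'],
    )));
    closelog();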
Write to file. Rotate logs.
Batch load the files into the database on a scheduled basis.
There are many, many reasons to choose this architecture: ease of scaling (write to many logs, load them into the db), no reliance on the database as a single point of failure (if something goes wrong, you just accumulate logs for a while), the ability to do cleaning and non-trivial parsing at load time without burdening your production servers, and more.
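A sketch of the load step (untested; the paths, table, and columns are assumptions): a cron script rotates the active log out of the way and bulk-loads the rotated file, which is far faster than row-by-row inserts.

    <?php
    // Run from cron: rotate the active log, then bulk-load the rotated file.
    $active  = '/var/log/myapp/hits.log';
    $rotated = $active . '.' . date('YmdHis');

    if (!file_exists($active) || !rename($active, $rotated)) {
        exit; // nothing to load this run
    }

    $db = new mysqli('localhost', 'user', 'pass', 'stats');
    // Assumes hits.log is tab-delimited (hit_time, page_id) and that
    // local_infile is enabled on both the client and the server.
    $db->query("LOAD DATA LOCAL INFILE '" . $db->real_escape_string($rotated) . "'
                INTO TABLE `hits` (hit_time, page_id)");
    // Keep (or archive) the rotated file in case the load needs to be replayed.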
If this is for a database-driven site, why not just use the built-in logging capabilities of Apache or IIS and a suitable reporting tool such as AWStats? And beyond that, there's always Google Analytics.
AWStats and web server logging are my preference; you essentially get them for free anyway. Even if you're not after traffic analysis, you could still consider parsing the Apache access log file yourself for whatever batch processing you need to do.
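For the parse-it-yourself route, here is a rough sketch that tallies hits per URL from a combined-format Apache access log (the log path is an assumption, and the regex only covers the common case):

    <?php
    // Count hits per URL from an Apache combined-format access log.
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+/';
    $hits = array();

    foreach (file('/var/log/apache2/access.log') as $line) {
        if (preg_match($pattern, $line, $m)) {
            // $m[1]=client IP, $m[2]=timestamp, $m[3]=method, $m[4]=path, $m[5]=status
            $hits[$m[4]] = isset($hits[$m[4]]) ? $hits[$m[4]] + 1 : 1;
        }
    }

    arsort($hits);
    print_r(array_slice($hits, 0, 20, true)); // top 20 pages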