I have a database-driven website serving about 50,000 pages.
I want to track each webpage/record hit. I will do this by creating logs, and then batch-processing the logs.
It all depends on your infrastructure and limitations. If the disk is slow, writing will be slow. If the SQL server lags under the request load, the inserts will be slow. A flat file is probably the best way to go, but I would write your code against an abstraction, or use existing code (PEAR::Log), so you can change the provider and storage method at will.
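For illustration, a minimal sketch of that kind of pluggable writer; the HitLogger interface and class names here are hypothetical, not PEAR::Log's API:

interface HitLogger
{
    public function log(string $line): void;
}

class FileHitLogger implements HitLogger
{
    public function __construct(private string $path) {}

    public function log(string $line): void
    {
        // FILE_APPEND avoids managing the handle ourselves;
        // LOCK_EX guards against interleaved writes from concurrent requests.
        file_put_contents($this->path, $line . "\n", FILE_APPEND | LOCK_EX);
    }
}

class PdoHitLogger implements HitLogger
{
    public function __construct(private PDO $db) {}

    public function log(string $line): void
    {
        $stmt = $this->db->prepare('INSERT INTO hits (entry) VALUES (?)');
        $stmt->execute([$line]);
    }
}

// Call sites depend only on the interface:
function track_hit(HitLogger $logger, int $pageId): void
{
    $logger->log(time() . "\t" . $pageId);
}

Switching from flat files to a database then becomes a one-line change where the logger is constructed.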
Use a database; it is the only sane option, even if it takes a little longer. Once you start with log files, you are on a track that will cause you pain: moving servers, file permissions, it precludes load balancing, etc.
If you've already got the database connection open, then I reckon it would probably be quicker to insert a single row.
However, with anything performance-related, the only way to be sure is to write a simple test and measure it...
Update: I've done a quick test, and sure enough, if you have to open and close the file for each write, it's about the same speed as the database or slower over a test of 10,000 lines.
However, when you have multiple processes doing this, it slows down, as can be seen below. This is with 10 concurrent processes (all timings in seconds):
DB time: 2.1695
DB time: 2.3869
DB time: 2.4305
DB time: 2.5864
DB time: 2.7465
DB time: 3.0182
DB time: 3.1451
DB time: 3.3298
DB time: 3.4483
DB time: 3.7812
File open time: 0.1538
File open time: 0.5478
File open time: 0.7252
File open time: 3.0453
File open time: 4.2661
File open time: 4.4247
File open time: 4.5484
File open time: 4.6319
File open time: 4.6501
File open time: 4.6646
Open close file time: 11.3647
Open close file time: 12.2849
Open close file time: 18.4093
Open close file time: 18.4202
Open close file time: 21.2621
Open close file time: 22.7267
Open close file time: 23.4597
Open close file time: 25.6293
Open close file time: 26.1119
Open close file time: 29.1471
function debug($d)
{
    static $start_time = NULL;
    if ($start_time === NULL)
    {
        // First call: just start the timer.
        $start_time = microtime(true);
        return 0;
    }
    // Report the time elapsed since the previous call, then reset.
    $elapsed = microtime(true) - $start_time;
    printf("$d time: %.4f\n", $elapsed);
    $fp = @fopen('dbg.txt', 'a');
    fprintf($fp, "$d time: %.4f\n", $elapsed);
    fclose($fp);
    $start_time = microtime(true);
}
// 10,000 writes to a file handle that is opened once.
function tfile()
{
    $fp = @fopen('t1.txt', 'a');
    for ($i = 0; $i < 10000; $i++)
    {
        $txt = $i."How would you log, which do you think is quicker:How would you log, which do you think is quicker:";
        fwrite($fp, $txt);
    }
    fclose($fp);
}

// 10,000 writes, opening and closing the file for every line.
function tfile_openclose()
{
    for ($i = 0; $i < 10000; $i++)
    {
        $fp = @fopen('t1.txt', 'a');
        $txt = $i."How would you log, which do you think is quicker:How would you log, which do you think is quicker:";
        fwrite($fp, $txt);
        fclose($fp);
    }
}
// 10,000 INSERTs over a single database connection.
function tdb()
{
    $db = mysqli_connect('localhost', 'tremweb', 'zzxxcc', 'scratch');
    if (!$db)
        die('Error connecting to database.');
    for ($i = 0; $i < 10000; $i++)
    {
        $txt = $i."How would you log, which do you think is quicker:How would you log, which do you think is quicker:";
        mysqli_query($db, "INSERT INTO tlog VALUES ('".$txt."')");
    }
    mysqli_close($db);
}
debug("");
tfile();
debug("File open");
tfile_openclose();
debug("Open close file");
tdb();
debug("DB");
I read an article in the C++ Users Journal, years ago, about logging performance. Whether you use a database or files, the best thing to do is write unformatted data that can be "inflated" into meaningful data when (and, more likely, if) you need to view the logs. The vast majority of the cost of logging is in formatting the strings that are written to the destination, and most of the time that cost is wasted, because the logs are never read.
I can dig out the article reference if it's useful to you.
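As a rough sketch of that idea (the record layout and file names are assumptions, not the article's): write fixed-width binary records at request time, and pay the formatting cost only if the log is ever read.

// Log a hit as a compact 12-byte binary record: timestamp, page id, record id.
// pack() is cheap compared to building a formatted string.
function log_hit_raw(string $path, int $pageId, int $recordId): void
{
    $record = pack('NNN', time(), $pageId, $recordId);
    file_put_contents($path, $record, FILE_APPEND | LOCK_EX);
}

// "Inflate" the raw log into readable text, paying the formatting cost here.
function inflate_log(string $path): void
{
    $fp = fopen($path, 'rb');
    while (($record = fread($fp, 12)) !== false && strlen($record) === 12)
    {
        $f = unpack('Nts/Npage/Nrec', $record);
        printf("%s page=%d record=%d\n", date('c', $f['ts']), $f['page'], $f['rec']);
    }
    fclose($fp);
}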
You should try SQLite. It will give you both the speed of writing to a file and the power of a database.
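A minimal sketch of that approach using PHP's PDO SQLite driver; the table name and columns are made up for illustration. Batching the inserts into one transaction is what gives SQLite its file-like write speed here, since each transaction syncs to disk.

$db = new PDO('sqlite:' . __DIR__ . '/hits.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS hits (
    ts      INTEGER NOT NULL,
    page_id INTEGER NOT NULL
)');

// One transaction for the whole batch instead of one per insert.
$db->beginTransaction();
$stmt = $db->prepare('INSERT INTO hits (ts, page_id) VALUES (?, ?)');
for ($i = 0; $i < 10000; $i++)
{
    $stmt->execute([time(), $i]);
}
$db->commit();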
You could try both ways using log4php, which supports both file and database back-ends.
Regarding logging into a file, you could improve performance by buffering the write requests.
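One possible sketch of that buffering (the class name and flush threshold are illustrative, not a log4php feature): collect lines in memory and flush them in batches, so you pay for one open/lock/write per batch rather than per hit.

class BufferedLog
{
    private array $lines = [];

    public function __construct(
        private string $path,
        private int $flushAt = 100
    ) {}

    public function log(string $line): void
    {
        $this->lines[] = $line;
        if (count($this->lines) >= $this->flushAt)
        {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->lines === [])
        {
            return;
        }
        // One open/lock/write/close per batch instead of per line.
        file_put_contents($this->path, implode("\n", $this->lines) . "\n",
                          FILE_APPEND | LOCK_EX);
        $this->lines = [];
    }

    public function __destruct()
    {
        $this->flush(); // don't lose the tail of the buffer at request end
    }
}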
I would use a delayed insert (INSERT DELAYED) into MySQL. That way you don't have to wait for the insert to finish.
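A sketch of what that looks like with mysqli, reusing the tlog table from the benchmark above (credentials are placeholders). Note that INSERT DELAYED only works with certain storage engines (e.g. MyISAM), and MySQL 5.7 and later ignore the DELAYED keyword and run a normal insert.

$db = new mysqli('localhost', 'user', 'password', 'scratch');
// DELAYED queues the row server-side and returns immediately,
// so the page doesn't wait for the write to complete.
$entry = $db->real_escape_string('page hit at ' . date('c'));
$db->query("INSERT DELAYED INTO tlog VALUES ('$entry')");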