I have the same problem as somebody described in another post. My application's log files are huge (~1 GB), and using grep to correlate information across them is tedious. Right now I use the ''less'' tool, but it is also slower than I would like.
I am thinking of speeding up the search. There are the following ways to do this: first, generate the logs in XML and use some XML search tool. I am not sure how much speedup an XML search would give (not much, I guess, since a non-indexed file search will still take ages).
Second, use an XML database. This would be better, but I don't have much background here.
Third, use a (non-XML) database. This would be somewhat tedious since a table schema has to be written (does that also apply to the second option above?). I also foresee the schema changing a lot at the start to cover common use cases. Ideally, I would like something lighter than a full-fledged database for storing the logs.
Fourth, use Lucene. It seems to fit the purpose, but is there a simple way to specify the indexes for this use case? For example, I want to say "index whenever you see the word 'iteration'".
What is your opinion?
The problem is that using XML will make your log files even bigger. I would suggest either splitting up your log files by date or number of lines, or using a file-based database engine such as SQLite.
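If SQLite appeals to you, here is a minimal sketch of the idea in Python; the file names, table layout, and timestamp regex are just assumptions, so adjust them to your actual log format:

    import re
    import sqlite3

    # Load each log line into SQLite and index the timestamp column so
    # time-window queries don't have to scan the whole file every time.
    # The regex assumes lines start with "YYYY-MM-DD HH:MM:SS".
    LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")

    conn = sqlite3.connect("logs.db")
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts TEXT, message TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_log_ts ON log (ts)")

    with open("application.log", encoding="utf-8", errors="replace") as f:
        rows = (m.groups() for m in map(LINE_RE.match, f) if m)
        conn.executemany("INSERT INTO log (ts, message) VALUES (?, ?)", rows)
    conn.commit()

    # Example query: everything mentioning 'iteration' in a time window.
    for ts, msg in conn.execute(
            "SELECT ts, message FROM log "
            "WHERE ts BETWEEN ? AND ? AND message LIKE ?",
            ("2009-04-01 00:00:00", "2009-04-01 23:59:59", "%iteration%")):
        print(ts, msg)

Loading is a one-off cost per log file; after that, the time-window filter uses the index instead of a full scan.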
A gigabyte isn't that big, really. What kind of "correlation" are you trying to do with these log files? I've often found it's simpler to write a custom program (or script) to handle a log file in a particular way than it is to try to come up with a database schema to handle everything you'll ever want to do with it. Of course, if your log files are hard to parse for whatever reason, it may well be worth trying to fix that aspect.
(I agree with kuoson, by the way - XML is almost certainly not the way to go.)
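To make the custom-script idea concrete, here is a rough sketch in Python; the "iteration N" marker and the file name are hypothetical stand-ins for whatever correlation key your logs actually contain:

    import re
    import sys
    from collections import defaultdict

    # Group log lines by a correlation key (here a hypothetical
    # "iteration N" marker) so related entries from a huge file end up
    # next to each other instead of scattered across a gigabyte of text.
    ITER_RE = re.compile(r"iteration\s+(\d+)")

    groups = defaultdict(list)
    path = sys.argv[1] if len(sys.argv) > 1 else "application.log"
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = ITER_RE.search(line)
            if m:
                groups[int(m.group(1))].append(line.rstrip())

    for iteration in sorted(groups):
        print(f"--- iteration {iteration} ({len(groups[iteration])} lines) ---")
        for line in groups[iteration]:
            print(line)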
If you can check your logs on Windows, or under Wine, LogParser is a great tool for mining data out of logs. It practically lets you run SQL queries on any log, with no need to change any code or log formats, and it can even be used to generate quick HTML or Excel reports.
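For example (from memory, so treat the exact syntax as an assumption), a Log Parser 2.2 query over a plain text log with the TEXTLINE input format looks roughly like this:

    LogParser.exe -i:TEXTLINE "SELECT Text FROM application.log WHERE Text LIKE '%iteration%'"

Switching the output format (e.g. -o:CSV) gives you the quick report formats mentioned above.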
Also, a few years ago, when XML was all the hype, I was using XML logs and XSLT stylesheets to produce views. It was actually kind of nice, but it used way too much memory and would choke on large files, so you probably DON'T want to use XML.
The trouble with working on raw log files is that each one has to be scanned in full for every query; you'll get a much sharper response if you create an index of the log files and search/query that instead. Lucene would be my next port of call, then Solr.
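As a sketch of the "index once, query many times" idea, here it is in Python using Whoosh (a pure-Python library with a Lucene-like API, standing in for Lucene/Solr here; the file names and schema are illustrative only):

    import os
    from whoosh.fields import Schema, TEXT, NUMERIC
    from whoosh.index import create_in
    from whoosh.qparser import QueryParser

    # Build a full-text index over the log, storing the line number so
    # hits can be traced back to the original file.
    schema = Schema(lineno=NUMERIC(stored=True), content=TEXT(stored=True))
    os.makedirs("logindex", exist_ok=True)
    ix = create_in("logindex", schema)

    writer = ix.writer()
    with open("application.log", encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            writer.add_document(lineno=lineno, content=line)
    writer.commit()

    # Queries now hit the index instead of re-reading the whole file.
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse("iteration")
        for hit in searcher.search(query, limit=20):
            print(hit["lineno"], hit["content"].rstrip())

Building the index is a one-off cost; after that, searches like the 'iteration' query above come back almost instantly.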
Maybe you could load your log into Emacs (provided you have sufficient memory) and use the various Emacs features such as incremental search and M-x occur.
Disclaimer: I haven't tried this on files > 100MB.
Source: https://stackoverflow.com/questions/755063/fast-search-in-logs