Question
I am trying to import the contents of a log file into a database using Spring Batch.
I am currently using a FlatFileItemReader, but unfortunately there are many log entries that it doesn't catch. The two main problems are:
Lines that contain multi-line JSON Strings:
2012-03-22 11:47:35,307 DEBUG main someMethod(SomeClass.java:56): Do Something(18,true,null,null,null): my.json = '{ "Foo":"FooValue", "Bar":"BarValue", ... etc }'
Lines that contain stack traces:
2012-03-22 11:47:50,596 ERROR main com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:859): No route to host
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:364)
    at sun.nio.ch.Net.connect(Net.java:356)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
    at com.meetup.memcached.SockIOPool$SockIO.getSocket(SockIOPool.java:1703)
    at com.meetup.memcached.SockIOPool$SockIO.<init>(SockIOPool.java:1674)
    at com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:850)
    at com.meetup.memcached.SockIOPool.populateBuckets(SockIOPool.java:737)
    at com.meetup.memcached.SockIOPool.initialize(SockIOPool.java:695)
Basically, I need the FlatFileItemReader to keep reading until it reaches the next timestamp, aggregating all the lines before it into one entry. Has anything like this been done before in Spring Batch?
Answer 1:
There's now an FAQ in the Spring Batch documentation addressing this use case.
Answer 2:
The solution was to write a custom reader that looks back over the last several lines for a specific pattern that marks the start of a valid entry. I did not find anything pre-made in Spring Batch, but I could reuse a lot of existing code. The solution is proprietary, so I can't post it here, sorry, but this is how it works:
- Keep a LinkedList of Lines. LinkedList is important, because we'll access it both as a List and as a Queue.
- In your read method, start a loop: read the next line and write it to your queue. Check your queue to see if you have two valid lines in there (you'll need list access here). If you do, return all lines before the second valid line (and remove them from the queue). If you don't find any valid line, return null.
Needless to say, this solution is noticeably slower than the built-in FlatFileItemReader, but it gets the correct data.
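Since the original code can't be posted, here is a minimal sketch of the buffering idea in plain Java. It is not the proprietary implementation: the class name MultiLineLogReader, the ENTRY_START pattern, and the choice to flush the last entry at end of input are all assumptions made for illustration. It also simplifies the queue check by testing only the newly read line, since every earlier line was tested when it arrived.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedList;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Spring Batch): groups physical lines into logical log entries. */
class MultiLineLogReader {

    // Assumed entry start: a timestamp such as "2012-03-22 11:47:35,307" at the beginning of a line.
    private static final Pattern ENTRY_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} ");

    private final BufferedReader source;
    // Lines read so far that belong to the entry still being assembled.
    private final LinkedList<String> pending = new LinkedList<>();
    private boolean exhausted = false;

    MultiLineLogReader(BufferedReader source) {
        this.source = source;
    }

    /** Returns the next complete log entry, or null once the input is exhausted. */
    String read() {
        while (!exhausted) {
            String line;
            try {
                line = source.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (line == null) {
                exhausted = true;
                break;
            }
            boolean startsNewEntry = ENTRY_START.matcher(line).find();
            if (startsNewEntry && !pending.isEmpty()) {
                // A second entry start was found: emit everything buffered before it
                // and keep the new start line for the next call.
                String entry = drain();
                pending.add(line);
                return entry;
            }
            // Either the first line of an entry or a continuation line.
            // Note: lines before the first timestamp are emitted as their own entry.
            pending.add(line);
        }
        // End of input: flush the last buffered entry, if any (an assumption of this sketch).
        return pending.isEmpty() ? null : drain();
    }

    private String drain() {
        String entry = String.join("\n", pending);
        pending.clear();
        return entry;
    }
}
```

In a Spring Batch job, something like this would presumably sit behind an ItemReader<String> whose read() delegates to the method above; returning null is how an ItemReader signals that the input is exhausted. Parsing each aggregated entry into fields would be a separate step.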
Source: https://stackoverflow.com/questions/9939851/spring-batch-how-to-process-multi-line-log-files