Spring Batch: How to process multi-line log files

Submitted by 徘徊边缘 on 2019-12-06 06:58:11

Question


I am trying to import the contents of a log file into a database using Spring Batch.

I am currently using a FlatFileItemReader, but unfortunately there are many log entries that it doesn't catch. The two main problems are:

  1. Lines that contain multi-line JSON Strings:

    2012-03-22 11:47:35,307  DEBUG main someMethod(SomeClass.java:56): Do Something(18,true,null,null,null): my.json = '{
        "Foo":"FooValue",
        "Bar":"BarValue",
        ... etc
    }'
    
  2. Lines that contain stack traces:

    2012-03-22 11:47:50,596  ERROR main com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:859): No route to host
    java.net.NoRouteToHostException: No route to host
            at sun.nio.ch.Net.connect0(Native Method)
            at sun.nio.ch.Net.connect(Net.java:364)
            at sun.nio.ch.Net.connect(Net.java:356)
            at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
            at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
            at com.meetup.memcached.SockIOPool$SockIO.getSocket(SockIOPool.java:1703)
            at com.meetup.memcached.SockIOPool$SockIO.<init>(SockIOPool.java:1674)
            at com.meetup.memcached.SockIOPool.createSocket(SockIOPool.java:850)
            at com.meetup.memcached.SockIOPool.populateBuckets(SockIOPool.java:737)
            at com.meetup.memcached.SockIOPool.initialize(SockIOPool.java:695)
    

Basically, I need the FlatFileItemReader to keep reading until it reaches the next timestamp, aggregating all the lines before it. Has anything like this been done before in Spring Batch?
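Whatever reader ends up being used, it needs a test for "this line starts a new log entry". A minimal sketch, assuming every entry begins with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp like the ones in the question; the class name `LogRecordStart` and the exact pattern are illustrative and not part of Spring Batch:

```java
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a log line begins a new entry.
// Assumes entries start with "yyyy-MM-dd HH:mm:ss,SSS" followed by whitespace;
// adjust the pattern to match your own log layout.
final class LogRecordStart {
    private static final Pattern RECORD_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\s");

    private LogRecordStart() {}

    static boolean isRecordStart(String line) {
        // The ^ anchor means find() only succeeds on a timestamp at the start of the line.
        return RECORD_START.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(isRecordStart("2012-03-22 11:47:35,307  DEBUG main ..."));            // true
        System.out.println(isRecordStart("    \"Foo\":\"FooValue\","));                          // false
        System.out.println(isRecordStart("        at sun.nio.ch.Net.connect0(Native Method)")); // false
    }
}
```

Lines from the JSON body and the `at ...` frames of a stack trace fail this check, so a reader can append them to the preceding entry.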


Answer 1:


There's now an FAQ in the Spring Batch documentation addressing this use case.
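The FAQ itself isn't reproduced here. One common way to handle this case is a peek-ahead reader: keep one line of lookahead and append lines to the current entry until the lookahead starts with a new timestamp. The sketch below uses plain `java.io` so it stays self-contained; `PeekingLogReader` is a hypothetical class, and the timestamp pattern is an assumption based on the question's log. Inside Spring Batch, the same idea could be built by wrapping a delegate reader in `SingleItemPeekableItemReader` and using its `peek()` method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical peek-ahead reader: holds one line of lookahead so it can tell
// where the current entry ends without consuming the next entry's first line.
class PeekingLogReader {
    // Assumed entry start: "yyyy-MM-dd HH:mm:ss,SSS" followed by whitespace.
    private static final Pattern RECORD_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\s");

    private final BufferedReader in;
    private String lookahead; // next unconsumed line, or null at end of input

    PeekingLogReader(BufferedReader in) throws IOException {
        this.in = in;
        this.lookahead = in.readLine();
    }

    /** Returns the next multi-line entry, or null when the input is exhausted. */
    String read() throws IOException {
        if (lookahead == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(lookahead);
        lookahead = in.readLine();
        // Append continuation lines until the lookahead starts a new entry.
        while (lookahead != null && !RECORD_START.matcher(lookahead).find()) {
            record.append('\n').append(lookahead);
            lookahead = in.readLine();
        }
        return record.toString();
    }

    public static void main(String[] args) throws IOException {
        String log = String.join("\n",
                "2012-03-22 11:47:35,307  DEBUG main x: my.json = '{",
                "    \"Foo\":\"FooValue\"",
                "}'",
                "2012-03-22 11:47:50,596  ERROR main y: No route to host",
                "java.net.NoRouteToHostException: No route to host",
                "        at sun.nio.ch.Net.connect0(Native Method)");
        PeekingLogReader reader = new PeekingLogReader(new BufferedReader(new StringReader(log)));
        String record;
        int n = 0;
        while ((record = reader.read()) != null) {
            System.out.println("record " + (++n) + " has " + record.split("\n").length + " lines");
        }
    }
}
```

Running `main` prints `record 1 has 3 lines` and `record 2 has 3 lines`. If the file begins with lines that have no timestamp, they become a record of their own, which may or may not be what you want.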




Answer 2:


The solution was to write a custom reader that backtracks the last several lines and looks for a specific pattern that marks valid line starts. I did not find anything pre-made in Spring Batch, but I could reuse a lot of existing code. The solution is proprietary, so I can't post it here, sorry, but this is how it works:

  1. Keep a LinkedList of lines. LinkedList is important because we'll access it both as a List and as a Queue.
  2. In your read method, start a loop: read the next line and append it to your queue. Check the queue to see whether it contains two valid lines (you'll need list access here). If it does, return all lines before the second valid line (and remove them from the queue). If you don't find any valid line, return null.
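A minimal sketch of the two steps above, assuming the same timestamp pattern as the question's log and plain `java.io` input instead of a Spring Batch `ItemReader`. `QueueingLogReader` is a hypothetical name and this is not the answerer's proprietary code. Flushing whatever is still buffered at end of input is also an assumption, since the steps don't say what happens to the last entry:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedList;
import java.util.regex.Pattern;

// Hypothetical reader following the described steps: the LinkedList is
// scanned as a List and drained from the front as a Queue.
class QueueingLogReader {
    // Assumed entry start: "yyyy-MM-dd HH:mm:ss,SSS" followed by whitespace.
    private static final Pattern RECORD_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\s");

    private final BufferedReader in;
    private final LinkedList<String> lines = new LinkedList<>();

    QueueingLogReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next entry, or null once both the input and the buffer are empty. */
    String read() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            lines.addLast(line);
            int second = indexOfSecondRecordStart();
            if (second >= 0) {
                return drain(second); // all lines before the second valid line
            }
        }
        // Assumption: at end of input, the remaining buffered lines form the last entry.
        return lines.isEmpty() ? null : drain(lines.size());
    }

    // List access: index of the second line that starts an entry, or -1 if none.
    private int indexOfSecondRecordStart() {
        int seen = 0;
        int i = 0;
        for (String l : lines) {
            if (RECORD_START.matcher(l).find() && ++seen == 2) {
                return i;
            }
            i++;
        }
        return -1;
    }

    // Queue access: remove the first n lines and join them into one entry.
    private String drain(int n) {
        StringBuilder record = new StringBuilder();
        for (int k = 0; k < n; k++) {
            if (k > 0) {
                record.append('\n');
            }
            record.append(lines.removeFirst());
        }
        return record.toString();
    }

    public static void main(String[] args) throws IOException {
        String log = "2012-03-22 11:47:50,596  ERROR main a: No route to host\n"
                + "java.net.NoRouteToHostException: No route to host\n"
                + "        at sun.nio.ch.Net.connect0(Native Method)\n"
                + "2012-03-22 11:47:51,000  DEBUG main b: done";
        QueueingLogReader reader = new QueueingLogReader(new BufferedReader(new StringReader(log)));
        String record;
        while ((record = reader.read()) != null) {
            System.out.println("---\n" + record);
        }
    }
}
```

Rescanning the whole buffer after every line keeps the sketch close to the described steps but does extra work; since only the newly added line can be the second valid line, a production version could check just that line instead.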

Needless to say, this solution is noticeably slower than the built-in FlatFileItemReader, but it gets the correct data.



Source: https://stackoverflow.com/questions/9939851/spring-batch-how-to-process-multi-line-log-files
