An easy way to diff log files, ignoring the time stamps?

终归单人心 2021-02-01 13:41

I need to diff two log files but ignore the timestamp part of each line (the first 12 characters, to be exact). Is there a good tool, or a clever awk command, that could help me?

5 answers
  • 2021-02-01 13:50

    For a graphical option, Meld can do this using its text filters feature.

    It lets you ignore text that matches one or more Python regular expressions. The differences are still displayed, but lines that have no other differences are not highlighted.

  • 2021-02-01 13:52

    Depending on the shell you are using, you can turn the approach @Blair suggested into a one-liner with process substitution:

    diff <(cut -b13- file1) <(cut -b13- file2)
    

    (+1 to @Blair for the original suggestion :-)
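
Since the question also asks about awk, here is a minimal sketch of the same one-liner with awk's substr() in place of cut. The function name difflog and its WIDTH argument are hypothetical, chosen only for illustration, and the sketch assumes a shell with process substitution (bash, zsh or ksh93).

```shell
#!/usr/bin/env bash
# difflog WIDTH FILE1 FILE2
# Hypothetical helper (not from the answers): diff two logs while skipping
# the first WIDTH characters of every line.
difflog() {
    local width=$1 file1=$2 file2=$3
    # substr($0, n + 1) keeps everything after the first n characters.
    diff <(awk -v n="$width" '{ print substr($0, n + 1) }' "$file1") \
         <(awk -v n="$width" '{ print substr($0, n + 1) }' "$file2")
}
```

For plain ASCII logs, difflog 12 file1 file2 should behave like the cut -b13- one-liner; making the width an argument is only a convenience for logs whose timestamps have a different length.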

  • 2021-02-01 14:03

    Use KDiff3 and, under Configure > Diff, set the "Line-Matching Preprocessor command" to something like:

    sed "s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9]//"

    This removes the timestamps from the input to the comparison's line-alignment algorithm.

    KDiff3 also lets you manually align specific lines.
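
To see what such a filter does before pasting it into the KDiff3 settings, you can run it on sample lines in a terminal. This is a minimal sketch: the function names strip_hms and strip_hms_ms are hypothetical, and the second pattern is an assumed variant for timestamps of the form HH:MM:SS.mmm, which the answer's pattern only partly removes.

```shell
#!/bin/sh
# Filter from the answer: deletes the first HH:MM:SS match on each line.
strip_hms() {
    sed "s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9]//"
}

# Assumed variant (not from the answer) that also consumes a trailing
# ".mmm" milliseconds field.
strip_hms_ms() {
    sed "s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9]\.[0-9][0-9][0-9]//"
}

printf '%s\n' '09:06:00 data 6'     | strip_hms      # prints " data 6"
printf '%s\n' '09:06:00.000 data 6' | strip_hms      # prints ".000 data 6"
printf '%s\n' '09:06:00.000 data 6' | strip_hms_ms   # prints " data 6"
```

If the milliseconds differ between the two logs, the leftover ".000" in the second case would still make otherwise identical lines look different to the aligner, which is why the pattern has to cover the whole timestamp.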

  • 2021-02-01 14:04

    @EbGreen said

    I would just take the log files and strip the timestamps off the start of each line then save the file out to different files. Then diff those files.

    That's probably the best bet, unless your diffing tool has special powers. For example, you could run:

    cut -b13- file1 > trimmed_file1
    cut -b13- file2 > trimmed_file2
    diff trimmed_file1 trimmed_file2
    

    See @toolkit's response for an optimization that turns this into a one-liner and removes the need for extra files, provided your shell supports it. Bash 3.2.39, at least, seems to.
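
For shells without process substitution, the three commands can be wrapped so that the trimmed copies do not clutter the working directory. This is a minimal sketch, not part of the original answer: the function name difflog_tmp is hypothetical, and it assumes mktemp -d is available (it is common but not required by POSIX).

```shell
#!/bin/sh
# difflog_tmp FILE1 FILE2
# Hypothetical helper: trim the first 12 bytes of each line into temporary
# files, diff those files, then clean up.
difflog_tmp() {
    tmpdir=$(mktemp -d) || return 2
    cut -b13- "$1" > "$tmpdir/a"
    cut -b13- "$2" > "$tmpdir/b"
    # Remember diff's exit status: 0 = identical, 1 = different, 2 = trouble.
    rc=0
    diff "$tmpdir/a" "$tmpdir/b" || rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

Called as difflog_tmp file1 file2, it prints the same output as the three commands and passes diff's exit status through, so it can still be used in an if or && chain.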

  • 2021-02-01 14:06

    Answers using cut are fine, but sometimes it is useful to keep the timestamps in the diff output. Since the OP's question is about ignoring the timestamps (not removing them), here is my tricky command line:

    diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    
    • sed isolates each timestamp on its own line (# before it and \n after it), inside a process substitution
    • diff -I '^#' ignores changes in which every line is a timestamp line (a line beginning with #)

    example

    Two log files having same content but different timestamps:

    $> for ((i=1;i<11;i++)) do echo "09:0${i::1}:00.000 data $i"; done > 1.log
    $> for ((i=1;i<11;i++)) do echo "11:00:0${i::1}.000 data $i"; done > 2.log
    

    A plain diff reports every line as different:

    $> diff 1.log 2.log
    1,10c1,10
    < 09:01:00.000 data 1
    < 09:02:00.000 data 2
    < 09:03:00.000 data 3
    < 09:04:00.000 data 4
    < 09:05:00.000 data 5
    < 09:06:00.000 data 6
    < 09:07:00.000 data 7
    < 09:08:00.000 data 8
    < 09:09:00.000 data 9
    < 09:01:00.000 data 10
    ---
    > 11:00:01.000 data 1
    > 11:00:02.000 data 2
    > 11:00:03.000 data 3
    > 11:00:04.000 data 4
    > 11:00:05.000 data 5
    > 11:00:06.000 data 6
    > 11:00:07.000 data 7
    > 11:00:08.000 data 8
    > 11:00:09.000 data 9
    > 11:00:01.000 data 10
    

    Our tricky diff -I '^#' does not display any difference (timestamps ignored):

    $> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    $>
    

    Change 2.log (replace data with foo on the 6th line) and check again:

    $> sed '6s/data/foo/' -i 2.log
    $> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    11,13c11,13
    < #09:06:00.000
    <  data 6
    < #09:07:00.000
    ---
    > #11:00:06.000
    >  foo 6
    > #11:00:07.000
    

    => timestamps are kept in the diff output!

    You can also get a side-by-side view with the -y (or --side-by-side) option:

    $> diff -y -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
    #09:01:00.000                   #11:00:01.000
     data 1                          data 1
    #09:02:00.000                   #11:00:02.000
     data 2                          data 2
    #09:03:00.000                   #11:00:03.000
     data 3                          data 3
    #09:04:00.000                   #11:00:04.000
     data 4                          data 4
    #09:05:00.000                   #11:00:05.000
     data 5                          data 5
    #09:06:00.000                 | #11:00:06.000
     data 6                       |  foo 6
    #09:07:00.000                 | #11:00:07.000
     data 7                          data 7
    #09:08:00.000                   #11:00:08.000
     data 8                          data 8
    #09:09:00.000                   #11:00:09.000
     data 9                          data 9
    #09:01:00.000                   #11:00:01.000
     data 10                         data 10
    

    old sed

    If your sed implementation does not support the -r option, you may have to spell out the twelve dots, as in <(sed 's/^\(............\)/#\1\n/' 1.log), or use another pattern of your choice ;)
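
Another way around sed differences is to do the split in awk, which relies on neither the -r flag nor \n in the replacement text. This is a minimal sketch, not the answer's own method: the function names split_ts and difflog_keep_ts are hypothetical, and it still assumes a shell with process substitution and a diff that supports -I.

```shell
#!/usr/bin/env bash
# split_ts FILE
# Hypothetical helper: print each line of FILE as two lines, the first 12
# characters prefixed with "#" and then the rest of the line.
split_ts() {
    awk '{ print "#" substr($0, 1, 12); print substr($0, 13) }' "$1"
}

# difflog_keep_ts FILE1 FILE2
# Diff the split logs, ignoring changes made up only of "#" (timestamp) lines.
difflog_keep_ts() {
    diff -I '^#' <(split_ts "$1") <(split_ts "$2")
}
```

One difference from the sed version: a line shorter than 12 characters is still split by awk (into a # line and an empty line), whereas the sed pattern leaves such a line untouched.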
