How to parse Apache logs using a regex in PHP

终归单人心 2021-02-09 17:00

I'm trying to split this string in PHP:

11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozill

4 answers
  • 2021-02-09 17:25

    This looks like Apache's combined log format. Try this regular expression:

    /^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m
    

    The matching groups are as follows:

    1. remote IP address
    2. request date
    3. request HTTP method
    4. User-Agent value

    Note that the log line has no domain field. The second quoted string is the Referer value.
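
    A minimal sketch of how that expression might be used with preg_match(). The sample line is hypothetical: it reuses the visible part of the question's line with a placeholder User-Agent, since the original is cut off. The variable names are illustrative only.

    ```php
    <?php
    // Hypothetical sample line; "ExampleAgent/1.0" is a placeholder User-Agent.
    $line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" '
          . '200 932 "http://domain.com/index.html" "ExampleAgent/1.0"';

    // The expression from this answer, unchanged.
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m';

    if (preg_match($pattern, $line, $m)) {
        // $m[0] is the whole match; groups 1 to 4 follow the list above.
        [, $ip, $date, $method, $userAgent] = $m;
        printf("%s | %s | %s | %s\n", $ip, $date, $method, $userAgent);
    } else {
        echo "no match\n";
    }
    ```

    Because the byte count is matched with \d+, a line that logs "-" in that position would not match this pattern.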

  • 2021-02-09 17:29

    You should check out a regular expression tutorial. But here is the answer:

    if (preg_match('/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/', $line, $m)) {
      $ip = $m[1];
      $date = $m[2];
      $method = $m[3];
      $referer = $m[4];
      $browser = $m[5];
    }
    

    Note that the log contains the HTTP referer, not the domain name.
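
    If the host part of the referer is what you are after, one possible approach is to pass the captured value to parse_url(). This is only a sketch; the helper name refererHost() is made up for illustration, and it assumes a missing referer is logged as "-".

    ```php
    <?php
    // Hypothetical helper: returns the host of a referer URL, or null if there is none.
    function refererHost(string $referer): ?string
    {
        // parse_url() gives null when the component is absent (e.g. for "-")
        // and false for seriously malformed input.
        $host = parse_url($referer, PHP_URL_HOST);
        return is_string($host) ? $host : null;
    }

    var_dump(refererHost('http://domain.com/index.html'));
    var_dump(refererHost('-'));
    ```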

  • 2021-02-09 17:34
    // Parses NCSA Combined Log Format lines:
    $pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';
    

    Usage:

    if (preg_match($pattern, $yourstuff, $matches)) {
        // Put each part of the match in a named variable.
        list($whole_match, $remote_host, $logname, $user, $date_time, $method, $request, $protocol, $status, $bytes, $referer, $user_agent) = $matches;
    }
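
    The captured values are all strings, and $date_time still includes the square brackets. A rough sketch of how they might be converted afterwards, using hard-coded hypothetical values in place of a real match and assuming that "-" in the bytes field means no body was sent:

    ```php
    <?php
    // Hypothetical values, shaped as the pattern above would capture them.
    $date_time = '[25/Jan/2000:14:00:01 +0100]';
    $status    = '200';
    $bytes     = '-';

    // Strip the brackets, then parse day/month name/year:time and the UTC offset.
    $when = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', trim($date_time, '[]'));
    if ($when === false) {
        throw new RuntimeException("Unparseable date: $date_time");
    }

    $statusCode = (int) $status;
    $byteCount  = ($bytes === '-') ? 0 : (int) $bytes;   // assumption: "-" means 0 bytes

    echo $when->format(DATE_ATOM), " $statusCode $byteCount\n";
    ```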
    
  • 2021-02-09 17:42

    This is Perl rather than PHP, but the same regex can be used. It has parsed everything I've seen, including the bizarre requests some clients send:

    my ($ip, $date, $method, $url, $protocol, $alt_url, $code, $bytes,
            $referrer, $ua) = (m/
        ^(\S+)\s                    # IP
        \S+\s+                      # remote logname
        (?:\S+\s+)+                 # remote user
        \[([^]]+)\]\s               # date
        "(\S*)\s?                   # method
        (?:((?:[^"]*(?:\\")?)*)\s   # URL
        ([^"]*)"\s|                 # protocol
        ((?:[^"]*(?:\\")?)*)"\s)    # or, possibly URL with no protocol
        (\S+)\s                     # status code
        (\S+)\s                     # bytes
        "((?:[^"]*(?:\\")?)*)"\s    # referrer
        "(.*)"$                     # user agent
    /x);
    die "Couldn't match $_" unless $ip;
    $alt_url ||= '';
    $url ||= $alt_url;
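
    For PHP, a loose sketch of the same idea follows. It is not a port of the Perl regex above: it assumes quotes inside quoted fields appear escaped as \" and that the remote user contains no spaces. The function name parseCombinedLine() and the array keys are made up for illustration.

    ```php
    <?php
    // Hypothetical helper: returns the parsed fields, or null if the line does not match.
    function parseCombinedLine(string $line): ?array
    {
        // A quoted field that may contain backslash-escaped characters such as \".
        $quoted  = '"((?:[^"\\\\]|\\\\.)*)"';
        $pattern = '~^(\S+) \S+ \S+ \[([^\]]+)\] ' . $quoted . ' (\S+) (\S+) '
                 . $quoted . ' ' . $quoted . '$~';

        if (!preg_match($pattern, $line, $m)) {
            return null;
        }

        // Split the request into method, URL and an optional trailing protocol.
        preg_match('~^(\S*)\s?(.*?)(?:\s(HTTP/\S+))?$~', $m[3], $req);

        return [
            'ip'         => $m[1],
            'date'       => $m[2],
            'method'     => $req[1] ?? '',
            'url'        => $req[2] ?? '',
            'protocol'   => $req[3] ?? '',   // empty when the client sent none
            'status'     => $m[4],
            'bytes'      => $m[5],
            'referer'    => $m[6],
            'user_agent' => $m[7],
        ];
    }

    // Hypothetical line with no protocol and an escaped quote in the User-Agent.
    $row = parseCombinedLine('11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /foo" 400 - "-" "Agent \"quoted\""');
    print_r($row);
    ```

    Capturing the whole request first and splitting it in a second step keeps the main pattern free of nested quantifiers.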
    