Logfile analysis in R?

生来不讨喜 · 2021-02-01 06:04

I know there are other tools around like awstats or splunk, but I wonder whether there is some serious (web)server logfile analysis going on in R. I might not be the first though.

5 Answers
  •  北恋 · 2021-02-01 06:21

    #!python

    import argparse
    import csv
    import io

    # Demo log lines used when no input file is supplied.
    DEMO_LINES = [
        '54.67.81.141 - - [01/Apr/2015:13:39:22 +0000] "GET / HTTP/1.1" 502 173 "-" "curl/7.41.0" "-"',
        '54.67.81.141 - - [01/Apr/2015:13:39:22 +0000] "GET / HTTP/1.1" 502 173 "-" "curl/7.41.0" "-"',
    ]


    class OurDialect:
        # Space-delimited, unquoted output in which every space inside a field
        # is escaped with a comma -- this is what turns each log line into a
        # comma-separated record.
        escapechar = ','
        delimiter = ' '
        quoting = csv.QUOTE_NONE


    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--source', type=str, dest='source', default=None,
                        help='path to an access log; built-in demo lines are used if omitted')
    arguments = parser.parse_args()

    if arguments.source:
        with open(arguments.source) as fin:
            raw_lines = fin.readlines()
    else:
        raw_lines = DEMO_LINES

    header = ['IP', 'Ident', 'User', 'Timestamp', 'Offset', 'HTTP Verb',
              'HTTP Endpoint', 'HTTP Version', 'HTTP Return code',
              'Size in bytes', 'User-Agent']

    # Strip the trailing newline plus all brackets and quotes, so that spaces
    # are the only separators left; each row becomes a single one-field record.
    lines = [[l.rstrip('\n').replace('[', '').replace(']', '').replace('"', '')]
             for l in raw_lines]

    out = io.StringIO()

    # Header row: ordinary comma-separated output.
    writer = csv.writer(out)
    writer.writerow(header)

    # Data rows: the dialect's escapechar inserts a comma before every space,
    # producing comma-separated values from the space-separated log line.
    writer = csv.writer(out, dialect=OurDialect)
    writer.writerows(lines)

    print(out.getvalue())
    

    Demo output:

    IP,Ident,User,Timestamp,Offset,HTTP Verb,HTTP Endpoint,HTTP Version,HTTP Return code,Size in bytes,User-Agent
    54.67.81.141, -, -, 01/Apr/2015:13:39:22, +0000, GET, /, HTTP/1.1, 502, 173, -, curl/7.41.0, -
    54.67.81.141, -, -, 01/Apr/2015:13:39:22, +0000, GET, /, HTTP/1.1, 502, 173, -, curl/7.41.0, -
    

    This format can easily be read into R using read.csv, and it doesn't require any third-party libraries.
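
    A rough sketch of the R side (the file name access.csv is illustrative, not from the answer): redirect the script's output to a file and load it with read.csv. Note that in the demo above the data rows contain two more fields than there are header names (the referer and the trailing "-" are unnamed), so you may need to extend the header list in the Python script to match your log format before this parses cleanly.

    # Assumes the Python script's output was saved as access.csv, e.g.
    #   python log2csv.py -f access.log > access.csv   (script name is hypothetical)
    logs <- read.csv("access.csv",
                     header = TRUE,
                     strip.white = TRUE,        # drop the leading space left by the escape trick
                     stringsAsFactors = FALSE)
    str(logs)                                   # inspect the parsed columns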
