How do you extract IP addresses from files using a regex in a Linux shell?

被撕碎了的回忆 2020-11-28 02:43

How can I extract part of a text with a regexp in a Linux shell? Let's say I have a file in which every line contains an IP address, but at a different position. What is the simplest way to e

19 answers
  • 2020-11-28 03:27

    You can use some shell helpers I wrote: https://github.com/philpraxis/ipextract

    They are included here for convenience:

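    #!/bin/sh
    # Each helper reads text on stdin and prints every match on its own line.
    # grep -E replaces the obsolescent egrep (the extra -E was redundant).
    
    # Dotted-quad IPv4 addresses, each octet limited to 0-255.
    ipextract ()
    {
    grep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
    }
    
    # IPv4 networks in address/prefix form, e.g. 10.0.0.0/8.
    ipextractnet ()
    {
    grep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
    }
    
    # Port/protocol pairs such as 80/tcp, 53/udp, 2905/sctp.
    ipextracttcp ()
    {
    grep --only-matching -E '[[:digit:]]+/tcp'
    }
    
    ipextractudp ()
    {
    grep --only-matching -E '[[:digit:]]+/udp'
    }
    
    ipextractsctp ()
    {
    grep --only-matching -E '[[:digit:]]+/sctp'
    }
    
    # Host names. A backslash is literal inside a bracket expression,
    # so the hyphen goes last, unescaped, to be matched as itself.
    ipextractfqdn ()
    {
    grep --only-matching -E '[a-zA-Z0-9]+[a-zA-Z0-9.-]*\.[a-zA-Z]{2,}'
    }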
    

    Source it from the shell (assuming it is stored in a file named ipextract):

    $ . ipextract

    Use them:

    $ ipextract < /etc/hosts
    127.0.0.1
    255.255.255.255
    $
    

    Some examples of real use:

    ipextractfqdn < /var/log/snort/alert | sort -u
    dmesg | ipextractudp
    
  • 2020-11-28 03:31

    I wrote a little script to make my log files easier to read. It's nothing special, but it might help people who are learning Perl. It does DNS lookups on the IP addresses after it extracts them.

  • 2020-11-28 03:31

    For those who want a ready-made solution for pulling IP addresses out of an Apache log and counting how many times each address has visited the website, use this line:

    grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' error.log | sort | uniq -c | sort -nr > occurences.txt
    

    This is a handy way to find addresses to ban. Next you can:

    1. Delete lines with fewer than 20 visits
    2. Use a regexp to cut everything up to the single space, so that only IP addresses remain
    3. Use a regexp to cut the last 1-3 digits of each IP address, so that only network addresses remain
    4. Add "deny from" and a space at the beginning of each line
    5. Save the resulting file as .htaccess
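
    The steps above can be sketched as a small filter. This assumes the input is the count-then-address output of the sort | uniq -c pipeline, and that the server accepts the Apache 2.2-style "deny from" directive with a partial address. The name make_deny_list and its threshold argument are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# make_deny_list is a hypothetical helper, not part of the original answer.
# Input:  "count ip" lines on stdin, as produced by `sort | uniq -c`.
# Output: one "deny from <network>" line per network, duplicates removed.
make_deny_list() {
    min_visits=${1:-20}   # assumed default threshold, taken from step 1
    awk -v min="$min_visits" '
        $1 >= min {                    # step 1: drop lines below the threshold
            ip = $2                    # step 2: keep only the address
            sub(/\.[0-9]+$/, "", ip)   # step 3: drop the last octet
            print "deny from " ip      # step 4: prefix the directive
        }' | sort -u
}
```

    Writing to a scratch file first, for example make_deny_list 20 < occurences.txt > deny.txt, makes it possible to review the list before copying it into .htaccess.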
  • 2020-11-28 03:32

    I'd suggest Perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

    EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

    #!/usr/bin/perl -w
    use strict;
    
    while (<>) {
        if (/(\d+\.\d+\.\d+\.\d+)/) {
            print "$1\n";
        }
    }
    

    This handles one IP per line. If you have more than one IP per line, you need to use the /g option. man perlretut gives a more detailed tutorial on regular expressions.

  • 2020-11-28 03:37

    I wanted to get only IP addresses that began with "10", from any file in a directory:

    grep -o -nr "10\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" /var/www
    
  • 2020-11-28 03:38

    All of the previous answers have one or more problems. The accepted answer allows IP addresses like 999.999.999.999. The currently second most upvoted answer requires zero-padding, such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama has it almost right, but that expression requires the IP address to be the only thing on the line: no leading or trailing space is allowed, and it cannot select IPs from the middle of a line.

    I think the correct regex can be found on http://www.regextester.com/22

    So if you want to extract all IP addresses from a file, use:

    grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt
    

    If you don't want duplicates use:

    grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt | sort | uniq
    

    Please comment if there are still problems with this regex. It is easy to find many wrong regexes for this problem; I hope this one has no real issues.
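
    One remaining issue is that grep -o has no notion of token boundaries, so a string such as 999.1.1.1 still yields the partial match 99.1.1.1. A minimal sketch of one way around this, assuming awk is available: grab maximal runs of digits and dots first, then validate each candidate field by field. The name extract_valid_ipv4 is hypothetical, and rejecting zero-padded octets is a choice made here to mirror the regex above.

```shell
#!/bin/sh
# extract_valid_ipv4 is a hypothetical helper name.
# Reads stdin (or the files given as arguments) and prints only tokens
# that are complete dotted quads with every octet in 0-255.
extract_valid_ipv4() {
    # Step 1: pull out whole digit-and-dot tokens; -h hides file names.
    grep -Eoh '[0-9]+(\.[0-9]+)+' "$@" |
    # Step 2: keep tokens with exactly four fields, all of them valid octets.
    awk -F. 'NF == 4 {
        for (i = 1; i <= 4; i++)
            if (length($i) > 3 || $i > 255 || $i ~ /^0[0-9]/) next
        print
    }'
}
```

    Because the whole token is checked, 999.1.1.1 and 1234.5.6.7 are dropped entirely instead of being trimmed to a shorter address, while 10.0.0.1:80 still yields 10.0.0.1.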
