How to extract a text part by regexp in linux shell? Lets say, I have a file where in every line is an IP address, but on a different position. What is the simplest way to e
You can use some shell helper I made: https://github.com/philpraxis/ipextract
included them here for convenience:
#!/bin/sh
ipextract ()
{
egrep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}
ipextractnet ()
{
egrep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
}
ipextracttcp ()
{
egrep --only-matching -E '[[:digit:]]+/tcp'
}
ipextractudp ()
{
egrep --only-matching -E '[[:digit:]]+/udp'
}
ipextractsctp ()
{
egrep --only-matching -E '[[:digit:]]+/sctp'
}
ipextractfqdn ()
{
egrep --only-matching -E '[a-zA-Z0-9]+[a-zA-Z0-9\-\.]*\.[a-zA-Z]{2,}'
}
Load it / source it (when stored in ipextract file) from shell:
$ . ipextract
Use them:
$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$
For some example of real use:
ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp
I wrote a little script to see my log files better, it's nothing special, but might help a lot of the people who are learning perl. It does DNS lookups on the IP addresses after it extracts them.
For those who want a ready solution for getting IP addresses from apache log and listing occurences of how many times IP address has visited website, use this line:
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' error.log | sort | uniq -c | sort -nr > occurences.txt
Nice method to ban hackers. Next you can:
deny from
and a space at the beginning of each lineI'd suggest perl. (\d+.\d+.\d+.\d+) should probably do the trick.
EDIT: Just to make it more like a complete program, you could do something like the following (not tested):
#!/usr/bin/perl -w
use strict;
while (<>) {
if (/(\d+\.\d+\.\d+\.\d+)/) {
print "$1\n";
}
}
This handles one IP per line. If you have more than one IPs per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.
I wanted to get only IP addresses that began with "10", from any file in a directory:
grep -o -nr "[10]\{2\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" /var/www
All of the previous answers have one or more problems. The accepted answer allows ip numbers like 999.999.999.999. The currently second most upvoted answer requires prefixing with 0 such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama has it almost right, but that expression requires that the ipnumber is the only thing on the line, no leading or trailing space allowed, nor can it select ip's from the middle of a line.
I think the correct regex can be found on http://www.regextester.com/22
So if you want to extract all ip-adresses from a file use:
grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt
If you don't want duplicates use:
grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt | sort | uniq
Please comment if there still are problems in this regex. It easy to find many wrong regex for this problem, I hope this one has no real issues.