How can I grep out only the email address, using a regex, from a file with multiple lines similar to this (a SQL dump, to be precise)?
Unfortunately I cannot just go b
If you still want to go the grep -o route, this one works for me:
$ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
cgreen@blah.com
$
I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.
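For comparison, here is a minimal Python sketch of the same idea as the grep command above. Python's re module does not need backslashes before + or {2,4}, and re.IGNORECASE plays the role of -i. The name find_emails and the sample line are made up for illustration; the pattern is the one from the grep command and is only a rough approximation of what an email address can look like.

```python
import re

# Same pattern as the grep command, written in Python regex syntax
# (no backslashes before + and {2,4}); IGNORECASE stands in for grep -i.
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def find_emails(text):
    """Return every substring of text that matches EMAIL_RE, like grep -o."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the original dump.
    sample = "1,'Chris','Green','cgreen@blah.com','2009-01-01'"
    for address in find_emails(sample):
        print(address)
```

As with grep -o, this prints only the matching part of each line, not the whole line.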
Your regular expression is close, but you're missing 2 things:
- Pass -i to grep, or add extra a-z to your square bracket expressions.
- The + modifiers and {} curly braces appear to need to be escaped.
If you know the field position then it is much easier with awk or cut:
awk -F ',' '{print $7}' file
OR
cut -d ',' -f7 file
The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.
It is difficult to parse CSV with a regex, because of the possibility of escaped commas, quoted text, etc.
Consider, the following are valid email addresses, according to Internet standards:
If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).
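To illustrate the quoting problem mentioned above, here is a small Python sketch comparing a naive comma split with the standard csv module. The sample row and the function names are invented for the example; the point is only that a quoted field containing a comma shifts the field positions for the naive approach.

```python
import csv

# Hypothetical row: the second field is quoted and contains a comma.
SAMPLE = '1,"Green, Chris",cgreen@blah.com\n'


def naive_fields(line):
    """Split on every comma, ignoring CSV quoting rules."""
    return line.rstrip("\n").split(",")


def csv_fields(line):
    """Parse one line with csv.reader, which honours quoted fields."""
    return next(csv.reader([line]))


if __name__ == "__main__":
    print(naive_fields(SAMPLE))  # the quoted name is cut in two
    print(csv_fields(SAMPLE))    # the quoted name stays one field
```

With the naive split the email lands in the fourth position instead of the third, which is the kind of shift that breaks a fixed field number in awk or cut.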
You can solve it using Python with the help of the built-in csv module and the external validators module, like this:
import validators
import csv
import sys
with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        for field in row:
            if validators.email(field):
                print(field)
Run it like:
python3 script.py infile
That yields:
cgreen@blah.com