Extract email addresses from text file using regex with bash or command line

误落风尘 2020-12-15 08:32

How can I grep out only the email addresses, using a regex, from a file with multiple lines similar to this (an SQL dump, to be precise)?

Unfortunately I cannot just go b

4 answers
  • 2020-12-15 08:51

    If you still want to go the grep -o route, this one works for me:

    $ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
    cgreen@blah.com
    $ 
    

    I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.

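    Your regular expression is close, but you're missing two things:

    • Regular expressions are case-sensitive, so either pass -i to grep or add a-z to your square-bracket expressions.
    • In grep's default basic regular expression (BRE) syntax, the + quantifier and the {} interval braces must be escaped as \+ and \{...\} (\+ itself is a GNU extension). With grep -E (extended syntax) they are written unescaped.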
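For comparison, here is a minimal Python sketch of the same pattern using the standard re module, where + and {2,4} need no escaping and re.IGNORECASE plays the role of -i. The function name extract_emails and the sample SQL row are hypothetical; findall returns every match, much like grep -o prints each match on its own line.

```python
import re

# Same character classes as the grep command above. In Python's re syntax
# the + quantifier and the {2,4} interval are written without backslashes,
# and re.IGNORECASE replaces grep's -i flag.
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def extract_emails(text):
    """Return every non-overlapping match, similar to grep -i -o."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    # Hypothetical line from an SQL dump.
    sample = "(1,'Carl','Green','cgreen@blah.com',NULL);"
    print(extract_emails(sample))  # ['cgreen@blah.com']
```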
  • 2020-12-15 09:06

    If you know the field position, it is much easier to use awk or cut:

    awk -F ',' '{print $7}' file
    

    OR

    cut -d ',' -f7 file
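The field-position approach can be sketched in Python as follows, assuming (as in the commands above) that the email is the 7th comma-separated field. The helper name nth_field is hypothetical; like awk -F ',' '{print $7}', it splits on every comma and returns an empty string when a line has fewer fields.

```python
import sys

FIELD = 7  # 1-based field number, as in awk's $7 or cut -f7


def nth_field(line, n=FIELD, sep=","):
    """Return the n-th sep-separated field (1-based), or '' if there are fewer.

    Like awk -F ',' and cut -d ',', this splits on every separator and
    ignores any CSV quoting.
    """
    parts = line.rstrip("\n").split(sep)
    return parts[n - 1] if len(parts) >= n else ""


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for line in f:
            print(nth_field(line))
```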
    
  • 2020-12-15 09:08

    The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.

    It is difficult to parse CSV with a regex because of the possibility of escaped commas, quoted text, etc.
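A small sketch of that pitfall, using a hypothetical row whose second field is a quoted name containing a comma: splitting on every comma (what cut -d ',' and awk -F ',' do) shifts the email into the wrong position, while Python's standard csv module keeps the quoted field intact.

```python
import csv
import io

# Hypothetical row: the second field is quoted and contains a comma.
ROW = '1,"Green, Carl",cgreen@blah.com\n'


def naive_fields(line):
    """Split on every comma, as cut -d ',' and awk -F ',' do."""
    return line.rstrip("\n").split(",")


def csv_fields(line):
    """Parse one line with the standard csv module, which honours quoting."""
    return next(csv.reader(io.StringIO(line)))


if __name__ == "__main__":
    print(naive_fields(ROW))  # ['1', '"Green', ' Carl"', 'cgreen@blah.com']
    print(csv_fields(ROW))    # ['1', 'Green, Carl', 'cgreen@blah.com']
```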

    Consider that the following are valid email addresses according to Internet standards (RFC 5322 allows commas and quotes inside a quoted local part):

    • "foo,bar"@gmail.com
    • "foo\"bar"@gmail.com

    If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).
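As a quick check of that caveat, the sketch below runs the pattern from the grep answer (translated to Python's re syntax) over two addresses whose local parts are quoted strings. It finds nothing in either, because the character immediately before @ is a double quote, which the local-part character class does not include.

```python
import re

# Same pattern as the grep -i -o answer, in Python's re syntax.
SIMPLE_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)

# Addresses whose local part is a quoted string containing a comma or a quote.
QUOTED = ['"foo,bar"@gmail.com', r'"foo\"bar"@gmail.com']

if __name__ == "__main__":
    for addr in QUOTED:
        # findall returns [] here: the character before @ is a double quote,
        # so the local-part class cannot match right before the @.
        print(addr, SIMPLE_RE.findall(addr))
```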

  • 2020-12-15 09:08

    You can solve it using Python with the help of the built-in csv module and the third-party validators module, like this:

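    import csv
    import sys

    import validators  # third-party package: pip install validators

    # Read the CSV file named on the command line; newline='' lets csv handle line endings.
    with open(sys.argv[1], newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        for row in csvreader:
            for field in row:
                # validators.email returns True for a valid address and a falsy object otherwise.
                if validators.email(field):
                    print(field)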
    

    Run it like:

    python3 script.py infile
    

    That yields:

    cgreen@blah.com
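If installing validators is not an option, a standard-library-only variant is possible. This sketch is an assumption, not part of the answer above: it swaps validators.email for a full match of the looser pattern from the grep answer, so it will not accept and reject exactly the same strings as validators. The helper name emails_in_csv is hypothetical.

```python
import csv
import re
import sys

# Assumption: this loose pattern (from the grep answer) stands in for
# validators.email; it is NOT equivalent to that library's checks.
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def emails_in_csv(lines):
    """Yield every CSV field that, in full, matches EMAIL_RE.

    `lines` can be an open file or any iterable of strings.
    """
    for row in csv.reader(lines):
        for field in row:
            candidate = field.strip()
            if EMAIL_RE.fullmatch(candidate):
                yield candidate


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as f:
        for email in emails_in_csv(f):
            print(email)
```

Run it the same way as the script above: python3 script.py infile.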
    