Extract emails from HTML using regex

无人共我 2021-01-22 20:25

I'm trying to extract any Jabber accounts (emails) using regex from this page.

I've tried using this regex:

\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+
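To see what this pattern actually picks up, here is a minimal Python 3 sketch. The sample text and its addresses are invented for illustration and are not taken from the page in question. It suggests one thing to watch for: because `[\w.-]+` also accepts dots and hyphens, a full stop that ends a sentence can be swallowed into the match.

```python
import re

# The pattern from the question, written as a raw string.
PATTERN = re.compile(r"\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+")

# Hypothetical sample text; these addresses are made up for the example.
sample = "Contact: alice@example.org, {bob, carol}@example.net. <b>dave@example.com</b>"

# The pattern has no capturing groups, so findall returns whole matches.
matches = PATTERN.findall(sample)
print(matches)
# ['alice@example.org', '{bob, carol}@example.net.', 'dave@example.com']

# One possible clean-up (an assumption, not part of the question):
# strip trailing dots and hyphens that belong to the surrounding prose.
cleaned = [m.rstrip(".-") for m in matches]
print(cleaned)
# ['alice@example.org', '{bob, carol}@example.net', 'dave@example.com']
```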


        
3 Answers
  • 2021-01-22 21:10

    Try this one:

    reg_emails=r'^((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))@((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))\.((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))$'
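One caveat when using this pattern for extraction: it starts with `^` and ends with `$`, so as written it only matches when the entire string is a single address. The sketch below, run against a made-up HTML fragment (the address is hypothetical), shows one way to reuse the same expression for searching inside markup by dropping the two anchors. It is an illustration under those assumptions, not necessarily how the answer intended the pattern to be used.

```python
import re

# The anchored pattern from the answer above, copied as-is.
reg_emails = r'^((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))@((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))\.((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))$'

# Hypothetical HTML fragment; the address is invented for illustration.
html = "<li>JID: alice@example.org</li>"

# Anchored: only succeeds when the whole string is a single address.
whole_string = re.search(reg_emails, html)

# Unanchored: drop the leading ^ and trailing $ to search inside the markup.
# finditer + group(0) is used because the pattern has many capturing groups,
# so findall would return tuples of groups rather than full addresses.
unanchored = re.compile(reg_emails[1:-1])
found = [m.group(0) for m in unanchored.finditer(html)]

print(whole_string)  # None
print(found)         # ['alice@example.org']
```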
    
  • 2021-01-22 21:13

    This might work:

    [^\s@<>]+@[^\s@<>]+\.[^\s@<>]+

    import re

    # Runs of characters other than whitespace, "@", "<" and ">" around an "@" and a dot.
    p = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+', re.MULTILINE | re.IGNORECASE)
    test_str = r'...'
    print(p.findall(test_str))
    

    See example.

  • 2021-01-22 21:21
    import re

    s = '''
    ...YOUR HTML page source code HERE..........

    '''

    # Case-insensitive: local part, "@", domain, then a 2-6 letter top-level domain.
    reobj = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
    print(reobj.findall(s))


    Result

    ['skypeman@jabbim.cz', 'sonics@creep.im', 'voxis_team@lsd-25.ru', 'voxis_team@lsd-25.ru', 'adhrann@jabbim.cz', 'jabberwocky@jabber.systemli.org']
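The result above lists `voxis_team@lsd-25.ru` twice, presumably because the address appears more than once in the page. If only distinct accounts are wanted, one option is to de-duplicate while keeping first-seen order. The following is a minimal Python 3 sketch under that assumption. The markup in `s` is a made-up stand-in for the real page source, reusing two of the addresses from the result.

```python
import re

# Same pattern as in the answer above.
reobj = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)

# Stand-in for the page source: invented markup around two addresses
# taken from the result above, one of them repeated.
s = """
<p>skypeman@jabbim.cz</p>
<p>voxis_team@lsd-25.ru</p>
<p>voxis_team@lsd-25.ru</p>
"""

found = reobj.findall(s)

# dict.fromkeys keeps only the first occurrence of each key and,
# since Python 3.7, preserves insertion order.
unique = list(dict.fromkeys(found))
print(unique)  # ['skypeman@jabbim.cz', 'voxis_team@lsd-25.ru']
```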
    