PHP regex: extracting phone-number-like strings from HTML documents

渐次进展 2021-01-16 09:33

I'm trying to extract a specific piece of information from different HTML pages. Basically, the information is a 10-digit number that may appear in different forms, such as:

000         


        
5 answers
  • 2021-01-16 09:55
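    <?php
    // Hyphen moved to the end of the class: PCRE2 (PHP 7.3+) rejects "\d-\(" as an invalid range.
    // The final [0-9] (was [1-9]) lets numbers ending in 0 match in full.
    preg_match_all('/\+?[0-9][\d\s()+-]{5,12}[0-9]/', $string, $matches);
    print_r($matches);
    ?>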
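
As a rough illustration of how permissive a pattern of this kind is, the sketch below runs a variant of it over a made-up sample string. In this variant the hyphen is placed last in the character class and the final digit is `[0-9]`. The sample text and variable names are assumptions for the demo, not part of the question.

```php
<?php
// Hypothetical sample text: one date and one phone number.
$string = 'Posted 2021-01-16, call 520-555-5542';

// Loose pattern: optional "+", a digit, 5 to 12 digits/separators, a final digit.
$pattern = '/\+?[0-9][\d\s()+-]{5,12}[0-9]/';

preg_match_all($pattern, $string, $matches);

// $matches[0] holds the full matches; the date is picked up alongside the number.
print_r($matches[0]);
```

Anything that looks like digits joined by separators will match, so a pattern this loose is best treated as a candidate finder rather than a validator.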
    
  • 2021-01-16 10:02

    Consider other delimiters besides hyphens, not to mention parentheses.

    (?:1\s*?[-.]?\s*)?(?:\(\s*\d{3}\s*\)|\d{3})\s*?[-.]?\s*\d{3}\s*?[-.]?\s*\d{4}\b
    

    Okay, maybe that's more comprehensive than you need, but really this can get as complicated as you like. You can expand it to look for international phone numbers, extensions, and so forth, but that might not be worth it for you.
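
A minimal sketch of how a pattern like this might be applied to a page in PHP, assuming the markup is stripped first with `strip_tags()`. The sample HTML and variable names are made up for illustration, and the pattern is written with `\d{3}` for the area code.

```php
<?php
// Hypothetical sample input; real pages will differ.
$html = '<p>Call <b>(520) 555-5542</b> or 1-520.555.0199, fax: 520 555 1234.</p>';

// Drop the markup first so the pattern only scans visible text.
$text = strip_tags($html);

// Optional leading 1, area code with or without parentheses,
// then 3 and 4 digits separated by optional spaces, dots, or hyphens.
$pattern = '/(?:1\s*?[-.]?\s*)?(?:\(\s*\d{3}\s*\)|\d{3})\s*?[-.]?\s*\d{3}\s*?[-.]?\s*\d{4}\b/';

preg_match_all($pattern, $text, $matches);

// $matches[0] is the list of full matches in document order.
print_r($matches[0]);
```

With this sample, the three numbers come back in the forms they were written in; normalizing them to one format would be a separate step.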

  • 2021-01-16 10:04

    \b[0-9]{3}\s*[-]?\s*[0-9]{3}\s*[-]?\s*[0-9]{4}\b

    Edit

    Added word boundaries.
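
To see what the word boundaries change, here is a small sketch that compares the pattern with and without `\b` on a made-up string containing a long digit run. The sample text and variable names are assumptions for the demo.

```php
<?php
// Hypothetical text: a 14-digit order number followed by a real phone number.
$text = 'Order 98765432109876 shipped; call 520-555-5542.';

$withoutBoundaries = '/[0-9]{3}\s*[-]?\s*[0-9]{3}\s*[-]?\s*[0-9]{4}/';
$withBoundaries    = '/\b[0-9]{3}\s*[-]?\s*[0-9]{3}\s*[-]?\s*[0-9]{4}\b/';

preg_match_all($withoutBoundaries, $text, $loose);
preg_match_all($withBoundaries, $text, $strict);

// Without \b the first ten digits of the order number are reported as a match.
print_r($loose[0]);

// With \b the match must start and end at a word boundary,
// so the order number is skipped.
print_r($strict[0]);
```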

  • 2021-01-16 10:08

    This will match on all three examples you listed.

    (\d{3}\s*-?\s*\d{3}\s*-?\s*\d{4})
    
  • 2021-01-16 10:19

    Here's a good starting point:

    <?php 
    
    // all on one line... 
    $regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';
    
    // or broken up 
    $regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})' 
            .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})' 
            .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/'; 
    
    ?> 
    

    Note the non-capturing subpatterns (which look like (?:stuff)). That makes formatting easy:

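    <?php 
    
    // Note: this always emits " ext. ", even when no extension was captured.
    $formatted = preg_replace($regex, '($1) $2-$3 ext. $4', $phoneNumber); 
    
    // or, provided you use the $matches argument in preg_match 
    
    $formatted = "($matches[1]) $matches[2]-$matches[3]"; 
    // An unmatched trailing group is absent from $matches, so check before reading it.
    if (!empty($matches[4])) $formatted .= " ext. $matches[4]"; 
    
    ?>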
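
Putting the pattern and the formatting step together, a self-contained sketch might look like the following. The helper name `formatPhone` and its null-on-failure convention are assumptions made for this example, not something the answer defines.

```php
<?php
// The validation pattern from above, unchanged.
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
        .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
        .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

// Hypothetical helper: returns the normalized number, or null if $raw does not match.
function formatPhone(string $raw, string $regex): ?string
{
    if (!preg_match($regex, $raw, $m)) {
        return null;
    }
    $out = "($m[1]) $m[2]-$m[3]";
    // Group 4 is only present in $m when an extension was matched.
    if (isset($m[4]) && $m[4] !== '') {
        $out .= " ext. $m[4]";
    }
    return $out;
}

var_dump(formatPhone('520.555.5542 EXT 123', $regex));
var_dump(formatPhone('1(555)555-5555', $regex));
var_dump(formatPhone('520) 555-5542', $regex));
```

Because the pattern is anchored with `^` and `$`, the helper expects a string that contains nothing but the phone number.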
    

    And some example results for you:

    520-555-5542 :: MATCH 
    520.555.5542 :: MATCH 
    5205555542 :: MATCH 
    520 555 5542 :: MATCH 
    520) 555-5542 :: FAIL 
    (520 555-5542 :: FAIL 
    (520)555-5542 :: MATCH 
    (520) 555-5542 :: MATCH 
    (520) 555 5542 :: MATCH 
    520-555.5542 :: MATCH 
    520 555-0555 :: MATCH 
    (520)5555542 :: MATCH 
    520.555-4523 :: MATCH 
    19991114444 :: FAIL 
    19995554444 :: MATCH 
    514 555 1231 :: MATCH 
    1 555 555 5555 :: MATCH 
    1.555.555.5555 :: MATCH 
    1-555-555-5555 :: MATCH 
    520-555-5542 ext.123 :: MATCH 
    520.555.5542 EXT 123 :: MATCH 
    5205555542 Ext. 7712 :: MATCH 
    520 555 5542 ext 5 :: MATCH 
    520) 555-5542 :: FAIL 
    (520 555-5542 :: FAIL 
    (520)555-5542 ext .4 :: FAIL 
    (512) 555-1234 ext. 123 :: MATCH 
    1(555)555-5555 :: MATCH
    

    You'll probably get a lot of false positives if you allow spaces and dashes like you're suggesting.
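
Since the pattern above is anchored with `^` and `$`, it validates a single string rather than finding numbers inside page text. One possible adaptation for extraction, offered only as a sketch, is to swap the anchors for digit lookarounds so a match cannot start or end inside a longer run of digits. The extension group is left out for brevity, and the sample text and variable names are made up.

```php
<?php
// Same structure as the validation pattern, but with (?<!\d) and (?!\d)
// in place of ^ and $, and without the optional extension group.
$extract = '/(?<!\d)(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
          .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
          .'[. -]?(\d{4})(?!\d)/';

// Hypothetical text: two plausible numbers, one with an invalid area code,
// and one 11-digit run that only looks like a phone number.
$text = 'Call (520) 555-5542 or 1-520-555-0199. Ref 123 456 7890, order 52055555421.';

preg_match_all($extract, $text, $m);

// $m[0] holds the full matches; $m[1], $m[2], $m[3] hold the three digit groups.
print_r($m[0]);
```

The `[2-9]` restriction on the first digit of each three-digit group and the lookarounds filter out the reference and order numbers here, which is the kind of false positive a looser pattern would report.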
