Matching degree-based geographical coordinates with a regular expression

长情又很酷 2021-01-21 00:00

I'd like to be able to identify patterns of the form

28°44'30"N., 33°12'36"E.

Here's what I have so far:

use utf8;
qr{
           


        
4 answers
  • 2021-01-21 00:33

    This:

    use strict;
    use warnings;
    use utf8;
    my $re = qr{
        (?:
        \d{1,3} \s*  °   \s*
        \d{1,2} \s*  '   \s*
        \d{1,2} \s*  "   \s*
        [ENSW]  \s* \.?
                \s*  ,?  \s*
        ){2}
    }x;
    if (q{28°44'30"N., 33°12'36"E.} =~ $re) {
        print "match\n";
    } else {
        print "no match\n";
    }
    

    works:

    $ ./coord.pl 
    match
    
  • 2021-01-21 00:41

    The ?: at the beginning of the regex makes it non-capturing, which is probably why the matches cannot be extracted or seen. Dropping it from the regex may be the solution.
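
A minimal sketch of the capturing variant, assuming the goal is to pull out every field of both coordinates. The names `$coord`, `$input`, and `@fields` are made up for this illustration. Note that simply removing `?:` from a group quantified with `{2}` would keep only the last repetition, so the sketch writes the pair out explicitly instead:

```perl
use strict;
use warnings;
use utf8;   # the pattern contains a literal ° character

# One coordinate: degrees, minutes, seconds, hemisphere letter.
# Each component sits in its own capturing group.
my $coord = qr{
    (\d{1,3}) \s*  °  \s*
    (\d{1,2}) \s*  '  \s*
    (\d{1,2}) \s*  "  \s*
    ([ENSW])  \s* \.?
}x;

my $input = q{28°44'30"N., 33°12'36"E.};

# Two coordinates separated by a comma; in list context the match
# returns all eight captured fields in order.
my @fields = $input =~ m{ \A $coord \s* , \s* $coord \z }x;

print "@fields\n";   # 28 44 30 N 33 12 36 E
```

Interpolating the same `qr` object twice keeps the per-coordinate pattern in one place while still giving each repetition its own set of capture groups.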

    If all of the coordinates are fixed-format, unpack may be a better way of obtaining the desired values.

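    # q{} is used because the data itself contains double quotes.
    # Assumes "use utf8" is in effect, so that ° counts as a single character.
    my @twoCoordinates = unpack 'A2xA2xA2xAx3A2xA2xA2xA', q{28°44'30"N., 33°12'36"E.};
    
    print "@twoCoordinates";  # prints '28 44 30 N 33 12 36 E'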
    

    If not, then modify the regex:

    my @twoCoordinates = q{28°44'30"N., 33°12'36"E.} =~ /\w+/g;
    
  • 2021-01-21 00:49

    You forgot the x modifier on the qr operator.
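
To illustrate the point, here is a small sketch comparing the same spaced-out pattern with and without `/x`. It assumes the question's pattern was laid out with whitespace for readability, which cannot be confirmed from the truncated code; the variable names are hypothetical:

```perl
use strict;
use warnings;
use utf8;   # the pattern contains a literal ° character

my $input = q{28°44'30"N.};

# Without /x every space in the pattern must match a literal space.
my $without_x = qr{ \d+ ° \d+ ' \d+ " [ENSW] };

# With /x unescaped whitespace in the pattern is ignored.
my $with_x    = qr{ \d+ ° \d+ ' \d+ " [ENSW] }x;

printf "without /x: %s\n", $input =~ $without_x ? 'match' : 'no match';
printf "with /x:    %s\n", $input =~ $with_x    ? 'match' : 'no match';
```

The first pattern fails on this input because the string contains no spaces, while the second matches.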

  • 2021-01-21 00:51

    Try dropping the use utf8 statement.

    The degree symbol corresponds to character value 0xB0 in my current encoding (whatever that is, but it ain't UTF8). 0xB0 is a "continuation byte" in UTF8; it is expected to be the second, third, or fourth byte of a sequence that begins with something between 0xC2 and 0xF4. Using that string with utf8 will give you an error.
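
The byte-level claim can be checked with a short sketch using the core `Encode` module. It is only an illustration of how 0xB0 behaves as UTF-8 input, not a diagnosis of the asker's file, and the variable names are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# In UTF-8 the degree sign U+00B0 is the two-byte sequence 0xC2 0xB0.
my $utf8_bytes = "\xC2\xB0";
my $decoded    = decode('UTF-8', $utf8_bytes);
printf "UTF-8 bytes: %d, decoded characters: %d, code point: U+%04X\n",
    length($utf8_bytes), length($decoded), ord($decoded);

# A lone 0xB0 (the degree sign in Latin-1) is a continuation byte with
# no lead byte, so strict UTF-8 decoding rejects it.
my $lone = "\xB0";
my $copy = $lone;   # decode may modify its input when a CHECK value is given
my $ok   = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
print $ok ? "lone 0xB0 decoded\n" : "lone 0xB0 is not valid UTF-8\n";
```

If the source file is saved as UTF-8, the degree sign is already the two-byte form and `use utf8` is the appropriate pragma; the failure described above applies when the file is in a single-byte encoding such as Latin-1.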
