Question
I retrieve data from the net containing real geodesic expressions, by which I mean degrees, minutes and seconds marked with the Unicode symbols U+00B0, U+2032 and U+2033, named Degree Sign, Prime and Double Prime. Example:
my $Lat = "48° 25′ 43″ N";
My objective is to convert such an expression first to degrees and then to radians, for use in a Perl module I am writing that implements the Vincenty inverse formula to calculate ellipsoidal great-circle distances. All my code objectives have been met with pseudo geodesics such as "48:25:43 N", but that is hand-entered test data, not real-world data. I am struggling to craft a regular expression that splits the real data the way I now split the pseudo data:
my ($deg, $min, $sec, $dir) = split(/[\s:]+/, $_[0], 4); # this works
I have tried many regular expressions, including
/[°′″\s]+/ and
/[\x{0B00}\x{2032}\x{2033}\s]/+
all with dismal results, such as $deg = "48?", $min = "?", $sec = "25′43″ N" and $dir = undef. I have enclosed the code in braces {} and included use utf8; and use feature 'unicode_strings'; within that scope, all to no avail.
Input data example:
my $Lat = "48° 25′ 43″ N";
Expected output:
$deg = 48, $min = 25, $sec = 43 and $dir = "N"
Answer 1:
You may try this regex to split the string:
[^\dNSEW.]+
Sample source:
my $str = '48° 25′ 43″ N';
my $regex = qr/[^\dNSEW.]+/p;
my ($deg, $min, $sec, $dir) = split $regex, $str;
Answer 2:
My bad! Pilot error!
The original regex I posted, and was struggling with, was:
/[\x{0B00}\x{2032}\x{2033}\s]/+
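There are two errors: the '+' character sits outside the closing delimiter instead of directly after the character class, and the hex value of the degree character is wrong (it is U+00B0, not U+0B00). That regex should have been written: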
/[\x{B0}\x{2032}\x{2033}\s]+/
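One way to double-check hex values like these before putting them in a character class is to ask Perl for each code point's Unicode name. The following is only a minimal sketch using the core charnames module; the list of code points is simply the three from the question plus the mistyped one, and the fallback text is made up for illustration:

```perl
use strict;
use warnings;
use charnames ();    # core module; provides charnames::viacode

# Look up the Unicode name of each code point used in the character class.
# 0x0B00 is the mistyped value from the original regex, included for comparison.
for my $cp (0x00B0, 0x2032, 0x2033, 0x0B00) {
    my $name = charnames::viacode($cp) // '(no name found)';
    printf "U+%04X  %s\n", $cp, $name;
}
```

The degree sign is reported at U+00B0. U+0B00 lies in the Oriya block, so it can never match the symbol in the input.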
The answer from @Rizwan was illuminating but I was determined to make regular expressions in Perl work with Unicode, so I persevered, and now this is my solution:
use strict;
use warnings;
use utf8; # the string literal below is UTF-8 source text
my $dms = "48° 25′ 43.314560″ N";
my $regex = qr/[\x{B0}\x{2032}\x{2033}:\s]+/; # some geodesics do use ':'
my ($deg, $min, $sec, $dir) = split $regex, $dms;
printf("\$deg: %s, \$min: %s, \$sec: %s, \$dir: %s\n",
    $deg, $min, $sec, $dir);
Like it or not, Unicode is the future.
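For the stated goal of getting from such a string to degrees and then radians, the split can be wrapped in a small helper. This is a rough sketch, not the module's actual code: the names dms_to_degrees and degrees_to_radians are hypothetical, the South/West-negative sign convention is an assumption, and the byte string merely simulates data fetched from the net. Note that use utf8 only affects literals in the source file, so fetched bytes are assumed to need decoding before the Unicode escapes in the regex can match:

```perl
use strict;
use warnings;
use utf8;                 # this source file is UTF-8 (see the comment below)
use Encode qw(decode);    # core module, turns fetched bytes into characters

# Hypothetical helper: "48° 25′ 43.3″ N" or "48:25:43.3 N" -> signed decimal degrees.
sub dms_to_degrees {
    my ($dms) = @_;
    my ($deg, $min, $sec, $dir) = split /[\x{B0}\x{2032}\x{2033}:\s]+/, $dms;
    my $value = $deg + $min / 60 + $sec / 3600;
    # Assumed convention: South and West are negative.
    return $dir =~ /^[SW]$/i ? -$value : $value;
}

# Hypothetical helper: degrees -> radians.
sub degrees_to_radians {
    my ($degrees) = @_;
    my $pi = 4 * atan2(1, 1);
    return $degrees * $pi / 180;
}

# Simulated network input: the UTF-8 bytes of the example latitude.
my $bytes = "48\xC2\xB0 25\xE2\x80\xB2 43\xE2\x80\xB3 N";
my $lat   = decode('UTF-8', $bytes);    # bytes -> characters before splitting

my $degrees = dms_to_degrees($lat);
printf "%.6f degrees = %.6f radians\n", $degrees, degrees_to_radians($degrees);
```

If the fetching code already returns decoded characters, the decode step should be dropped, since decoding twice would corrupt the symbols.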
Source: https://stackoverflow.com/questions/48534863/splitting-a-string-containing-a-longitude-or-latitude-expression-in-perl