inconsistency in converting string to integer, when string is hex, prefixed with '0x'

泄露秘密, submitted 2019-12-09 01:00:09

Question


Using PHP 5.3.5. Not sure how this works on other versions.

I'm confused about using strings that hold numbers, e.g., '0x4B0' or '1.2e3'. The way PHP handles such strings seems inconsistent to me. Is it only me? Or is it a bug? Or an undocumented feature? Or am I just missing some magic sentence in the docs?

<?php

echo $str = '0x4B0', PHP_EOL;
echo "is_numeric() -> ", var_dump(is_numeric($str)); // bool(true)
echo "*1           -> ", var_dump($str * 1);         // int(1200)
echo "(int)        -> ", var_dump((int)$str);        // int(0)
echo "(float)      -> ", var_dump((float)$str);      // float(0)
echo PHP_EOL;

echo $str = '1.2e3', PHP_EOL;
echo "is_numeric() -> ", var_dump(is_numeric($str)); // bool(true)
echo "*1           -> ", var_dump($str * 1);         // float(1200)
echo "(int)        -> ", var_dump((int)$str);        // int(1)
echo "(float)      -> ", var_dump((float)$str);      // float(1200)
echo PHP_EOL;

In both cases, is_numeric() returns true. Also, in both cases, $str * 1 parses the string and returns a valid number (an integer in one case, a float in the other).

Casting with (int)$str and (float)$str gives unexpected results.

  • (int)$str in any case is able to parse only digits, with optional "+" or "-" in front of them.
  • (float)$str is more advanced and can parse something like ^[+-]?\d*(\.\d*)?(e[+-]?\d*)?, i.e., optional "+" or "-", followed by optional digits, followed by optional decimal point with optional digits, followed by optional exponent which consists of "e" with optional "+" or "-" followed by optional digits. Fails on hex data though.

Related docs:

  • is_numeric() - states that "Hexadecimal notation (0xFF) is allowed too but only without sign, decimal and exponential part". If a function meant to test whether a string holds numeric data returns true, I expect PHP to be able to convert such a string to a number. This seems to work with $str * 1, but not with casting. Why?
  • Converting to integer - states that "in most cases the cast is not needed, since a value will be automatically converted if an operator, function or control structure requires an integer argument". After such a statement, I expect both $s * 10 and (int)$s * 10 to work the same way and return the same result. Yet, as shown in the example, those expressions are evaluated differently.
  • String conversion to numbers - states that "Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent". "Exponent" is "e" or "E", followed by digits, e.g., 1.2e3 is valid numeric data. Sign ("+" or "-") is not mentioned. It does not mention hexadecimal values. This conflicts with the definition of "numeric data" used in is_numeric(). Then there is the suggestion "For more information on this conversion, see the Unix manual page for strtod(3)", and man strtod describes additional numeric values (including hex notation). So, after reading this, is hexadecimal data supposed to be valid or invalid numeric data?

So...

  • Is there (or, rather, should there be) any relation between is_numeric() and the way PHP treats strings when they are used as numbers?
  • Why do (int)$s, (float)$s and $s * 1 work differently, i.e., give completely different results, when $s is 0x4B0 or 1.2e3?
  • Is there any way to convert a string to a number and keep its value if it is written as 0x4B0 or as 1.2e3? floatval() does not work with hex at all, intval() needs $base to be set to 16 to work with hex, and typecasting with (int)$str and (float)$str sometimes works and sometimes does not, so these are not valid options. I'm also not considering $n *= 1;, as it looks more like data manipulation than conversion. Self-written functions are not considered either, as I'm looking for a native solution.

Answer 1:


The direct casts (int)$str and (float)$str don't really work differently at all: They both read as many characters from the string as they can interpret as a number of the respective type.

For "0x4B0", the int-conversion reads "0" (OK), then "x" and stops, because it cannot convert "x" into an integer. Likewise for the float-conversion.

For "1.2e3", the int-conversion reads "1" (OK), then "." and stops. The float-conversion recognises the entire string as valid float notation.
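The prefix-reading behaviour described here can be checked with strings that carry trailing non-numeric characters. This is only a minimal sketch: '12abc' and '1.5abc' are made-up sample inputs, not values from the question.

```php
<?php
// Explicit casts consume the longest leading run that looks like a number
// of the target type and ignore everything after it.
var_dump((int)'12abc');     // int(12)     - stops at "a"
var_dump((float)'1.5abc');  // float(1.5)  - stops at "a"
var_dump((int)'0x4B0');     // int(0)      - reads "0", stops at "x"
var_dump((float)'0x4B0');   // float(0)    - reads "0", stops at "x"
```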

The automatic type recognition for an expression like $str * 1 is simply more flexible than the explicit casts. The explicit casts require the integers and floats to be in the format produced by %i and %f in printf, essentially.

Perhaps you can use intval and floatval rather than explicit casts-to-int for more flexibility, though.

Finally, your question "is hexadecimal data supposed to be valid or invalid numeric data?" is awkward. There is no such thing as "hexadecimal data". Hexadecimal is just a number base. What you can do is take a string like "4B0" and use strtoul etc. to parse it as an integer in any number base between 2 and 36. [Sorry, that was BS. There's no strtoul in PHP. But intval has the equivalent functionality, see above.]
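As a rough illustration of parsing in an explicit base, intval() accepts the base as its second argument. The inputs other than the question's hex value are arbitrary examples chosen for this sketch.

```php
<?php
// intval() with an explicit base interprets the digits in that base.
var_dump(intval('4B0', 16));    // int(1200)
var_dump(intval('0x4B0', 16));  // int(1200) - the "0x" prefix is accepted for base 16
var_dump(intval('101', 2));     // int(5)
var_dump(intval('z', 36));      // int(35)   - "z" is the highest digit in base 36
```

Because the base is stated explicitly, the result does not depend on how PHP auto-detects numeric strings.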




Answer 2:


intval uses strtol, which recognizes octal and hex prefixes when the base parameter is zero, so:

var_dump(intval('0xef'));     // int(0)
var_dump(intval('0xff', 0));  // int(255)



Answer 3:


Is there (or, rather, should there be) any relation between is_numeric() and the way PHP treats strings when they are used as numbers?

There is no datatype called numeric in PHP; the is_numeric() function is more of a test for something that PHP can interpret as a number.

As far as such number interpretation is concerned, adding a + in front of the value will actually make PHP convert it into a number:

$int = +'0x4B0';
$float = +'1.2e3';

You can find this explained in the manual page for strings; look for the section "String conversion to numbers".

As the conversion is triggered by an operator, I don't see why there should be a function in PHP that does the same. That would be superfluous.


Internally, PHP uses a function called zendi_convert_scalar_to_number for the add operator (presumably +), which makes use of is_numeric_string to obtain the number.

The exact same function is called internally by is_numeric() when used with strings.

So to trigger the native conversion function, I would just use the + operator. This will ensure that you'll get back the numeric pseudo-type (int or float).

Ref: /Zend/zend_operators.c; /ext/standard/type.c



Source: https://stackoverflow.com/questions/6438864/inconsistency-in-converting-string-to-integer-when-string-is-hex-prefixed-with
