Base conversion of arbitrary sized numbers (PHP)

穿精又带淫゛_ 提交于 2019-12-18 17:27:15

问题


I have a long "binary string" like the output of PHPs pack function.

How can I convert this value to base62 (0-9a-zA-Z)? The built in maths functions overflow with such long inputs, and BCmath doesn't have a base_convert function, or anything that specific. I would also need a matching "pack base62" function.


回答1:


I think there is a misunderstanding behind this question. Base conversion and encoding/decoding are different. The output of base64_encode(...) is not a large base64-number. It's a series of discrete base64 values, corresponding to the compression function. That is why BC Math does not work, because BC Math is concerned with single large numbers, not strings that are in reality groups of small numbers that represent binary data.

Here's an example to illustrate the difference:

base64_encode(1234) = "MTIzNA=="
base64_convert(1234) = "TS" //if the base64_convert function existed

base64 encoding breaks the input up into groups of 3 bytes (3*8 = 24 bits), then converts each sub-segment of 6 bits (2^6 = 64, hence "base64") to the corresponding base64 character (values are "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", where A = 0, / = 63).

In our example, base64_encode() treats "1234" as a string of 4 characters, not an integer (because base64_encode() does not operate on integers). Therefore it outputs "MTIzNA==", because (in US-ASCII/UTF-8/ISO-8859-1) "1234" is 00110001 00110010 00110011 00110100 in binary. This gets broken into 001100 (12 in decimal, character "M") 010011 (19 in decimal, character "T") 001000 ("I") 110011 ("z") 001101 ("N") 00. Since the last group isn't complete, it gets padded with 0's and the value is 000000 ("A"). Because everything is done by groups of 3 input characters, there are 2 groups: "123" and "4". The last group is padded with ='s to make it 3 chars long, so the whole output becomes "MTIzNA==".

converting to base64, on the other hand, takes a single integer value and converts it into a single base64 value. For our example, 1234 (decimal) is "TS" (base64), if we use the same string of base64 values as above. Working backward, and left-to-right: T = 19 (column 1), S = 18 (column 0), so (19 * 64^1) + (18 * 64^0) = 19 * 64 + 18 = 1234 (decimal). The same number can be represented as "4D2" in hexadecimal (base16): (4 * 16^2) + (D * 16^1) + (2 * 16^0) = (4 * 256) + (13 * 16) + (2 * 1) = 1234 (decimal).

Unlike encoding, which takes a string of characters and changes it, base conversion does not alter the actual number, just changes its presentation. The hexadecimal (base16) "FF" is the same number as decimal (base10) "255", which is the same number as "11111111" in binary (base2). Think of it like currency exchange, if the exchange rate never changed: $1 USD has the same value as £0.79 GBP (exchange rate as of today, but pretend it never changes).

In computing, integers are typically operated on as binary values (because it's easy to build 1-bit arithmetic units and then stack them together to make 32-bit/etc. arithmetic units). To do something as simple as "255 + 255" (decimal), the computer needs to first convert the numbers to binary ("11111111" + "11111111") and then perform the operation in the Arithmetic Logic Unit (ALU).

Almost all other uses of bases are purely for the convenience of humans (presentational) - computers display their internal value 11111111 (binary) as 255 (decimal) because humans are trained to operate on decimal numbers. The function base64_convert() doesn't exist as part of the standard PHP repertoire because it's not often useful to anyone: not many humans read base64 numbers natively. By contrast, binary 1's and 0's are sometimes useful for programmers (we can use them like on/off switches!), and hexadecimal is convenient for humans editing binary data because an entire 8-bit byte can be represented unambiguously as 00 through FF, without wasting too much space.

You may ask, "if base conversion is just for presentation, why does BC Math exist?" That's a fair question, and also exactly why I said "almost" purely for presentation: typical computers are limited to 32-bit or 64-bit wide numbers, which are usually plenty big enough. Sometimes you need to operate on really, really big numbers (RSA moduli for example), which don't fit in those registers. BC Math solves this problem by acting as an abstraction layer: it converts huge numbers into long strings of text. When it's time to do some operation, BC Math painstakingly breaks the long strings of text up into small chunks which the computer can handle. It's much, much slower than native operations, but it can handle arbitrary-sized numbers.




回答2:


Unless you really, really have to have base62, why not go for:

base64_encode()
base64_decode()

The only other added characters are "+" and "=", and it's a very well-known method to pack and unpack binary strings with available functions in many other languages.




回答3:


Here is a function base_conv() that can convert between completely arbitrary bases, expressed as arrays of strings; Each array element represents a single "digit" in that base, thus also allowing multi-character values (it is your responsibility to avoid ambiguity).

function base_conv($val, &$baseTo, &$baseFrom)
    {
    return base_arr_to_str(base_conv_arr(base_str_to_arr((string) $val, $baseFrom), count($baseTo), count($baseFrom)), $baseTo);
    }

function base_conv_arr($val, $baseToDigits, $baseFromDigits)
    {
    $valCount = count($val);
    $result = array();
    do
        {
        $divide = 0;
        $newlen = 0;
        for ($i = 0; $i < $valCount; ++$i)
            {
            $divide = $divide * $baseFromDigits + $val[$i];
            if ($divide >= $baseToDigits)
                {
                $val[$newlen ++] = (int) ($divide / $baseToDigits);
                $divide = $divide % $baseToDigits;
                }
            else if ($newlen > 0)
                {
                $val[$newlen ++] = 0;
                }
            }
        $valCount = $newlen;
        array_unshift($result, $divide);
        }
        while ($newlen != 0);
    return $result;
    }

function base_arr_to_str($arr, &$base)
    {
    $str = '';
    foreach ($arr as $digit)
        {
        $str .= $base[$digit];
        }
    return $str;
    }

function base_str_to_arr($str, &$base)
    {
    $arr = array();
    while ($str === '0' || !empty($str))
        {
        foreach ($base as $index => $digit)
            {
            if (mb_substr($str, 0, $digitLen = mb_strlen($digit)) === $digit)
                {
                $arr[] = $index;
                $str = mb_substr($str, $digitLen);
                continue 2;
                }
            }
        throw new Exception();
        }
    return $arr;
    }

Examples:

$baseDec = str_split('0123456789');
$baseHex = str_split('0123456789abcdef');

echo base_conv(255, $baseHex, $baseDec); // ff
echo base_conv('ff', $baseDec, $baseHex); // 255

// multi-character base:
$baseHelloworld = array('hello ', 'world ');
echo base_conv(37, $baseHelloworld, $baseDec); // world hello hello world hello world 
echo base_conv('world hello hello world hello world ', $baseDec, $baseHelloworld); // 37

// ambiguous base:
// don't do this! base_str_to_arr() won't know how to decode e.g. '11111'
// (well it does, but the result might not be what you'd expect;
// It matches digits sequentially so '11111' would be array(0, 0, 1)
// here (matched as '11', '11', '1' since they come first in the array))
$baseAmbiguous = array('11', '1', '111');


来源:https://stackoverflow.com/questions/352434/base-conversion-of-arbitrary-sized-numbers-php

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!