I have a byte array of any length, and I want to encode it into a string using my own base encoder. .NET has a standard Base64 encoder, but w
A little late to the party, but...
Because your specification calls for an arbitrary number of bits, you must have an integer type that can work with an arbitrary number of bits. If you can't target .NET 4.0 you'll have to beg, borrow, or steal a BigInteger implementation somewhere (like .NET 4.0 perhaps).
using System;
using System.Numerics;
using System.Text;

public static class GenericBaseConverter
{
    public static string ConvertToString(byte[] valueAsArray, string digits, int pad)
    {
        if (digits == null)
            throw new ArgumentNullException("digits");
        if (digits.Length < 2)
            throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");

        // BigInteger reads the bytes as a little-endian two's-complement number.
        BigInteger value = new BigInteger(valueAsArray);
        bool isNeg = value < 0;
        value = isNeg ? -value : value;

        // Digits are appended least-significant first and reversed at the end.
        StringBuilder sb = new StringBuilder(pad + (isNeg ? 1 : 0));
        do
        {
            BigInteger rem;
            value = BigInteger.DivRem(value, digits.Length, out rem);
            sb.Append(digits[(int)rem]);
        } while (value > 0);

        // pad it
        if (sb.Length < pad)
            sb.Append(digits[0], pad - sb.Length);

        // if the number is negative, add the sign.
        if (isNeg)
            sb.Append('-');

        // reverse it
        for (int i = 0, j = sb.Length - 1; i < j; i++, j--)
        {
            char t = sb[i];
            sb[i] = sb[j];
            sb[j] = t;
        }
        return sb.ToString();
    }

    public static BigInteger ConvertFromString(string s, string digits)
    {
        BigInteger result;
        switch (Parse(s, digits, out result))
        {
            case ParseCode.FormatError:
                throw new FormatException("Input string was not in the correct format.");
            case ParseCode.NullString:
                throw new ArgumentNullException("s");
            case ParseCode.NullDigits:
                throw new ArgumentNullException("digits");
            case ParseCode.InsufficientDigits:
                throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");
            case ParseCode.Overflow:
                throw new OverflowException();
        }
        return result;
    }

    public static bool TryConvertFromString(string s, string digits, out BigInteger result)
    {
        return Parse(s, digits, out result) == ParseCode.Success;
    }

    private enum ParseCode
    {
        Success,
        NullString,
        NullDigits,
        InsufficientDigits,
        Overflow,
        FormatError,
    }

    private static ParseCode Parse(string s, string digits, out BigInteger result)
    {
        result = 0;
        if (s == null)
            return ParseCode.NullString;
        if (digits == null)
            return ParseCode.NullDigits;
        if (digits.Length < 2)
            return ParseCode.InsufficientDigits;

        // skip leading white space
        int i = 0;
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;
        if (i >= s.Length)
            return ParseCode.FormatError;

        // get the sign if it's there.
        BigInteger sign = 1;
        if (s[i] == '+')
            ++i;
        else if (s[i] == '-')
        {
            ++i;
            sign = -1;
        }

        // Make sure there's at least one digit
        if (i >= s.Length || Char.IsWhiteSpace(s[i]))
            return ParseCode.FormatError;

        // Parse the digits, stopping at the first white space so that trailing
        // white space is accepted (this assumes digits contains no white space).
        while (i < s.Length && !Char.IsWhiteSpace(s[i]))
        {
            int n = digits.IndexOf(s[i]);
            if (n < 0)
                return ParseCode.FormatError;
            BigInteger oldResult = result;
            result = unchecked((result * digits.Length) + n);
            // BigInteger cannot overflow, so this check is effectively dead code.
            if (result < oldResult)
                return ParseCode.Overflow;
            ++i;
        }

        // skip trailing white space
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;

        // and make sure there's nothing else.
        if (i < s.Length)
            return ParseCode.FormatError;

        if (sign < 0)
            result = -result;
        return ParseCode.Success;
    }
}
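The converter above hands back a BigInteger, while the question starts from a byte array, so a full round trip might look roughly like the following. This is only a sketch and not part of the original answer: the class name ByteArrayBaseN and its two methods are hypothetical, and it assumes the bytes are read as one unsigned big-endian number, with every leading zero byte written as one digits[0] character so that it survives the trip.

```csharp
using System;
using System.Linq;
using System.Numerics;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ByteArrayBaseN
{
    public static string Encode(byte[] data, string digits)
    {
        // BigInteger wants little-endian two's complement: reverse the input
        // and leave one extra 0x00 at the end so the value is never negative.
        byte[] le = new byte[data.Length + 1];
        for (int i = 0; i < data.Length; i++)
            le[i] = data[data.Length - 1 - i];
        BigInteger value = new BigInteger(le);

        var sb = new StringBuilder();
        while (value > 0)
        {
            value = BigInteger.DivRem(value, digits.Length, out BigInteger rem);
            sb.Insert(0, digits[(int)rem]); // most significant digit ends up first
        }

        // Leading zero bytes do not change the number, so record each one
        // explicitly as a digits[0] character.
        int leadingZeros = data.TakeWhile(b => b == 0).Count();
        sb.Insert(0, new string(digits[0], leadingZeros));
        return sb.ToString();
    }

    public static byte[] Decode(string text, string digits)
    {
        int leadingZeros = text.TakeWhile(c => c == digits[0]).Count();

        BigInteger value = BigInteger.Zero;
        foreach (char c in text)
        {
            int n = digits.IndexOf(c);
            if (n < 0)
                throw new FormatException("Unexpected character: " + c);
            value = value * digits.Length + n;
        }

        // ToByteArray is little-endian and may carry a 0x00 sign byte;
        // flip it to big-endian and drop the zeros at the front.
        byte[] le = value.ToByteArray();
        Array.Reverse(le);
        byte[] be = le.SkipWhile(b => b == 0).ToArray();
        return new byte[leadingZeros].Concat(be).ToArray();
    }
}
```

For instance, ByteArrayBaseN.Encode(new byte[] { 255 }, "0123456789") yields "255", and Decode turns that string back into the single byte 255.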
Here is a copy from my blog, which I hope helps explain how (and why) I convert to Base62:
I am currently working on my own URL shortener: konv.es. To create the shortest possible character hash of the URL, I use the string's GetHashCode() method, then convert the resulting number to base 62 ([0-9a-zA-Z]). The most elegant solution I have found so far for the conversion (which is also a handy-dandy example of yield return) is:
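public static IEnumerable<char> ToBase62(int number)
{
    const string digits = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // GetHashCode() can return a negative value, which would make the index
    // below negative; reinterpret the bits as unsigned to avoid that.
    uint n = unchecked((uint)number);
    do
    {
        // Digits come out least-significant first.
        yield return digits[(int)(n % 62)];
        n /= 62;
    } while (n > 0);
}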
Extra credit: re-factor as an extension method
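One possible take on the extra credit, offered as a sketch rather than the blog's actual code: the class name Base62Extensions and the ToBase62String helper are made up for illustration, and the methods take a uint, so a negative hash code has to be cast by the caller. The helper reverses the characters because the iterator yields the least significant digit first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical extension class, for illustration only.
public static class Base62Extensions
{
    private const string Digits =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Yields base-62 digits, least significant first.
    public static IEnumerable<char> ToBase62(this uint number)
    {
        do
        {
            yield return Digits[(int)(number % 62)];
            number /= 62;
        } while (number > 0);
    }

    // Collects the digits and puts the most significant one first.
    public static string ToBase62String(this uint number)
    {
        char[] chars = number.ToBase62().ToArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

With this in scope, a hash could be shortened with something like unchecked((uint)url.GetHashCode()).ToBase62String(), where url is whatever string is being hashed. As a sanity check, the value 62 comes out as "10" and 61 as "Z".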
I've written an article that describes a solution in Python for exactly this problem. I avoided Python-specific features so that the solution can easily be implemented in other languages. Have a look and see whether it fits your needs.
A post on CodeReview prompted me to create a RadixEncoding class which is able to handle encoding/decoding a byte array to/from a base-N string.
The class can be found in this Q&A thread, along with documentation on (and solutions for) a few edge cases when dealing with BigInteger, endianness support, and the class's overall performance.
Base64 works well because 64 is a power of 2 (2^6), so each character holds 6 bits of data, and 3 bytes (3 * 8 = 24 bits) can be encoded into 4 characters (4 * 6 = 24 bits). Encoding and decoding can be done merely by shifting bits.
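To make the 3-bytes-to-4-characters arithmetic concrete, here is a small sketch that encodes a single group of three bytes with nothing but shifts and masks. The class and method names are hypothetical, the alphabet is the standard Base64 one, and padding for inputs that are not a multiple of three bytes is deliberately left out.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class Base64Group
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes (24 bits) as four 6-bit characters.
    public static string EncodeTriple(byte b0, byte b1, byte b2)
    {
        // Pack the three bytes into the low 24 bits of an int.
        int bits = (b0 << 16) | (b1 << 8) | b2;

        // Peel off four 6-bit groups, most significant first.
        return new string(new[]
        {
            Alphabet[(bits >> 18) & 0x3F],
            Alphabet[(bits >> 12) & 0x3F],
            Alphabet[(bits >> 6) & 0x3F],
            Alphabet[bits & 0x3F],
        });
    }
}
```

For the ASCII bytes of "Man" (77, 97, 110) this produces "TWFu", the same result Convert.ToBase64String gives for those three bytes.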
For bases that are not a power of 2 (like your base 62, or base 53), you must treat the message you are trying to encode as one long number and perform division and modulo operations on it. You'd probably be better off using a Base32 encoding and squandering a bit of bandwidth.
You can get inspiration from the C# Base32 implementation by Michael Giagnocavo.
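For a rough idea of what such a Base32 encoder involves, here is a minimal sketch. It is not Michael Giagnocavo's code: the class name is hypothetical, the alphabet is assumed to be the RFC 4648 one, and the trailing '=' padding is omitted. Because 32 is 2^5, a small bit buffer is enough and no big-number arithmetic is needed.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Base32Sketch
{
    // Assumed alphabet: RFC 4648 Base32.
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder((data.Length * 8 + 4) / 5);
        int buffer = 0;        // holds bits not yet written out
        int bitsInBuffer = 0;  // how many low bits of buffer are valid

        foreach (byte b in data)
        {
            buffer = (buffer << 8) | b;
            bitsInBuffer += 8;

            // Emit one character for every complete 5-bit group.
            while (bitsInBuffer >= 5)
            {
                bitsInBuffer -= 5;
                sb.Append(Alphabet[(buffer >> bitsInBuffer) & 0x1F]);
            }

            // Keep only the bits that have not been consumed yet.
            buffer &= (1 << bitsInBuffer) - 1;
        }

        // Left-align any leftover bits into a final 5-bit group.
        if (bitsInBuffer > 0)
            sb.Append(Alphabet[(buffer << (5 - bitsInBuffer)) & 0x1F]);

        return sb.ToString();
    }
}
```

Encoding the ASCII bytes of "foobar" this way gives "MZXW6YTBOI", which matches the RFC 4648 test vector once its padding is removed.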