Code golf - hex to (raw) binary conversion

Submitted by 拟墨画扇 on 2019-11-28 06:33:42
Brian Campbell

Edit: Checkers reduced my C solution to 46 bytes, which then dropped to 44 bytes thanks to a tip from BillyONeal plus a bugfix on my part (no more infinite loop on bad input; it now simply exits the loop). Please give credit to Checkers for reducing this from 77 to 46 bytes:

main(i){while(scanf("%2x",&i)>0)putchar(i);}

And I have a much better Ruby solution than my last, in 38 bytes (down from 42; thanks to Joshua Swank for the regexp suggestion):

STDIN.read.scan(/\S\S/){|x|putc x.hex}

original solutions

C, in 77 bytes, or two lines of code (would be 1 if you could put the #include on the same line). Note that this has an infinite loop on bad input; the 44 byte solution with the help of Checkers and BillyONeal fixes the bug, and simply stops on bad input.

#include <stdio.h>
int main(){char c;while(scanf("%2x",&c)!=EOF)putchar(c);}

It's even just 6 lines if you format it normally:

#include <stdio.h>
int main() {
  char c;
  while (scanf("%2x",&c) != EOF)
    putchar(c);
}

Ruby, 79 bytes (I'm sure this can be improved):

STDOUT.write STDIN.read.scan(/[^\s]\s*[^\s]\s*/).map{|x|x.to_i(16)}.pack("c*")

These both take input from STDIN and write to STDOUT
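For readers who don't want to squint at the golfed C, here is a rough Python sketch of what the 44-byte `scanf("%2x")` loop does. It is only an approximation: it works on a string rather than on stdin, it ignores scanf's handling of signs and `0x` prefixes, and the function name `scanf_style_decode` is made up for this example.

```python
import string

def scanf_style_decode(text):
    """Approximate `while (scanf("%2x", &i) > 0) putchar(i);` on a string."""
    out = bytearray()
    pos = 0
    while True:
        # %x skips leading whitespace before each conversion.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        # Read at most two hex digits (the field width in %2x).
        start = pos
        while pos < len(text) and pos - start < 2 and text[pos] in string.hexdigits:
            pos += 1
        if pos == start:
            # End of input or a non-hex character: stop, as the `> 0` test does.
            break
        out.append(int(text[start:pos], 16))
    return bytes(out)
```

To use it as a filter you could pass `sys.stdin.read()` in and write the result to `sys.stdout.buffer`. Note that a lone digit followed by whitespace becomes its own byte, which appears to be how the C version behaves as well.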

Hasturkun

39 char perl oneliner

y/A-Fa-f0-9//dc,print pack"H*",$_ for<>

Edit: wasn't really accepting uppercase, fixed.

Skizz

45 byte executable (base64 encoded):

6BQAitjoDwDA4AQI2LQCitDNIevrWMOy/7QGzSF09jLkBMAa5YDkByrEJA/D

(paste into a file with a .com extension)
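If pasting base64 into a file sounds fiddly, the decoding step can be sketched in Python. This assumes the string above is the complete file; the names `ENCODED`, `decode_com` and `write_com` are invented for the example, and `hex.com` is just a suggested file name.

```python
import base64

# The executable exactly as posted above (base64 text, 60 characters).
ENCODED = "6BQAitjoDwDA4AQI2LQCitDNIevrWMOy/7QGzSF09jLkBMAa5YDkByrEJA/D"

def decode_com(encoded=ENCODED):
    """Return the raw bytes of the program."""
    return base64.b64decode(encoded)

def write_com(path="hex.com"):
    """Write the decoded bytes to disk in binary mode."""
    with open(path, "wb") as f:
        f.write(decode_com())
```

Decoding gives 45 bytes. As far as I can tell, two of them (58 c3) differ from the first `db` listing in this answer, which has cd 20 at the same position, so the two seem to be slightly different builds of the same program.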

EDIT: Ok, here's the code. Open a Windows console, create a 45-byte file called 'hex.com', type "debug hex.com", then 'a' and enter. Copy and paste these lines:

db e8,14,00,8a,d8,e8,0f,00,c0,e0,04,08,d8,b4,02,8a,d0,cd,21,eb,eb,cd,20
db b2,ff,b4,06,cd,21,74,f6,32,e4,04,c0,1a,e5,80,e4,07,2a,c4,24,0f,c3

Press enter, 'w' and then enter again, 'q' and enter. You can now run 'hex.com'

EDIT2: Made it two bytes smaller!

db e8, 11, 00, 8a, d8, e8, 0c, 00, b4, 02, 02, c0, 67, 8d, 14, c3
db cd, 21, eb, ec, ba, ff, 00, b4, 06, cd, 21, 74, 0c, 04, c0, 18
db ee, 80, e6, 07, 28, f0, 24, 0f, c3, cd, 20

That was tricky. I can't believe I spent time doing that.

Alex B

Brian's 77-byte C solution can be improved to 44 bytes, thanks to leniency of C with regard to function prototypes.

main(i){while(scanf("%2x",&i)>0)putchar(i);}

In Python:

binary = binascii.unhexlify(hex_str)

ONE LINE! (Yes, this is cheating.)

EDIT: This code was written a long time before the question edit which fleshed out the requirements.

Given that a single line of C can contain a huge number of statements, it's almost certainly true without being useful.

In C# I'd almost certainly write it in more than 10 lines, even though it would be feasible in 10. I'd separate out the "parse nybble" part from the "convert a string to a byte array" part.

Of course, if you don't care about spotting incorrect lengths etc, it becomes a bit easier. Your original text also contained spaces - should those be skipped, validated, etc? Are they part of the required input format?

I rather suspect that the comment was made without consideration as to what a pleasant, readable solution would look like.

Having said that, here's a hideous version in C#. For bonus points, it uses LINQ completely inappropriately in an effort to save a line or two of code. The lines could be longer, of course...

using System;
using System.Linq;

public class Test
{
    static void Main(string[] args)
    {
        byte[] data = ParseHex(args[0]);
        Console.WriteLine(BitConverter.ToString(data));

    }

    static byte[] ParseHex(string text)
    {
        Func<char, int> parseNybble = c => (c >= '0' && c <= '9') ? c-'0' : char.ToLower(c)-'a'+10;
        return Enumerable.Range(0, text.Length/2)
            .Select(x => (byte) ((parseNybble(text[x*2]) << 4) | parseNybble(text[x*2+1])))
            .ToArray();
    }
}

(This is avoiding "cheating" by using any built-in hex parsing code, such as Convert.ToByte(string, 16). Aside from anything else, that would mean losing the use of the word nybble, which is always a bonus.)
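For comparison, the "parse nybble" / "convert a string to bytes" split mentioned above might look like this in Python. It is a sketch rather than a translation of the C# code: unlike that version it rejects odd lengths and non-hex characters, and the names `parse_nybble` and `parse_hex` are simply chosen for the example.

```python
def parse_nybble(ch):
    """Return the value 0-15 of one hex digit, or raise ValueError."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    lower = ch.lower()
    if "a" <= lower <= "f":
        return ord(lower) - ord("a") + 10
    raise ValueError("not a hex digit: %r" % ch)

def parse_hex(text):
    """Convert a hex string to bytes, rejecting odd lengths."""
    if len(text) % 2 != 0:
        raise ValueError("hex string has an odd number of digits")
    # High nybble shifted left by four bits, low nybble OR-ed in.
    return bytes(
        (parse_nybble(text[i]) << 4) | parse_nybble(text[i + 1])
        for i in range(0, len(text), 2)
    )
```

No built-in hex parsing is used here either, so the word nybble survives.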

Perl

In, of course, one (fairly short) line:

my $bin = join '', map { chr hex } ($hex =~ /\G([0-9a-fA-F]{2})/g);

Haskell:

import Data.Char
import Numeric
import System.IO
import Foreign

main = hGetContents stdin >>= 
       return.fromHexStr.filter (not.isSpace) >>=  
       mapM_ (writeOneByte stdout)

fromHexStr (a:b:tl) = fromHexDgt [a,b]:fromHexStr tl
fromHexStr [] = []
fromHexDgt str =  case readHex str of 
  [(i,"")] -> fromIntegral (i)
  s -> error$show s

writeOneByte h i = allocaBytes 1 (wob' h i)
wob' :: Handle -> Int8 -> (Ptr Int8) -> IO ()
wob' h i ptr = poke ptr i >> hPutBuf h ptr 1

Adam Davis

Gah.

You aren't allowed to call me on my off-the-cuff estimates! ;-P

Here's a 9 line C version with no odd formatting (well, I'll grant you that the hextonum array would be better split into 16 lines so you can see which character codes map to which values...), and only 2 shortcuts that I wouldn't deploy in anything other than a one-off script:

#include <stdio.h>
char hextonum[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0,10,11,12,13,14,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,10,11,12,13,14,15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
char input[81]="8b1f0008023149f60300f1f375f40c72f77508507676720c560d75f002e5ce000861130200000000";
void main(void){
   int i = 0;
   FILE *fd = fopen("outfile.bin", "wb");
   while((input[i] != 0) && (input[i+1] != 0))
      fputc(hextonum[input[i++]] * 16 + hextonum[input[i++]], fd);
}

No combined lines (each statement is given its own line), it's perfectly readable, etc. An obfuscated version could undoubtedly be shorter, one could cheat and put the close braces on the same line as the preceding statement, etc, etc, etc.

The two things I don't like about it are that I don't have an fclose(fd) in there, and that main shouldn't be void and should return an int. Arguably they're not needed: the OS will release every resource the program used, the file will close without any problems, and the compiler will take care of the program exit value. Given that it's a one-time use script, it's acceptable, but don't deploy this.

It becomes eleven lines with both, so it's not a huge increase anyway, and a ten line version would include one or the other, depending on which one you feel is the lesser of two evils.

It doesn't do any error checking, and it doesn't allow whitespace. Assuming, again, that it's a one-time program, it's faster to search/replace away spaces and other whitespace before running it; still, it shouldn't take more than a few extra lines to eat whitespace as well.

There are, of course, ways to make it shorter but they would likely decrease readability significantly...

Hmph. Just read the comment about line length, so here's a newer version with an uglier hextonum macro, rather than the array:

#include <stdio.h>
#define hextonum(x) (((x)<'A')?((x)-'0'):(((x)<'a')?((x)+10-'A'):((x)+10-'a')))
char input[81]="8b1f0008023149f60300f1f375f40c72f77508507676720c560d75f002e5ce000861130200000000";
void main(void){
   int i = 0;
   FILE *fd = fopen("outfile.bin", "wb");
   for(i=0;(input[i] != 0) && (input[i+1] != 0);i+=2)
      fputc(hextonum(input[i]) * 16 + hextonum(input[i+1]), fd);
}

It isn't horribly unreadable. I know many people have issues with the ternary operator, but the appropriate naming of the macro and some analysis should readily reveal how it works to the average C programmer. Due to side effects in the macro I had to move to a for loop so I didn't need another line for i+=2 (the macro mentions its argument five times and evaluates it two or three times per expansion, so hextonum(input[i++]) would increment i two or three times per use; macro side effects are not for the faint of heart!).

Also, the input parser should skip/ignore white space.

grumble, grumble, grumble.

I had to add a few lines to take care of this requirement, now up to 14 lines for a reasonably formatted version. It will ignore everything that's not a hexadecimal character:

#include <stdio.h>
int hextonum[] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
char input[]="8b1f 0008 0231 49f6 0300 f1f3 75f4 0c72 f775 0850 7676 720c 560d 75f0 02e5 ce00 0861 1302 0000 0000";
void main(void){
   unsigned char i = 0, nibble = 1, byte = 0;
   FILE *fd = fopen("outfile.bin", "wb");
   for(i=0;input[i] != 0;i++){
      if(hextonum[input[i]] == -1)
         continue;
      byte = (byte << 4) + hextonum[input[i]];
      if((nibble ^= 0x01) == 0x01)
         fputc(byte, fd);
   }
}
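The nibble-toggling loop above can be sketched in Python as well, which may make the state machine easier to follow. This is an illustration under my own naming (`HEX_VALUES`, `decode_ignoring_junk`), not a line-for-line port: it returns the bytes instead of writing outfile.bin, and it uses a dictionary where the C code uses a 256-entry table.

```python
# Lookup table: hex digit character -> value 0..15 (both cases accepted).
HEX_VALUES = {ch: i for i, ch in enumerate("0123456789abcdef")}
HEX_VALUES.update({ch: i for i, ch in enumerate("ABCDEF", start=10)})

def decode_ignoring_junk(text):
    """Decode hex digits to bytes, skipping every non-hex character."""
    out = bytearray()
    byte = 0
    half_done = False  # True while only the high nibble has been read
    for ch in text:
        value = HEX_VALUES.get(ch, -1)
        if value == -1:
            continue  # whitespace, punctuation, anything else: ignore
        # Shift the previous nibble up and add the new one, keeping 8 bits.
        byte = ((byte << 4) + value) & 0xFF
        half_done = not half_done
        if not half_done:
            out.append(byte)  # second nibble arrived: emit the byte
    return bytes(out)
```

Like the C version, a trailing unpaired digit is silently dropped.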

I didn't bother with the 80 character line length because the input isn't even less than 80 characters, but a 3 level ternary macro could replace the first 256 entry array. If one didn't mind a bit of "alternative formatting" then the following 10 line version isn't completely unreadable:

#include <stdio.h>
int hextonum[] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
char input[]="8b1f 0008 0231 49f6 0300 f1f3 75f4 0c72 f775 0850 7676 720c 560d 75f0 02e5 ce00 0861 1302 0000 0000";
void main(void){
   unsigned char i = 0, nibble = 1, byte = 0;
   FILE *fd = fopen("outfile.bin", "wb");
   for(i=0;input[i] != 0;i++){
      if(hextonum[input[i]] == -1) continue;
      byte = (byte << 4) + hextonum[input[i]];
      if((nibble ^= 0x01) == 0x01) fputc(byte, fd);}}

And, again, further obfuscation and bit twiddling could result in an even shorter example.

.

It's a language called "Hex!". Its only use is to read hex data from stdin and write it to stdout. Hex! is parsed by a simple Python script.

import sys

try:
  data = open(sys.argv[1], 'r').read()
except IndexError:
  data = raw_input("hex!> ")
except Exception as e:
  print "Error occurred:",e

if data == ".":
  hex = raw_input()
  print int(hex, 16)
else:
  print "parsing error"

strager

Fairly readable C solution (9 "real" lines):

#include <stdio.h>
int getNextHexDigit() {
    int v;
    while((v = fgetc(stdin)) < '0' && v != -1) {    /* Until non-whitespace or EOF */
    }
    return v > '9' ? 9 + (v & 0x0F) : v - '0';      /* Extract number from hex digit (ASCII) */
}
int main() {
    int v;
    fputc(v = (getNextHexDigit() << 4) | getNextHexDigit(), stdout);
    return v > 0 ? main(0) : 0;
}

To support 16-bit little endian goodness, replace main with:

int main() {
    int v, q;
    v = (getNextHexDigit() << 4) | getNextHexDigit();
    fputc(q = (getNextHexDigit() << 4) | getNextHexDigit(), stdout);
    fputc(v, stdout);
    return (v | q) > 0 ? main(0) : 0;
}

Igor Krivokon

A 31-character Perl solution:

s/\W//g,print(pack'H*',$_)for<>

I can't code this off the top of my head, but for every two characters, output (byte)(((AsciiValueChar1-(AsciiValueChar1>64?55:48))*16)+(AsciiValueChar2-(AsciiValueChar2>64?55:48))) to get a hex string changed into raw binary. This would break horribly if your input string has anything other than 0 to 9 or A to F (uppercase only), so I can't say how useful it would be to you.
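The arithmetic above is easy to try out. Here is a small Python sketch of it, assuming uppercase input with no whitespace; `pair_to_byte` and `decode_upper_hex` are names I made up for the example.

```python
def pair_to_byte(c1, c2):
    """Combine two uppercase hex digit characters into one byte value."""
    def digit(ch):
        code = ord(ch)
        # 'A'..'F' are codes 65..70, so subtract 55; '0'..'9' are 48..57.
        return code - (55 if code > 64 else 48)
    return digit(c1) * 16 + digit(c2)

def decode_upper_hex(text):
    """Decode consecutive character pairs; an unpaired last character is ignored."""
    return bytes(
        pair_to_byte(text[i], text[i + 1])
        for i in range(0, len(text) - 1, 2)
    )
```

Lowercase digits give values outside 0 to 255 here, so `bytes()` raises ValueError on them, which is one way the "break horribly" shows up.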

Mikko Rantanen

I know Jon posted a (cleaner) LINQ solution already. But for once I am able to use a LINQ statement which modifies a string during its execution and abuses LINQ's deferred evaluation without getting yelled at by my co-workers. :p

string hex = "FFA042";
byte[] bytes =
    hex.ToCharArray()
       .Select(c => ('0' <= c && c <= '9') ? 
                         c - '0' :
                         10 + (('a' <= c) ? c - 'a' : c - 'A'))
       .Select(c => (hex = hex.Remove(0, 1)).Length > 0 ? (new int[] {
           c,
           hex.ToCharArray()
                 .Select(c2 => ('0' <= c2 && c2 <= '9') ?
                                    c2 - '0' :
                                    10 + (('a' <= c2) ? c2 - 'a' : c2 - 'A'))
                 .FirstOrDefault() }) : ( new int[] { c } ) )
       .Where(c => (hex.Length % 2) == 1)
       .Select(ca => ((byte)((ca[0] << 4) + ca[1]))).ToArray();

1 statement formatted for readability.

Update

Support for spaces and an odd number of hex digits (89A is equal to 08 9A)

byte[] bytes =
    hex.ToCharArray()
       .Where(c => c != ' ')
       .Reverse()
       .Select(c => (char)(c | 32) % 39 - 9)
       .Select(c => 
           (hex =
                new string('0', 
                           (2 + (hex.Replace(" ", "").Length % 2)) *
                                hex.Replace(" ", "")[0].CompareTo('0')
                                                       .CompareTo(0)) +
                hex.Replace(" ", "").Remove(hex.Replace(" ", "").Length - 1))
              .Length > 0 ? (new int[] {
                        hex.ToCharArray()
                           .Reverse()
                           .Select(c2 => (char)(c2 | 32) % 39 - 9)
                           .FirstOrDefault(), c }) : new int[] { 0, c } )
                     .Where(c => (hex.Length % 2) == 1)
                     .Select(ca => ((byte)((ca[0] << 4) + ca[1])))
                     .Reverse().ToArray();

Still one statement. Could be made much shorter by running the replace(" ", "") on hex string in the start, but this would be a second statement.

Two interesting points with this one. The first is how to track the character count without the help of any variable other than the source string itself. The second: while solving this I found that char's y.CompareTo(x) just returns "y - x", while int's y.CompareTo(x) returns -1, 0 or 1. So for chars, y.CompareTo(x).CompareTo(0) gives a comparison that returns -1, 0 or 1.
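Stripped of the LINQ abuse, the padding rule (89A is read as 08 9A) comes down to very little. A plain Python sketch, with `decode_padded` being a name chosen for the example; it only strips spaces, like the statement above, and leans on the built-in `bytes.fromhex` for the actual conversion.

```python
def decode_padded(hex_text):
    """Decode hex text, ignoring spaces and left-padding odd-length input."""
    digits = hex_text.replace(" ", "")
    if len(digits) % 2 == 1:
        # An odd count means the first byte is missing its high nibble.
        digits = "0" + digits
    return bytes.fromhex(digits)
```

This takes two statements plus a conditional, so it loses the one-statement challenge but is considerably easier to review.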

Kuroki Kaze

PHP, 28 symbols:

<?=pack(I,hexdec($argv[1]));

Lloeki

Late to the game, but here's a Python{2,3} one-liner (100 chars, needs import sys, re):

sys.stdout.write(''.join([chr(int(x,16)) for x in re.findall(r'[A-Fa-f0-9]{2}', sys.stdin.read())]))
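One caveat for Python 3: `sys.stdout` is a text stream there, so characters produced by `chr()` for values of 128 and above get encoded (typically as multi-byte UTF-8) rather than written as single raw bytes. A sketch that builds a `bytes` object and writes it to the binary stream instead, with `hex_to_bytes` and `main` as illustrative names:

```python
import re
import sys

def hex_to_bytes(text):
    """Turn every pair of hex digits found in text into one byte."""
    return bytes(int(pair, 16) for pair in re.findall(r"[A-Fa-f0-9]{2}", text))

def main():
    # sys.stdout.buffer is the underlying binary stream in Python 3.
    sys.stdout.buffer.write(hex_to_bytes(sys.stdin.read()))
```

Calling `main()` turns the script into a stdin-to-stdout filter; the regexp is the same one as in the one-liner, so non-hex characters are skipped in the same way.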