Create Binary files in UNIX

-上瘾入骨i 2021-02-06 00:39

This question was out there for a while and I thought I should offer some bonus points if I can get it to work.

What did I do…

Recently at work, I wrote a par

5 Answers
  • 2021-02-06 01:28

    You can use xxd to convert to and from binary files / hexdumps quite simply.

    data to hex

    echo Hello | xxd -p
    48656c6c6f0a
    

    hex to data

    echo 48656c6c6f0a | xxd -r -p
    Hello
    

    or

    echo 48 65 6c 6c 6f 0a | xxd -r -p
    Hello
    

    The -p option selects plain (PostScript-style) hex dump mode; combined with -r, it accepts freeform input without offsets or a fixed column layout.
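
    If xxd is not available (it usually ships with Vim), the data-to-hex direction can be approximated with the POSIX od utility. This is only a sketch: to_hex is a hypothetical helper name, and unlike xxd -p it does not wrap its output every 30 bytes.

```shell
#!/bin/sh
# to_hex: hypothetical helper, roughly equivalent to `xxd -p` for short input.
to_hex() {
  # -An: no offset column, -v: do not collapse repeated lines,
  # -tx1: one hex byte per field; tr then strips the separators.
  od -An -v -tx1 | tr -d ' \t\n'
  echo   # terminate the output with a single newline
}

echo Hello | to_hex   # prints 48656c6c6f0a
```

    The reverse direction (hex to data) has no equally short od counterpart, which is why the other answers fall back on awk or shell loops.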

    This is the output from xxd -r -p text where text is the data you give above

    ==▒sGTP▒▒U<▒I▒▒▒΁/▒▒3▒▒▒▒▒▒▒▒▒bTY`84▒
                                         Xbp`▒▒▒▒▒▒▒|▒L▒@@(▒▒U8▒+#POC01
    :▒ިv▒b▒▒▒▒TY`84Ud▒▒▒▒>▒▒▒▒▒▒▒!▒
    blackberrynet▒/▒▒!
    M
    ▒▒!
    N
    ▒▒#Oripassword▒▒΁/▒▒΁/▒▒Xbp`▒@@(▒▒U8▒IvPOC01
    :qU▒b▒▒▒▒▒▒TY`84U▒▒▒*:▒▒!
    ▒k▒▒▒#O Welcmme!
    ▒!
    M
    
  • 2021-02-06 01:28

    The binmake tool lets you describe binary data in a text format and generate a binary file from it (or write the result to stdout). It supports switching endianness and number formats, and it accepts comments.

    First get and compile binmake (the binary program will be in bin/):

    $ git clone https://github.com/dadadel/binmake
    $ cd binmake
    $ make
    

    Create your text file file.txt:

    # an example of a file describing binary data to generate
    # set endianness to big-endian
    big-endian
    
    # default number format is hexadecimal
    00112233
    
    # a number type can be given explicitly: %b means binary number
    %b0100110111100000
    
    # change endianness to little-endian
    little-endian
    
    # if no type is given explicitly, the default is used
    44556677
    
    # single bytes are not affected by endianness
    88 99 aa bb
    
    # change default to decimal
    decimal
    
    # following number is now decimal
    0123
    
    # strings are delimited by " or '
    "this is some raw string"
    
    # explicit hexa number starts with %x
    %xff
    

    Generate your binary file file.bin:

    $ ./binmake file.txt file.bin
    $ hexdump file.bin -C
    00000000  00 11 22 33 4d e0 77 66  55 44 88 99 aa bb 7b 74  |.."3M.wfUD....{t|
    00000010  68 69 73 20 69 73 20 73  6f 6d 65 20 72 61 77 20  |his is some raw |
    00000020  73 74 72 69 6e 67 ff                              |string.|
    00000027
    

    You can also pipe it using stdin and stdout:

    $ echo '32 decimal 32 %x61 61' | ./binmake | hexdump -C
    00000000  32 20 61 3d                                       |2 a=|
    00000004
    
  • 2021-02-06 01:31

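    Using cut and awk, you can do it fairly simply with strtonum(), a gawk (GNU Awk) extension function. Run awk in the C locale, because gawk in a UTF-8 locale would otherwise write values above 0x7f as multibyte characters instead of single bytes:

    cut -c11-60 inputfile |
    LC_ALL=C awk '{
        for (i = 1; i <= NF; i++) {
            c = strtonum("0x" $i)   # hex text -> number
            printf("%c", c)         # number -> single byte
        }
    }' > outputfile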
    

    Or, if you are using a non-GNU version of 'new awk', then you can use:

    cut -c11-60 inputfile |
    awk '{  for (i = 1; i <= NF; i++)
            {
                s = toupper($i)
                c0 = index("0123456789ABCDEF", substr(s, 1, 1)) - 1
                c1 = index("0123456789ABCDEF", substr(s, 2, 1)) - 1
                printf("%c", c0*16 + c1);
            }
         }' > outputfile
    

    If you want to use other tools (Perl and Python spring to mind; Ruby would be another possibility), you can do it easily enough.

    odx is a program similar to the hexdump program. The script above was modified to read 'hexdump.out' as the input file, and the output piped into odx instead of a file, and gives the following output:

    $ cat hexdump.out
    00000000  3d 3d 01 fc 73 47 54 50  02 f1 d6 55 3c 9f 49 9c  |==..sGTP...U<.I.|
    00000010  00 01 01 00 01 80 00 dc  ce 81 2f 00 00 00 00 00  |........../.....|
    00000020  00 00 00 00 00 00 00 00  ca 04 d2 33 00 00 00 00  |...........3....|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 10  |................|
    00000040  01 01 0f 00 00 07 04 ea  00 00 ff ff 00 00 14 b7  |................|
    00000050  00 00 ff ff 00 00 83 ec  00 00 83 62 54 14 59 00  |...........bT.Y.|
    00000060  60 38 34 f5 01 01 0b 58  62 70 11 60 f6 ff ff ff  |`84....Xbp.`....|
    00000070  ff ff ff 02 00 7c 00 d0  01 4c 00 b0 40 40 28 02  |.....|...L..@@(.|
    $ sh -x revdump.sh | odx
    + cut -c11-60 hexdump.out
    + awk '{  for (i = 1; i <= NF; i++)
            {
                #c = strtonum("0x" $i)
                #printf("%c", c);
                s = toupper($i)
                c0 = index("0123456789ABCDEF", substr(s, 1, 1)) - 1
                c1 = index("0123456789ABCDEF", substr(s, 2, 1)) - 1
                printf("%c", c0*16 + c1);
            }
         }'
    0x0000: 3D 3D 01 FC 73 47 54 50 02 F1 D6 55 3C 9F 49 9C   ==..sGTP...U<.I.
    0x0010: 00 01 01 00 01 80 00 DC CE 81 2F 00 00 00 00 00   ........../.....
    0x0020: 00 00 00 00 00 00 00 00 CA 04 D2 33 00 00 00 00   ...........3....
    0x0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10   ................
    0x0040: 01 01 0F 00 00 07 04 EA 00 00 FF FF 00 00 14 B7   ................
    0x0050: 00 00 FF FF 00 00 83 EC 00 00 83 62 54 14 59 00   ...........bT.Y.
    0x0060: 60 38 34 F5 01 01 0B 58 62 70 11 60 F6 FF FF FF   `84....Xbp.`....
    0x0070: FF FF FF 02 00 7C 00 D0 01 4C 00 B0 40 40 28 02   .....|...L..@@(.
    0x0080:
    $ 
    

    Or, using hexdump -C in place of odx:

    $ sh -x revdump.sh | hexdump -C
    + cut -c11-60 hexdump.out
    + awk '{  for (i = 1; i <= NF; i++)
            {
                #c = strtonum("0x" $i)
                #printf("%c", c);
                s = toupper($i)
                c0 = index("0123456789ABCDEF", substr(s, 1, 1)) - 1
                c1 = index("0123456789ABCDEF", substr(s, 2, 1)) - 1
                printf("%c", c0*16 + c1);
            }
         }'
    00000000  3d 3d 01 fc 73 47 54 50  02 f1 d6 55 3c 9f 49 9c  |==..sGTP...U<.I.|
    00000010  00 01 01 00 01 80 00 dc  ce 81 2f 00 00 00 00 00  |........../.....|
    00000020  00 00 00 00 00 00 00 00  ca 04 d2 33 00 00 00 00  |...........3....|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 10  |................|
    00000040  01 01 0f 00 00 07 04 ea  00 00 ff ff 00 00 14 b7  |................|
    00000050  00 00 ff ff 00 00 83 ec  00 00 83 62 54 14 59 00  |...........bT.Y.|
    00000060  60 38 34 f5 01 01 0b 58  62 70 11 60 f6 ff ff ff  |`84....Xbp.`....|
    00000070  ff ff ff 02 00 7c 00 d0  01 4c 00 b0 40 40 28 02  |.....|...L..@@(.|
    00000080
    $
    
  • 2021-02-06 01:31

    To convert from the File3 encoding to File1, you can use a script like this:

    #!/bin/bash
    
    # file name: tobin.sh
    # usage: ./tobin.sh [hexfile] > file1.bin
    
    fileName="${1:-tobin.txt}"   # input file; defaults to tobin.txt
    
    while read -r line; do
      for hexValue in $line; do
        # %b expands the \xHH escape into the corresponding raw byte
        printf '%b' "\\x$hexValue"
      done
    done < "$fileName"
    

    Or, if you just want to use it in a pipeline, like the xxd example in this thread:

    #!/bin/bash
    
    # file name: tobin.sh
    # usage: cat file3.txt | ./tobin.sh > file1.bin
    
    while read -r line; do
      for hexValue in $line; do
        # %b expands the \xHH escape into the corresponding raw byte
        printf '%b' "\\x$hexValue"
      done
    done
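
    The \xHH escape used above is a bash extension; POSIX only requires printf to understand octal escapes. A minimal sketch of a more portable variant, assuming nothing beyond a POSIX sh, converts each hex pair to an octal escape first. hex_to_bin is a hypothetical name, not part of the original scripts.

```shell
#!/bin/sh
# hex_to_bin: hypothetical helper. Reads whitespace-separated two-digit
# hex values on stdin and writes the corresponding raw bytes to stdout.
hex_to_bin() {
  # The extra test keeps a final line that lacks a trailing newline.
  while read -r line || [ -n "$line" ]; do
    for hex in $line; do
      # hex -> number -> three-digit octal, then let the outer printf
      # expand the \ooo escape into a single byte (works for 00 as well).
      printf "\\$(printf '%03o' "$((0x$hex))")"
    done
  done
}

# usage: hex_to_bin < file3.txt > file1.bin
```

    The cost is two printf calls per byte, so this is slow on large inputs; for anything bigger than a few kilobytes, xxd -r -p or awk is the better fit.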
    

    If you really want to use Bash for this, then I suggest you use an array to build your packet cleanly. Here is some starting code:

    #!/bin/bash
    
    # We assume the script will run on a little-endian (LSB-first) architecture.
    # Arrays and $'...' quoting are bash features, hence the bash shebang.
    
    ###
    # hexDump() prints the buffer[] array as space-separated hex bytes.
    #
    hexDump() {
      local idx
      for ((idx = 0; idx < ${#buffer[@]}; idx++)); do
        # A leading quote makes printf use the character's numeric code.
        printf '%02X ' "'${buffer[idx]}"
      done
      printf '\n'
    } # hexDump() function
    
    ###
    # dump() dumps the current content of the buffer[] array to the STDOUT.
    #
    dump() {
      local idx
      # or, use $ptr here...
      for ((idx = 0; idx < ${#buffer[@]}; idx++)); do
        # $'\x00' is stored as an empty string; printf '%c' "" emits a NUL byte.
        printf '%c' "${buffer[idx]}"
      done
    } # dump() function
    
    # Beginning of DB Package Identifier: ==
    buffer[0]=$'\x3d' # =
    buffer[1]=$'\x3d' # =
    size=2
    
    # Total Package Length: 2
    # We start with 2, and later on we update it once we know the exact size...
    # Assuming a 32-bit little-endian field, this is how we encode the number 2
    # (that is the current size of the packet).
    buffer[2]=$'\x02'
    buffer[3]=$'\x00'
    buffer[4]=$'\x00'
    buffer[5]=$'\x00'
    
    # Offset to Data Record Count field: 115
    # I assume this is also a 32-bit field of unsigned int type
    ptr=5
    buffer[++ptr]=$'\x73'  # 115
    buffer[++ptr]=$'\x00'
    buffer[++ptr]=$'\x00'
    buffer[++ptr]=$'\x00'
    
    #hexDump
    dump
    

    Output:

    $ ./tobin2.sh | hexdump -C
    00000000  3d 3d 02 00 00 00 73 00  00 00                    |==....s...|
    0000000a
    

    Sure, this is not a solution to the original post, but a solution would use something like this to generate binary output. The biggest problem is that we still do not know the types of the fields in the packet, nor the architecture (big-endian or little-endian, 32-bit or 64-bit). You must give us the specification. For instance, what type is the package length? We cannot tell that from the TXT file!

    To help you do what you need to do, you must find the specification for the sizes of those fields.
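
    Once the field widths and byte order are known, emitting a fixed-width integer is straightforward. A minimal sketch, assuming a 32-bit unsigned field (the same guess the script above makes, not something the question confirms); put_byte, put_u32_le and put_u32_be are hypothetical helper names:

```shell
#!/bin/sh
# put_byte N: write the low 8 bits of N as one raw byte (hypothetical helper).
put_byte() {
  printf "\\$(printf '%03o' "$(( $1 & 255 ))")"
}

# put_u32_le N: write N as 4 bytes, least significant byte first.
put_u32_le() {
  put_byte "$(( $1 ))"
  put_byte "$(( $1 >> 8 ))"
  put_byte "$(( $1 >> 16 ))"
  put_byte "$(( $1 >> 24 ))"
}

# put_u32_be N: write N as 4 bytes, most significant byte first.
put_u32_be() {
  put_byte "$(( $1 >> 24 ))"
  put_byte "$(( $1 >> 16 ))"
  put_byte "$(( $1 >> 8 ))"
  put_byte "$(( $1 ))"
}

# usage: put_u32_le 115 > field.bin   # bytes: 73 00 00 00
```

    Keeping the byte order in the function name makes the unresolved choice visible at every call site, so switching once the specification is known is a search and replace.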

    It is a good start, though. You need to implement convenience functions to, for example, automatically fill buffer[] with values from a string of hex values, so that you can write something like write $offset "ff c0 d3 ba be".

  • 2021-02-06 01:35

    awk is the wrong tool for the job here, but there are a thousand ways to do it. The easiest way is often a small C program, or a program in any other language that explicitly distinguishes between a character and a string of decimal digits.

    However, to do it in awk, use the "%c" printf format.
