Manipulating binary data in Python

前端 未结 7 1339
旧时难觅i
旧时难觅i 2021-02-07 07:16

I am opening up a binary file like so:

file = open(\"test/test.x\", \'rb\')

and reading in lines to a list. Each line looks a little like:

相关标签:
7条回答
  • 2021-02-07 07:25

    Are you using read() or readline()? You should be using read(n) to read n bytes; readline() will read until it hits a newline, which the binary file might not have.

    In either case, though, you are returned a string of bytes, which may be printable or non-printable characters, and is probably not very useful.

    What you want is ord(), which converts a one-byte string into the corresponding integer value. read() from the file one byte at a time and call ord() on the result, or iterate through the entire string.

    0 讨论(0)
  • 2021-02-07 07:36

    Like theatrus mentioned, ord and hex might help you. If you want to try to interpret some sort of structured binary data in the file, the struct module might be helpful.

    0 讨论(0)
  • 2021-02-07 07:42

    You are trying to print the data converted to ASCII characters, which will not work.

    You can safely use any byte of the data. If you want to print it as a hexadecimal, look at the functions ord and hex/

    0 讨论(0)
  • 2021-02-07 07:45

    Binary data is rarely divided into "lines" separated by '\n'. If it is, it will have an implicit or explicit escape mechanism to distinguish between '\n' as a line terminator and '\n' as part of the data. Reading such a file as lines blindly without knowledge of the escape mechanism is pointless.

    To answer your specific concerns:

    '\x07' is the ASCII BEL character, which was originally for ringing the bell on a teletype machine.

    You can get the integer value of a byte 'b' by doing ord(b).

    HOWEVER, to process binary data properly, you need to know what the layout is. You can have signed and unsigned integers (of sizes 1, 2, 4, 8 bytes), floating point numbers, decimal numbers of varying lengths, fixed length strings, variable length strings, etc etc. Added complication comes from whether the data is recorded in bigendian fashion or littleendian fashion. Once you know all of the above (or have very good informed guesses), the Python struct module should be able to be used for all or most of your processing; the ctypes module may also be useful.

    Does the data format have a name? If so, tell us; we may be able to point you to code or docs.

    You ask "How do I go about using this data safely?" which begs the question: What do you want to use it for? What manipulations do you want to do?

    0 讨论(0)
  • 2021-02-07 07:50

    To print it, you can do something like this:

    print repr(data)
    

    For the whole thing as hex:

    print data.encode('hex')
    

    For the decimal value of each byte:

    print ' '.join([str(ord(a)) for a in data])
    

    To unpack binary integers, etc. from the data as if they originally came from a C-style struct, look at the struct module.

    0 讨论(0)
  • 2021-02-07 07:51

    \xhh is the character with hex value hh. Other characters such as . and `~' are normal characters.

    Iterating on a string gives you the characters in it, one at a time.

    ord(c) will return an integer representing the character. E.g., ord('A') == 65.

    This will print the decimal numbers for each character:

    s = '\xbe\x00\xc8d\xf8d\x08\xe4.\x07~\x03\x9e\x07\xbe\x03\xde\x07\xfe\n'
    print ' '.join(str(ord(c)) for c in s)
    
    0 讨论(0)
提交回复
热议问题