When would you use unpack('h*' …) or pack('h*' …)?

南楼画角 提交于 2019-11-30 08:53:30

Recall in the bad 'ole days of MS-DOS that certain OS functions were controlled by setting high nibble and low nibbles on a register and performing an Interupt xx. For example, Int 21 accessed many file functions. You would set the high nibble as the drive number -- who will have more than 15 drives?? The low nibble as the requested function on that drive, etc.

Here is some old CPAN code that uses pack as you describe to set the registers to perform an MS-DOS system call.

Blech!!! I don't miss MS-DOS at all...

--Edit

Here is specific source code: Download Perl 5.00402 for DOS HERE, unzip,

In file Opcode.pm and Opcode.pl you see the use of unpack("h*",$_[0]); here:

sub opset_to_hex ($) {
    return "(invalid opset)" unless verify_opset($_[0]);
    unpack("h*",$_[0]);
}

I did not follow the code all the way through, but my suspicion is this is to recover info from an MS-DOS system call...

In perlport for Perl 5.8-8, you have these suggested tests for endianess of the target:

Different CPUs store integers and floating point numbers in different orders (called endianness) and widths (32-bit and 64-bit being the most common today). This affects your programs when they attempt to transfer numbers in binary format from one CPU architecture to another, usually either “live” via network connection, or by storing the numbers to secondary storage such as a disk file or tape.

Conflicting storage orders make utter mess out of the numbers. If a little-endian host (Intel, VAX) stores 0x12345678 (305419896 in decimal), a big-endian host (Motorola, Sparc, PA) reads it as 0x78563412 (2018915346 in decimal). Alpha and MIPS can be either: Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses them in big-endian mode. To avoid this problem in network (socket) connections use the pack and unpack formats n and N, the “network” orders. These are guaranteed to be portable.

As of perl 5.8.5, you can also use the > and < modifiers to force big- or little-endian byte-order. This is useful if you want to store signed integers or 64-bit integers, for example.

You can explore the endianness of your platform by unpacking a data structure packed in native format such as:

   print unpack("h*", pack("s2", 1, 2)), "\n";
   # '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
   # '00100020' on e.g. Motorola 68040

If you need to distinguish between endian architectures you could use either of the variables set like so:

   $is_big_endian    = unpack("h*", pack("s", 1)) =~ /01/;
   $is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;

Differing widths can cause truncation even between platforms of equal endianness. The platform of shorter width loses the upper parts of the number. There is no good solution for this problem except to avoid transferring or storing raw binary numbers.

One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw binary, or else consider using modules like Data::Dumper (included in the standard distribution as of Perl 5.005) and Storable (included as of perl 5.8). Keeping all data as text significantly simplifies matters.

The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's how far EBCDIC, or more precisely UTF-EBCDIC will go.

It seems that unpack("h*",...) is used more often than pack("h*",...). I did note that return qq'unpack("F", pack("h*", "$hex"))'; is used in Deparse.pm and IO-Compress uses pack("*h",...) in Perl 5.12

If you want further examples, here is a Google Code Search list. You can see pack|unpack("h*"...) is fairly rare and mostly to do with determining platform endianess...

I imagine this being useful when transfering data to or reading data from a machine with different endianess. If some process expects to receive data the way it would normally represent it in memory, then you better send your data just that way.

The distinction between the two just has to do with whether you are working with big-endian or little-endian data. Sometimes you have no control over the source or destination of your data, so the H and h flags to pack are there to give you the option. V and N are there for the same reason.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!