Is there a fast way to read alternate bytes in dd

问题

I'm trying to read out every other pair of bytes in a binary file using dd in a loop, but it is unusably slow.

I have a binary file on a BusyBox embedded device containing data in rgb565 format. Each pixel is 2 bytes and I'm trying to read out every other pixel to do very basic image scaling to reduce file size.

The overall size is 640x480 and I've been able to read every other "row" of pixels by looping dd with a 960 byte block size. But doing the same for every other "column" that remains by looping through with a 2 byte block size is ridiculously slow even on my local system.

i=1
while [[ $i -le 307200 ]]
do
        dd bs=2 skip=$((i-1)) seek=$((i-1)) count=1 if=./tmpfile >> ./outfile 2>/dev/null
        let i=i+2
done

While I get the output I expect, this method is unusable.

Is there some less obvious way to have dd quickly copy every other pair of bytes?

Sadly I don't have much control over what gets compiled in to BusyBox. I'm open to other possible methods but a dd/sh solution may be all I can use. For instance, one build has omitted head -c...

I appreciate all the feedback. I will check out each of the various suggestions and check back with results.

回答1:

Skipping every other character is trivial for tools like sed or awk as long as you don't need to cope with newlines and null bytes. But Busybox's support for null bytes in sed and awk is poor enough that I don't think you can cope with them at all. It's possible to deal with newlines, but it's a giant pain because there are 16 different combinations to deal with depending on whether each position in a 4-byte block is a newline or not.

Since arbitrary binary data is a pain, let's translate to hexadecimal or octal! I'll draw some inspiration from bin2hex and hex2bin scripts by Stéphane Chazelas. Since we don't care about the intermediate format, I'll use octal, which is a lot simpler to deal with because the final step uses printf which only supports octal. Stéphane's hex2bin uses awk for the hexadecimal-to-octal conversion; a oct2bin can use sed. So in the end you need sh, od, sed and printf. I don't think you can avoid printf: it's critical to outputting null bytes. While od is essential, most of its options aren't, so it should be possible to tweak this code to support a very stripped-down od with a bit more postprocessing.

od -An -v -t o1 -w4 |
sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
sh

The reason this is so fast compared to your dd-based approach is that BusyBox runs printf in the parent process, whereas dd requires its own process. Forking is slow. If I remember correctly, there's a compilation option which makes BusyBox fork for all utilities. In this case my approach will probably be as slow as yours. Here's an intermediate approach using dd which can't avoid the forks, but at least avoids opening and closing the file every time. It should be a little faster than yours.

i=$(($(wc -c <"$1") / 4))
exec <"$1"
dd ibs=2 count=1 conv=notrunc 2>/dev/null
while [ $i -gt 1 ]; do
  dd ibs=2 count=1 skip=1 conv=notrunc 2>/dev/null
  i=$((i - 1))
done

回答2:

No idea if this will be faster or even possible with BusyBox, but it's a thought...

#!/bin/bash

# Empty result file
> result

exec 3< datafile
while true; do
    # Read 2 bytes into file "short"
    dd bs=2 count=1 <&3 > short 2> /dev/null
    [ ! -s short ] && break
    # Accumulate result file
    cat short >> result
    # Read two bytes and discard
    dd bs=2 count=1 <&3 > short 2> /dev/null
    [ ! -s short ] && break
done

Or this should be more efficient:

#!/bin/bash

exec 3< datafile
for ((i=0;i<76800;i++)) ; do
    # Skip 2 bytes then read 2 bytes
    dd bs=2 count=1 skip=1 <&3 2> /dev/null
done > result

Or, maybe you could use netcat or ssh to send the file to a sensible (more powerful) computer with proper tools to process it and return it. For example, if the remote computer had ImageMagick it could down-scale the image very simply.

回答3:

Another option might be to use Lua which has a reputation for being small, fast and well suited to embedded systems - see Lua website. There are pre-built, downloadable binaries of it there too. It is also suggested on the Busybox website here.

I have never written any Lua before, so there may be some inefficiencies but this seems to work pretty well and processes a 640x480 RGB565 image in a few milliseconds on my desktop.

-- scale.lua
-- Usage: lua scale.lua input.bin output.bin
-- Scale an image by skipping alternate lines and alternate columns

-- Set up width, height and bytes per pixel
w   = 640
h   = 480
bpp = 2    

-- Open first argument for input, second for output
inp = assert(io.open(arg[1], "rb"))
out = assert(io.open(arg[2], "wb"))

-- Read image, one line at a time
for i = 0, h-1, 1 do
   -- Read a whole line
   line = inp:read(w*bpp)

   -- Only use every second line
   if (i % 2) == 0 then
      io.write("DEBUG: Processing row: ",i,"\n")
      -- Build up new, reduced line by picking substrings
      reduced=""
      for p = 1, w*bpp, bpp*2 do
         reduced = reduced .. string.sub(line,p,p+bpp-1)
      end
      io.write("DEBUG: New line length in bytes: ",#reduced,"\n")
      out:write(reduced)
   end
end

assert(out:close())

I created a greyscale test image with ImageMagick as follows:

magick -depth 16 -size 640x480 gradient: gray:image.bin

Then I ran the above Lua script with:

lua scale.lua image.bin smaller.bin

Then I made a JPEG I could view for testing with:

magick -depth 16 -size 320x240 gray:smaller.bin smaller.jpg

来源：https://stackoverflow.com/questions/55836859/is-there-a-fast-way-to-read-alternate-bytes-in-dd

标签

shell

BusyBox