Question
I'm trying to read out every other pair of bytes in a binary file using dd in a loop, but it is unusably slow.
I have a binary file on a BusyBox embedded device containing data in rgb565 format. Each pixel is 2 bytes and I'm trying to read out every other pixel to do very basic image scaling to reduce file size.
The overall size is 640x480 and I've been able to read every other "row" of pixels by looping dd with a 960 byte block size. But doing the same for every other "column" that remains by looping through with a 2 byte block size is ridiculously slow even on my local system.
i=1
while [[ $i -le 307200 ]]
do
dd bs=2 skip=$((i-1)) seek=$((i-1)) count=1 if=./tmpfile >> ./outfile 2>/dev/null
let i=i+2
done
While I get the output I expect, this method is unusable.
Is there some less obvious way to have dd quickly copy every other pair of bytes?
Sadly I don't have much control over what gets compiled into BusyBox. I'm open to other possible methods but a dd/sh solution may be all I can use. For instance, one build has omitted head -c...
I appreciate all the feedback. I will check out each of the various suggestions and check back with results.
Answer 1:
Skipping every other character is trivial for tools like sed or awk as long as you don't need to cope with newlines and null bytes. But BusyBox's support for null bytes in sed and awk is poor enough that I don't think you can cope with them at all. It's possible to deal with newlines, but it's a giant pain because there are 16 different combinations to deal with depending on whether each position in a 4-byte block is a newline or not.
Since arbitrary binary data is a pain, let's translate to hexadecimal or octal! I'll draw some inspiration from bin2hex and hex2bin scripts by Stéphane Chazelas. Since we don't care about the intermediate format, I'll use octal, which is a lot simpler to deal with because the final step uses printf, which only supports octal. Stéphane's hex2bin uses awk for the hexadecimal-to-octal conversion; an oct2bin can use sed. So in the end you need sh, od, sed and printf.
I don't think you can avoid printf: it's critical to outputting null bytes. While od is essential, most of its options aren't, so it should be possible to tweak this code to support a very stripped-down od with a bit more postprocessing.
od -An -v -t o1 -w4 |
sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
sh
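To see what each stage contributes, the pipeline can be tried on a few known bytes. The sketch below wraps it in a function (the name every_other_pair is made up for this example, not part of the answer) and assumes an od that accepts -w4, as GNU od does. It first shows the commands that sed hands to sh, then dumps the final output in octal so that null bytes and newlines stay visible.

```shell
# every_other_pair: hypothetical wrapper name, not from the original answer.
# Reads binary data on stdin and writes bytes 0-1 of every 4-byte group.
every_other_pair() {
    od -An -v -t o1 -w4 |
    sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
    sh
}

# Intermediate view: the generated shell commands for the input "ABCDEFGH".
# Prints:
#   printf \\101\\102
#   printf \\105\\106
printf 'ABCDEFGH' | od -An -v -t o1 -w4 |
    sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/'

# End result for input bytes 00 01 'A' 'B' 0a 00 'C' 'D':
# the kept pairs are 00 01 and 0a 00, shown here as octal.
printf '\000\001AB\012\000CD' | every_other_pair | od -An -v -t o1
```

Each printf call emits one 2-byte pixel, so a 307200-byte input produces 76800 generated lines for sh to execute.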
The reason this is so fast compared to your dd-based approach is that BusyBox runs printf in the parent process, whereas dd requires its own process. Forking is slow. If I remember correctly, there's a compilation option which makes BusyBox fork for all utilities. In this case my approach will probably be as slow as yours. Here's an intermediate approach using dd which can't avoid the forks, but at least avoids opening and closing the file every time. It should be a little faster than yours.
i=$(($(wc -c <"$1") / 4))
exec <"$1"
dd ibs=2 count=1 conv=notrunc 2>/dev/null
while [ $i -gt 1 ]; do
    dd ibs=2 count=1 skip=1 conv=notrunc 2>/dev/null
    i=$((i - 1))
done
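A small sample file makes it easy to check this loop before running it on a full image. The sketch below puts the same commands in a function (skip_alternate_pairs is a name chosen for this example) and runs it on 12 known bytes. It relies on dd applying skip= relative to the current position of the shared standard input, which is what the loop above depends on; that holds for GNU dd and is an assumption for other builds.

```shell
# skip_alternate_pairs FILE: hypothetical wrapper around the dd loop above.
# Runs in a subshell so the redirected stdin does not leak into the caller.
skip_alternate_pairs() {
    (
        i=$(( $(wc -c < "$1") / 4 ))   # number of 4-byte groups
        exec < "$1"                    # one open file shared by every dd
        dd ibs=2 count=1 conv=notrunc 2>/dev/null
        while [ "$i" -gt 1 ]; do
            # Skip one 2-byte pair, then copy the next pair
            dd ibs=2 count=1 skip=1 conv=notrunc 2>/dev/null
            i=$((i - 1))
        done
    )
}

sample=$(mktemp)
printf 'AABBCCDDEEFF' > "$sample"
skip_alternate_pairs "$sample"   # should print AACCEE if dd behaves as assumed
echo
rm -f "$sample"
```

If the output differs, the dd in use probably seeks from the start of the file rather than from the current position, and this approach will not work as written.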
Answer 2:
No idea if this will be faster or even possible with BusyBox, but it's a thought...
#!/bin/bash
# Empty result file
> result
exec 3< datafile
while true; do
    # Read 2 bytes into file "short"
    dd bs=2 count=1 <&3 > short 2> /dev/null
    [ ! -s short ] && break
    # Accumulate result file
    cat short >> result
    # Read two bytes and discard
    dd bs=2 count=1 <&3 > short 2> /dev/null
    [ ! -s short ] && break
done
Or this should be more efficient:
#!/bin/bash
exec 3< datafile
for ((i=0;i<76800;i++)) ; do
    # Skip 2 bytes then read 2 bytes
    dd bs=2 count=1 skip=1 <&3 2> /dev/null
done > result
Or, maybe you could use netcat or ssh to send the file to a sensible (more powerful) computer with proper tools to process it and return it. For example, if the remote computer had ImageMagick it could down-scale the image very simply.
Answer 3:
Another option might be to use Lua, which has a reputation for being small, fast and well suited to embedded systems (see the Lua website). There are pre-built, downloadable binaries of it there too. It is also suggested on the BusyBox website.
I have never written any Lua before, so there may be some inefficiencies but this seems to work pretty well and processes a 640x480 RGB565 image in a few milliseconds on my desktop.
-- scale.lua
-- Usage: lua scale.lua input.bin output.bin
-- Scale an image by skipping alternate lines and alternate columns

-- Set up width, height and bytes per pixel
w = 640
h = 480
bpp = 2

-- Open first argument for input, second for output
inp = assert(io.open(arg[1], "rb"))
out = assert(io.open(arg[2], "wb"))

-- Read image, one line at a time
for i = 0, h-1, 1 do
    -- Read a whole line
    line = inp:read(w*bpp)
    -- Only use every second line
    if (i % 2) == 0 then
        io.write("DEBUG: Processing row: ",i,"\n")
        -- Build up new, reduced line by picking substrings
        reduced=""
        for p = 1, w*bpp, bpp*2 do
            reduced = reduced .. string.sub(line,p,p+bpp-1)
        end
        io.write("DEBUG: New line length in bytes: ",#reduced,"\n")
        out:write(reduced)
    end
end

assert(out:close())
I created a greyscale test image with ImageMagick as follows:
magick -depth 16 -size 640x480 gradient: gray:image.bin
Then I ran the above Lua script with:
lua scale.lua image.bin smaller.bin
Then I made a JPEG I could view for testing with:
magick -depth 16 -size 320x240 gray:smaller.bin smaller.jpg
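One cheap sanity check, assuming the file names used above, is to compare the output size with what the dimensions predict: at 2 bytes per pixel, a 320x240 result should be exactly a quarter of the 640x480 input. The helper name expected_bytes is made up for this sketch.

```shell
# expected_bytes WIDTH HEIGHT: size of a raw image at 2 bytes per pixel
# (hypothetical helper, not part of the answer)
expected_bytes() {
    echo $(( $1 * $2 * 2 ))
}

expected_bytes 640 480    # 614400, the full-size input
expected_bytes 320 240    # 153600, after dropping alternate rows and columns

# Compare against the real file when it is present
if [ -f smaller.bin ]; then
    actual=$(wc -c < smaller.bin)
    if [ "$actual" -eq "$(expected_bytes 320 240)" ]; then
        echo "size OK"
    else
        echo "unexpected size: $actual"
    fi
fi
```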
Source: https://stackoverflow.com/questions/55836859/is-there-a-fast-way-to-read-alternate-bytes-in-dd