I am trying to speed up my script. It basically reads a pcap file with Velodyne's Lidar HDL-32 information and allows me to get X, Y, Z, and Intensity values. I have profiled my code, and most of the time is spent in the loop that unpacks each packet's firing data.
Compile a Struct ahead of time, to avoid the Python-level wrapping code that the module-level functions go through. Do it outside the loops, so the construction cost is not paid repeatedly:
unpack_ushort = struct.Struct('<H').unpack
unpack_ushort_byte = struct.Struct('<HB').unpack
The Struct methods themselves are implemented in C in CPython (the module-level functions eventually delegate to the same code after parsing the format string), so building the Struct once and storing the bound methods saves a non-trivial amount of work, particularly when unpacking a small number of values.
You can also save some work by unpacking multiple values together, rather than one at a time:
distanceInformation, intensity = unpack_ushort_byte(firingData[startingByte:startingByte + 3])
distanceInformation *= 0.002
As Dan notes, you could improve this further with iter_unpack, which would reduce the amount of bytecode executed and the number of small slice operations.
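A minimal sketch of how the precompiled Struct objects might sit in the packet loop. The layout constants (42-byte header, 100-byte firing blocks, the rotational value at offset 2, 3 bytes per laser) are taken from the numpy answer further down; the function name, the default block/laser counts, and the returned list structure are illustrative rather than your original code:

import struct

unpack_ushort = struct.Struct('<H').unpack         # compiled once, outside the loops
unpack_ushort_byte = struct.Struct('<HB').unpack

def read_data_packet(packet, firingBlocks=12, lasers=32):
    firings = []
    for block in range(firingBlocks):
        base = 42 + block * 100                    # skip header, step per firing block
        rotational, = unpack_ushort(packet[base + 2:base + 4])
        laserData = []
        for laser in range(lasers):
            start = base + 4 + laser * 3           # 2-byte distance + 1-byte intensity
            distance, intensity = unpack_ushort_byte(packet[start:start + 3])
            laserData.append([distance * 0.002, intensity])
        firings.append([rotational, laserData])
    return firings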
For your specific situation, if you can fit your loop into a numpy call, that will be fastest. That said, for just the struct.unpack part: if your data happens to be in native byte order, you can use memoryview.cast. For a 2-byte unsigned short (the '<H' case timed below), it is about 3x faster than naive struct.unpack without any change in logic.
In [20]: st = struct.Struct("<H")
In [21]: %timeit struct.unpack("<H", buf[20:22])
1.45 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [22]: %timeit st.unpack(buf[20:22])
778 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [23]: %timeit buf.cast("H")[0]
447 ns ± 4.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
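A minimal sketch of the cast approach, assuming a little-endian host (e.g. x86) so that the native order of 'H' matches the '<H' in the packet format; data and the offset 20 here are just placeholders:

import struct

data = bytes(range(64))                  # stand-in for one packet's raw bytes
buf = memoryview(data)                   # zero-copy view over the bytes

value_struct, = struct.unpack('<H', buf[20:22])   # explicit little-endian unpack
value_cast = buf[20:22].cast('H')[0]              # reinterpret 2 bytes as a native uint16

assert value_struct == value_cast        # holds when the host is little-endian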
You can unpack the raw distanceInformation and intensity values together in one call, especially since you're just putting them into a list together: that's exactly what unpack() does when it unpacks multiple values. In your case you then need to multiply distanceInformation by 0.002, but you might save time by leaving that until later, because you can use iter_unpack() to parse the whole list of raw pairs in one call. That function gives you an iterator, which can be sliced with itertools.islice() and then turned into a list. Something like this:
laser_data = firingData[firingDataStartingByte + 4:firingDataStartingByte + 4 + lasers * 3]
laser_iter = struct.iter_unpack('<HB', laser_data)
laser = [[d * 0.002, i] for d, i in itertools.islice(laser_iter, lasers)]
Unfortunately this is a little harder to read, so you might want to spread it out over more lines of code with more descriptive variable names, or add a comment for the future when you forget why you wrote it; one way of doing that is sketched below.
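For example, something along these lines (the laser_bytes and raw_records names and the LASER_RECORD constant are illustrative, not part of the original code):

import itertools
import struct

LASER_RECORD = struct.Struct('<HB')      # 2-byte raw distance + 1-byte intensity

laser_bytes = firingData[firingDataStartingByte + 4:
                         firingDataStartingByte + 4 + lasers * LASER_RECORD.size]
raw_records = LASER_RECORD.iter_unpack(laser_bytes)

# Scale the raw distance (0.002 m per count) while keeping intensity untouched.
laser = [[raw_distance * 0.002, intensity]
         for raw_distance, intensity in itertools.islice(raw_records, lasers)]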
Numpy lets you do this very quickly. In this case I think the easiest way is to use the ndarray constructor directly:
import numpy as np

def with_numpy(buffer):
    # Construct each ndarray directly on the packet buffer:
    # arguments are shape, dtype, buffer, offset, strides.
    rotational = np.ndarray((firingBlocks,), '<H', buffer, 42 + 2, (100,))
    distance = np.ndarray((firingBlocks, lasers), '<H', buffer, 42 + 4, (100, 3))
    intensity = np.ndarray((firingBlocks, lasers), '<B', buffer, 42 + 6, (100, 3))
    return rotational, distance * 0.002, intensity
This returns separate arrays instead of the nested list, which should be much easier to process further. As input it takes a buffer object (in Python 2) or anything else that exposes the buffer interface; exactly which objects you can use depends on your Python version (2 vs 3). But this method is very fast:
import numpy as np
firingBlocks = 10**4
lasers = 32
packet_raw = np.random.bytes(42 + firingBlocks*100)
%timeit readDataPacket(memoryview(packet_raw))
# 1 loop, best of 3: 807 ms per loop
%timeit with_numpy(packet_raw)
# 100 loops, best of 3: 10.8 ms per loop
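A short usage sketch under the same synthetic setup (the shapes assume firingBlocks = 10**4 and lasers = 32 as above):

rotational, distance, intensity = with_numpy(packet_raw)

print(rotational.shape)   # (10000,)    one rotational position per firing block
print(distance.shape)     # (10000, 32) in metres, already scaled by 0.002
print(intensity.dtype)    # uint8       raw intensity values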