Speed up Python's struct.unpack

悲&欢浪女 2021-02-14 07:23

I am trying to speed up my script. It basically reads a pcap file with Velodyne's HDL-32 lidar data and lets me get X, Y, Z, and intensity values. I have profiled m

4 answers
  • 2021-02-14 07:55

    Compile a Struct ahead of time to avoid the per-call overhead of the module-level functions, which have to look up (and, the first time, compile) the format string on every call. Do it outside the loops, so the construction cost is not paid repeatedly.

    import struct

    unpack_ushort = struct.Struct('<H').unpack
    unpack_ushort_byte = struct.Struct('<HB').unpack
    

    The Struct methods themselves are implemented in C in CPython (and the module-level functions end up doing the same work after looking up the format string), so building the Struct once and storing its bound methods saves a non-trivial amount of work, particularly when each call unpacks only a few values.
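
    A minimal sketch of the idea, assuming little-endian '<H' and '<HB' fields as in the snippet above. The buffer `sample` is made-up test data, and `USHORT` / `USHORT_BYTE` are illustrative names. It checks that the precompiled bound method agrees with the module-level call and uses `Struct.size` to show how many bytes each call consumes.

```python
import struct

# Build the Struct objects once, at module level, outside any loop.
USHORT = struct.Struct('<H')
USHORT_BYTE = struct.Struct('<HB')

# Keep the bound methods so each call also skips the attribute lookup.
unpack_ushort = USHORT.unpack
unpack_ushort_byte = USHORT_BYTE.unpack

# Made-up sample data: one ushort (1000) followed by one byte (42).
sample = struct.pack('<HB', 1000, 42)

# The precompiled method and the module-level function agree.
assert unpack_ushort(sample[:2]) == struct.unpack('<H', sample[:2])
distance_raw, intensity = unpack_ushort_byte(sample)

print(USHORT.size, USHORT_BYTE.size)  # 2 3
print(distance_raw, intensity)        # 1000 42
```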

    You can also save some work by unpacking multiple values together, rather than one at a time:

    distanceInformation, intensity = unpack_ushort_byte(firingData[startingByte:startingByte + 3])
    distanceInformation *= 0.002
    
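    The same idea inside a loop, as a sketch. `Struct.unpack_from` is a real `struct` method that reads at a byte offset without building a slice first. The helper name `decode_block`, the default of 32 lasers, and the test bytes are assumptions for illustration; the 0.002 factor comes from the question's code.

```python
import struct

LASER_RECORD = struct.Struct('<HB')  # 2-byte distance + 1-byte intensity
unpack_laser_from = LASER_RECORD.unpack_from

def decode_block(firing_data, start, lasers=32):
    """Hypothetical helper: decode `lasers` records starting at byte `start`."""
    laser = []
    offset = start
    for _ in range(lasers):
        # One call returns both values, and no intermediate slice is created.
        distance_raw, intensity = unpack_laser_from(firing_data, offset)
        laser.append([distance_raw * 0.002, intensity])
        offset += LASER_RECORD.size
    return laser

# Made-up data: 4 filler bytes, then two records (500, 10) and (1000, 20).
data = b'\x00' * 4 + struct.pack('<HBHB', 500, 10, 1000, 20)
print(decode_block(data, 4, lasers=2))
```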

    As Dan notes, you could improve this further with iter_unpack, which reduces the amount of bytecode executed and avoids many small slice operations.
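
    A sketch of the iter_unpack variant using a precompiled Struct. `decode_block_iter` and the test bytes are hypothetical. The slice end is computed from the record size because iter_unpack only accepts a buffer whose length is an exact multiple of the format size (3 bytes for '<HB').

```python
import struct

LASER_RECORD = struct.Struct('<HB')  # 3 bytes per laser return

def decode_block_iter(firing_data, start, lasers=32):
    """Hypothetical helper: one iter_unpack call instead of one call per laser."""
    end = start + lasers * LASER_RECORD.size
    # iter_unpack walks the slice in 3-byte steps; the slice length must be
    # an exact multiple of LASER_RECORD.size, hence the computed `end`.
    return [[d * 0.002, i] for d, i in LASER_RECORD.iter_unpack(firing_data[start:end])]

# Made-up data: 4 filler bytes, two records, then 5 trailing bytes to ignore.
data = b'\x00' * 4 + struct.pack('<HBHB', 500, 10, 1000, 20) + b'\xff' * 5
print(decode_block_iter(memoryview(data), 4, lasers=2))
```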

  • 2021-02-14 07:59

    For your specific situation, if you can fit your loop into a single NumPy call, that would be fastest.
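
    One way to turn the whole loop into a single NumPy call is a structured dtype that mirrors one firing block, read with np.frombuffer. This is a sketch under assumptions this answer does not state: a 42-byte header, then 100-byte little-endian firing blocks made of a 2-byte flag, a 2-byte rotational position and 32 records of 2-byte distance plus 1-byte intensity. These match the offsets in the np.ndarray answer on this page, but check them against your packet format. The packet here is random filler bytes.

```python
import numpy as np

LASERS = 32          # assumed lasers per firing block (HDL-32)
HEADER_BYTES = 42    # assumed bytes before the first firing block

# One firing block: 2 + 2 + 32 * 3 = 100 bytes, little-endian, no padding.
laser_dtype = np.dtype([('distance', '<u2'), ('intensity', 'u1')])
block_dtype = np.dtype([
    ('flag', '<u2'),
    ('rotational', '<u2'),
    ('laser', laser_dtype, (LASERS,)),
])
assert block_dtype.itemsize == 100

def decode_packet(packet, firing_blocks):
    # One call views all firing blocks at once; there is no Python-level loop.
    blocks = np.frombuffer(packet, dtype=block_dtype, count=firing_blocks,
                           offset=HEADER_BYTES)
    distance = blocks['laser']['distance'] * 0.002  # shape (firing_blocks, LASERS)
    intensity = blocks['laser']['intensity']        # shape (firing_blocks, LASERS)
    return blocks['rotational'], distance, intensity

rng = np.random.default_rng(0)
packet = rng.integers(0, 256, HEADER_BYTES + 12 * 100, dtype=np.uint8).tobytes()
rotational, distance, intensity = decode_packet(packet, 12)
print(rotational.shape, distance.shape, intensity.shape)  # (12,) (12, 32) (12, 32)
```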

    With that said, for just the struct.unpack part: if your data happens to be in native byte order, you can use memoryview.cast. In a short example, it is about 3x faster than naive struct.unpack, without any change in logic.

    In [20]: st = struct.Struct("<H")
    
    In [21]: %timeit struct.unpack("<H", buf[20:22])
    1.45 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
    In [22]: %timeit st.unpack(buf[20:22])
    778 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
    In [23]: %timeit buf.cast("H")[0]
    447 ns ± 4.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
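    A small sketch of the memoryview.cast approach with made-up data. Two details the transcript glosses over: cast('H') uses the machine's native byte order, so it only matches '<H' on a little-endian machine (checked here with sys.byteorder), and the index counts 2-byte items, so bytes 20:22 correspond to element 10. Casting once and then indexing the resulting view inside a loop is one reasonable way to use it.

```python
import struct
import sys

# Made-up data: 16 little-endian unsigned shorts (32 bytes).
raw = struct.pack('<16H', *range(100, 116))
buf = memoryview(raw)

# cast() needs a byte-format view whose length is a multiple of the item size.
ushorts = buf.cast('H')

via_struct = struct.unpack('<H', buf[20:22])[0]
if sys.byteorder == 'little':
    # Native order is little-endian here, so cast() agrees with '<H'.
    assert ushorts[10] == via_struct
print(via_struct)  # 110
```
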
  • 2021-02-14 08:17

    You can unpack the raw distanceInformation and intensity values together in one call, especially because you're just putting them into a list together: that's what unpack() does when it unpacks multiple values. In your case you then need to multiply distanceInformation by 0.002, but you might save time by leaving this until later, because then you can use iter_unpack() to parse the whole list of raw pairs in one call. That function returns an iterator, which can be sliced with itertools.islice() and then turned into a list. Something like this:

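    import itertools
    import struct
    start = firingDataStartingByte + 4
    # iter_unpack needs a buffer whose length is a multiple of 3 bytes (the size of '<HB')
    laser_iter = struct.iter_unpack('<HB', firingData[start:start + 3 * lasers])
    laser = [[d * 0.002, i] for d, i in itertools.islice(laser_iter, lasers)]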

    Unfortunately this is a little harder to read, so you might want to find a way to spread this out into more lines of code, with more descriptive variable names, or add a comment for the future when you forget why you wrote this…
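
    A sketch of the same idea spread over more lines with descriptive names, with the scaling deferred to a separate pass. The names `raw_laser_pairs`, `scale_distances` and the test bytes are hypothetical. It also shows why the slice length matters: iter_unpack raises struct.error when the buffer length is not a multiple of the format size.

```python
import struct

DISTANCE_INTENSITY = struct.Struct('<HB')  # raw distance (uint16) + intensity (uint8)
DISTANCE_SCALE = 0.002                     # factor from the question's code

def raw_laser_pairs(firing_data, start, lasers):
    """Return raw (distance, intensity) tuples; scaling is deferred."""
    end = start + lasers * DISTANCE_INTENSITY.size
    return list(DISTANCE_INTENSITY.iter_unpack(firing_data[start:end]))

def scale_distances(pairs):
    """Apply the distance scale in a separate, later pass."""
    return [[distance * DISTANCE_SCALE, intensity] for distance, intensity in pairs]

data = struct.pack('<HBHB', 500, 10, 1000, 20)
pairs = raw_laser_pairs(data, 0, lasers=2)
print(pairs)  # [(500, 10), (1000, 20)]

# A 4-byte buffer is not a multiple of 3 bytes, so it is rejected.
try:
    list(DISTANCE_INTENSITY.iter_unpack(data[:4]))
except struct.error as exc:
    print('struct.error:', exc)
```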

  • 2021-02-14 08:21

    NumPy lets you do this very quickly. In this case I think the easiest way is to use the ndarray constructor directly:

    import numpy as np
    
    def with_numpy(buffer):
        # Construct ndarray with: shape, dtype, buffer, offset, strides.
        rotational = np.ndarray((firingBlocks,), '<H', buffer, 42+2, (100,))
        distance = np.ndarray((firingBlocks,lasers), '<H', buffer, 42+4, (100,3))
        intensity = np.ndarray((firingBlocks,lasers), '<B', buffer, 42+6, (100,3))
        return rotational, distance*0.002, intensity
    

    This returns separate arrays instead of the nested list, which should be much easier to process further. As input it takes a buffer object (in Python 2) or anything that exposes the buffer interface. Unfortunately, exactly which objects you can pass depends on your Python version (2 or 3). But this method is very fast:

    import numpy as np
    
    firingBlocks = 10**4
    lasers = 32
    packet_raw = np.random.bytes(42 + firingBlocks*100)
    
    %timeit readDataPacket(memoryview(packet_raw))
    # 1 loop, best of 3: 807 ms per loop
    %timeit with_numpy(packet_raw)
    # 100 loops, best of 3: 10.8 ms per loop
    
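    To check that the offsets and strides above pick out the right bytes, one option is to compare the strided views against plain struct decoding on a small packet. This is a sketch: `with_numpy` is restated here with `firing_blocks` and `lasers` as parameters instead of globals, the 12-block packet is random bytes, and the 42-byte header and 100-byte block layout are the assumptions already built into the offsets above. It also exercises bytes, bytearray and memoryview inputs on Python 3.

```python
import struct
import numpy as np

def with_numpy(buffer, firing_blocks, lasers=32):
    # Same views as above: shape, dtype, buffer, offset, strides (in bytes).
    rotational = np.ndarray((firing_blocks,), '<H', buffer, 42 + 2, (100,))
    distance = np.ndarray((firing_blocks, lasers), '<H', buffer, 42 + 4, (100, 3))
    intensity = np.ndarray((firing_blocks, lasers), '<B', buffer, 42 + 6, (100, 3))
    return rotational, distance * 0.002, intensity

firing_blocks = 12
packet = np.random.default_rng(1).bytes(42 + firing_blocks * 100)

for source in (packet, bytearray(packet), memoryview(packet)):
    rotational, distance, intensity = with_numpy(source, firing_blocks)
    for block in range(firing_blocks):
        base = 42 + block * 100
        assert rotational[block] == struct.unpack_from('<H', packet, base + 2)[0]
        for laser in range(32):
            d, i = struct.unpack_from('<HB', packet, base + 4 + 3 * laser)
            assert abs(distance[block, laser] - d * 0.002) < 1e-9
            assert intensity[block, laser] == i
print('strided views match struct decoding')
```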