Python example for reading multiple protobuf messages from a stream

情歌与酒 2021-02-05 09:14

I'm working with data from spinn3r, which consists of multiple different protobuf messages serialized into a byte stream:

http://code.google.com/p/spinn3r-client/wiki/P

3 Answers
  • 2021-02-05 09:31

    This is simple enough that I can see why maybe nobody has bothered to make a reusable tool:

    '''
    Parses multiple protobuf messages from a stream of spinn3r data
    '''
    
    import sys
    sys.path.append('python_proto/src')
    import spinn3rApi_pb2
    import protoStream_pb2
    
    ## read the whole file as raw bytes (binary mode matters for protobuf data)
    data = open('8mny44bs6tYqfnofg0ELPg.protostream', 'rb').read()
    
    class _DecodeError(Exception):
        '''Raised when a varint is malformed.'''
    
    def _VarintDecoder(mask):
        '''Returns a decoder for unsigned varints, masked to the given width.'''
    
        def DecodeVarint(buffer, pos):
            result = 0
            shift = 0
            while 1:
                b = buffer[pos]   ## indexing bytes yields an int in Python 3
                result |= ((b & 0x7f) << shift)
                pos += 1
                if not (b & 0x80):
                    ## high bit clear: this was the last byte of the varint
                    result &= mask
                    return (result, pos)
                shift += 7
                if shift >= 64:
                    raise _DecodeError('Too many bytes when decoding varint.')
        return DecodeVarint
    
    ## get a 64bit varint decoder
    decoder = _VarintDecoder((1<<64) - 1)
    
    ## get the three types of protobuf messages we expect to see
    header    = protoStream_pb2.ProtoStreamHeader()
    delimiter = protoStream_pb2.ProtoStreamDelimiter()
    entry     = spinn3rApi_pb2.Entry()
    
    ## get the header; next_pos holds the length of the message that follows
    pos = 0
    next_pos, pos = decoder(data, pos)
    header.ParseFromString(data[pos:pos + next_pos])
    ## should check its contents
    
    while 1:
        ## every entry is preceded by a delimiter message
        pos += next_pos
        next_pos, pos = decoder(data, pos)
        delimiter.ParseFromString(data[pos:pos + next_pos])
    
        if delimiter.delimiter_type == delimiter.END:
            break
    
        pos += next_pos
        next_pos, pos = decoder(data, pos)
        entry.ParseFromString(data[pos:pos + next_pos])
        print(entry)
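
For anyone unsure what the length prefix looks like on the wire, here is a minimal sketch of base-128 varint encoding and decoding in plain Python 3. The names `encode_varint` and `decode_varint` are made up for this illustration and are not part of protobuf or spinn3r; the sketch only handles non-negative values, which is all a length prefix needs.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode one varint from buf starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint is longer than 10 bytes")


encoded = encode_varint(300)
print(encoded.hex())           # ac02
print(decode_varint(encoded))  # (300, 2)
```

The decoder returns the position just past the varint, which is why the parsing loop above can slice the message body starting at `pos`.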
    
  • 2021-02-05 09:49

    It looks like the code in the other answer is potentially lifted from here. Check the licence before using this file. I managed to get it to read varint32s with code like this:

    import sys
    import myprotocol_pb2 as proto
    import varint # (this is the varint.py file)
    
    data = open("filename.bin", "rb").read() # read the whole file as raw bytes
    decoder = varint.decodeVarint32          # get a varint32 decoder
                                             # others are available in varint.py
    
    next_pos, pos = 0, 0
    while pos < len(data):
        msg = proto.Msg()                    # your message type
        next_pos, pos = decoder(data, pos)
        msg.ParseFromString(data[pos:pos + next_pos])
    
        # use parsed message
    
        pos += next_pos
    print("done!")
    

    This is very simple code designed to load messages of a single type delimited by varint32s which describe the next message's size.
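
If the file is too large to read in one go, the same framing can be handled incrementally. The following is only a sketch under my own assumptions: `write_delimited` and `read_delimited` are hypothetical helper names, not functions from protobuf or from the varint.py file, and they work on raw byte payloads so the example runs without any generated `_pb2` module.

```python
import io


def write_delimited(stream, payload):
    """Write one payload prefixed by its length as a varint."""
    size = len(payload)
    while size >= 0x80:
        stream.write(bytes([(size & 0x7F) | 0x80]))  # more length bytes follow
        size >>= 7
    stream.write(bytes([size]))
    stream.write(payload)


def read_delimited(stream):
    """Yield each length-prefixed payload until the stream ends cleanly."""
    while True:
        size = 0
        shift = 0
        while True:
            byte = stream.read(1)
            if not byte:
                if shift == 0:
                    return  # clean end of stream between messages
                raise EOFError("stream ended inside a length prefix")
            size |= (byte[0] & 0x7F) << shift
            if not (byte[0] & 0x80):
                break
            shift += 7
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message body")
        yield payload


buf = io.BytesIO()
for item in (b"first", b"", b"x" * 200):
    write_delimited(buf, item)
buf.seek(0)
messages = list(read_delimited(buf))
print([len(m) for m in messages])  # [5, 0, 200]
```

With real messages, the payload passed to the writer would be `msg.SerializeToString()`, and each payload coming out of the reader would go to `msg.ParseFromString(payload)` on a fresh message object.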


    Update: It may also be possible to import the decoder directly from the protobuf library (note that the underscore marks it as an internal name):

    from google.protobuf.internal.decoder import _DecodeVarint32
    
  • 2021-02-05 09:49

    I've implemented a small Python package that serializes multiple protobuf messages into a stream and deserializes them from a stream. You can install it with pip:

    pip install pystream-protobuf
    

    Here is sample code that writes two lists of protobuf messages to a file:

    import stream
    
    with stream.open("test.gam", "wb") as ostream:
        ostream.write(*objects_list)
        ostream.write(*another_objects_list)
    

    and then reading the same messages (e.g. Alignment messages defined in vg_pb2.py) from the stream:

    import stream
    import vg_pb2
    
    alns_list = []
    with stream.open("test.gam", "rb") as istream:
        for data in istream:
            aln = vg_pb2.Alignment()
            aln.ParseFromString(data)
            alns_list.append(aln)
    