How to serialize/deserialize Pandas DataFrame to and from ProtoBuf/Gzip in a RESTful Flask App?

Submitted by 牧云@^-^@ on 2021-01-21 01:45:11

Question


I have a pandas dataframe that needs to be returned as a Flask Response object in a Flask application. Currently I convert it to JSON:

df = df.to_json()
return Response(df, status=200, mimetype='application/json') 

The dataframe is very large, on the order of 5,000,000 rows × 10 columns. On the client side I deserialize it with:

df = pd.read_json(response.text)

As the number of URL request parameters grows, the dataframe grows as well. Deserialization consistently takes several times longer than serialization, and both grow linearly with the data size, which I want to avoid. For example, serialization takes 15-20 seconds while deserialization takes 60-70 seconds.

Can protobuf help in this case, i.e., is there a way to convert a pandas dataframe to a protobuf object? Alternatively, is there a way to send this JSON gzipped from Flask (with a Content-Encoding: gzip header)? I believe protobuf and gzip are comparable in timing and efficiency.
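For the gzip option, I imagine something along these lines (a sketch using the standard library's gzip module; I haven't verified the performance):

import gzip
from flask import Response

# Compress the JSON payload and tell the client it is gzip-encoded,
# so HTTP clients can decompress it transparently.
payload = gzip.compress(df.to_json().encode('utf-8'))
return Response(payload, status=200, mimetype='application/json',
                headers={'Content-Encoding': 'gzip'})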

What's the best solution in such a scenario?

Thanks in advance.


Answer 1:


I ran into the same problem recently. I solved it by iterating through the rows of my DataFrame and calling add() on a repeated field of my protobuf message in that loop, filling it with values from each row. You can then gzip the serialized string output.

i.e. something along the lines of:

import gzip

# Assumes a generated message with a repeated field, e.g.:
#   message DataFrameProto { repeated Row rows = 1; }
for _, row in df.iterrows():
    protobuf_obj.rows.add(val1=row['col1'], val2=row['col2'])
proto_str = protobuf_obj.SerializeToString()
return gzip.compress(proto_str)
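On the client side, the reverse would look something like this (a sketch; DataFrameProto and the field names are placeholders for whatever your generated message actually defines):

import gzip
import pandas as pd

raw = gzip.decompress(response.content)   # compressed bytes from the server
protobuf_obj = DataFrameProto()           # placeholder for your generated message class
protobuf_obj.ParseFromString(raw)
# Rebuild the DataFrame from the repeated field
df = pd.DataFrame(
    [(r.val1, r.val2) for r in protobuf_obj.rows],
    columns=['col1', 'col2'],
)

Note that protobuf parsing happens in C++ for the standard implementation, which is typically much faster than pandas' JSON deserialization for wide, numeric data.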

Given that this question hasn't been answered in 9 months, I'm not sure there's a better solution, but I'm definitely open to hearing one if there is!



Source: https://stackoverflow.com/questions/38388001/how-to-serialize-deserialize-pandas-dataframe-to-and-from-protobuf-gzip-in-a-res
