Question
I have a pandas DataFrame that needs to be returned as a Flask Response object from a Flask application. Currently I convert it to a JSON string:
df = df.to_json()
return Response(df, status=200, mimetype='application/json')
The DataFrame is very large, probably on the order of 5,000,000 x 10. On the client side, I deserialize it as:
df = response.read_json()
As the number of URL request parameters grows, the DataFrame grows as well. Deserialization time grows linearly with it and is several times slower than serialization, which I would like to avoid. For example, serialization takes 15-20 seconds, while deserialization takes 60-70 seconds.
Can protobuf help here, by converting the pandas DataFrame to a protobuf object? Also, is there a way to send this JSON gzip-compressed through Flask? I believe protobuf and gzip are comparable in timing and efficiency.
What's the best solution in such a scenario?
Thanks in advance.
Answer 1:
I ran into the same problem recently. I solved it by iterating through the rows of my DataFrame and calling protobuf_obj.add() in that loop, using info from the DataFrame. You can then GZIP the serialized string output.
i.e. something along the lines of:
import gzip

# protobuf_obj, col1, and col2 are placeholders for your own message and column names
for _, row in df.iterrows():
    protobuf_obj.add(val1=row[col1], val2=row[col2])
proto_str = protobuf_obj.SerializeToString()
return gzip.compress(proto_str)
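To make the gzip step concrete, here is a minimal sketch of the compress/decompress round trip using only the standard library. The helper names gzip_body and gunzip_body are hypothetical, and the sample payload is made-up JSON standing in for the serialized protobuf bytes (or for df.to_json().encode()). The Flask call appears only in a comment and assumes a Response built with a headers argument; treat it as one possible way to wire this up, not the only one.

```python
import gzip
import json


def gzip_body(payload: bytes, level: int = 6):
    """Compress a serialized payload and build matching HTTP headers.

    Hypothetical helper: the name and the default level are illustrative.
    """
    compressed = gzip.compress(payload, compresslevel=level)
    headers = {
        # Tells the client that the body is gzip-compressed
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


def gunzip_body(body: bytes) -> bytes:
    """Client-side inverse of gzip_body (hypothetical helper)."""
    return gzip.decompress(body)


# Made-up sample rows standing in for proto_str or df.to_json().encode()
payload = json.dumps(
    [{"val1": i, "val2": i * 2} for i in range(1000)]
).encode("utf-8")

body, headers = gzip_body(payload)

# In a Flask view, the result could be returned roughly like this (assumption):
#     return Response(body, status=200, headers=headers,
#                     mimetype="application/octet-stream")

# On the client, decompress before parsing
restored = gunzip_body(body)
```

A lower compression level trades size for speed, which may matter when serialization time is already the bottleneck. Some HTTP clients decompress the body automatically when they see Content-Encoding: gzip, in which case the explicit gunzip_body call on the client is not needed.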
Given that this question hasn't been answered in 9 months, I'm not sure there's a better solution but definitely open to hearing one if there is!
Source: https://stackoverflow.com/questions/38388001/how-to-serialize-deserialize-pandas-dataframe-to-and-from-protobuf-gzip-in-a-res