Question
I am trying to visualize data of this form:
   timestamp               senderId
0     735217  106758968942084595234
1     735217  114647222927547413607
2     735217  106758968942084595234
3     735217  106758968942084595234
4     735217  114647222927547413607
...
geom_density works if I don't separate the senderIds:
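import pandas as pd
from ggplot import ggplot, aes, geom_density  # yhat's Python port of ggplot2

df = pd.read_pickle('data.pkl')
df.columns = ['timestamp', 'senderId']
plot = ggplot(aes(x='timestamp'), data=df) + geom_density()  # one density over all events
print(plot)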
The result looks as expected:
However, if I want to show the senderIds separately, as is done in the docs, it fails:
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
ValueError: `dataset` input should have multiple elements.
Trying it with a larger dataset (~40K events):
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
numpy.linalg.linalg.LinAlgError: singular matrix
Any ideas? There are some answers on SO for these errors, but none of them seems relevant.
This is the kind of graph I want (from ggplot's doc):
Answer 1:
With the smaller dataset:
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
ValueError: `dataset` input should have multiple elements.
This was because some senderIds had only one row.
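To see which senders are affected, you can count the rows per senderId and keep only the groups large enough for a density estimate. A minimal pandas sketch; the helper name `drop_small_groups`, its `min_rows` parameter and the toy frame are hypothetical, not from the original post:

```python
import pandas as pd


def drop_small_groups(df, key="senderId", min_rows=2):
    """Keep only the rows whose `key` group has at least `min_rows` rows.

    A single-row group is what triggers the
    "`dataset` input should have multiple elements" error.
    """
    return df.groupby(key).filter(lambda group: len(group) >= min_rows)


# Hypothetical toy data in the same shape as the question's frame.
df = pd.DataFrame({
    "timestamp": [735217.1, 735217.4, 735217.2, 735217.9],
    "senderId": ["A", "A", "B", "A"],
})

filtered = drop_small_groups(df)
print(filtered["senderId"].unique())  # sender "B" had one row and is gone
```

The plotting call stays the same; it is just built from `filtered` instead of `df`.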
With the bigger dataset:
> plot = ggplot(aes(x='timestamp', color='senderId'), data=df) + geom_density()
numpy.linalg.linalg.LinAlgError: singular matrix
This was because for some senderIds I had multiple rows at the exact same timestamp and none at any other timestamp, so the timestamps in those groups had zero variance. ggplot does not support this. I could solve it by using finer timestamps.
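Finer timestamps are the real fix, but it can help to first list which senders are affected. A group whose timestamps are all identical has zero variance, which presumably leaves the underlying Gaussian kernel density estimate with a singular covariance matrix. The pandas sketch below shows an alternative workaround (dropping those senders), not the answer's solution; the helper name `zero_variance_senders` and the toy frame are hypothetical:

```python
import pandas as pd


def zero_variance_senders(df, key="senderId", value="timestamp"):
    """Return the senderIds whose rows all share a single timestamp.

    Such a group has zero variance, so the covariance a Gaussian KDE
    needs to invert is singular. Single-row groups are caught too.
    """
    distinct = df.groupby(key)[value].nunique()
    return sorted(distinct[distinct < 2].index)


# Hypothetical toy data: sender "A" has two rows on the same timestamp.
df = pd.DataFrame({
    "timestamp": [735217, 735217, 735217, 735218, 735219],
    "senderId": ["A", "A", "B", "B", "B"],
})

bad = zero_variance_senders(df)
print(bad)  # ['A']

# Drop the degenerate senders before plotting, if finer timestamps
# are not available.
plottable = df[~df["senderId"].isin(bad)]
```

Because `nunique()` is 1 for a single-row group as well, the same check also catches the senders behind the smaller dataset's `ValueError`.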
Source: https://stackoverflow.com/questions/40101519/plotting-event-density-in-python-with-ggplot-and-pandas