Categorical column after melt in pandas

◇◆丶佛笑我妖孽 提交于 2021-01-25 07:21:25

问题


Is it possible to end up with a categorical variable column after a melt operation in pandas?

If I set up the data like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.random.randn(3, 5), 
    columns=["A", "B", "C", "D", "E"]
)
df["id"] = range(1, 4)
df
|    |         A |         B |         C |         D |          E |   id |
|----|-----------|-----------|-----------|-----------|------------|------|
|  0 | -0.406174 | -0.686917 | -0.172913 | -0.273074 | -0.0246714 |    1 |
|  1 |  0.323783 | -1.7731   |  1.57581  | -1.15671  | -1.23926   |    2 |
|  2 | -1.1426   | -0.591279 |  1.15265  |  0.326712 | -0.86374   |    3 |

and then apply

melted_df = df.melt(id_vars="id", value_vars=["A", "B", "C", "D", "E"])
melted_df
|    |   id | variable   |      value |
|----|------|------------|------------|
|  0 |    1 | A          | -0.406174  |
|  1 |    2 | A          |  0.323783  |
|  2 |    3 | A          | -1.1426    |
|  3 |    1 | B          | -0.686917  |
|  4 |    2 | B          | -1.7731    |
|  5 |    3 | B          | -0.591279  |
|  6 |    1 | C          | -0.172913  |
|  7 |    2 | C          |  1.57581   |
|  8 |    3 | C          |  1.15265   |
|  9 |    1 | D          | -0.273074  |
| 10 |    2 | D          | -1.15671   |
| 11 |    3 | D          |  0.326712  |
| 12 |    1 | E          | -0.0246714 |
| 13 |    2 | E          | -1.23926   |
| 14 |    3 | E          | -0.86374   |

The dtype of the variable column is object

melted_df.dtypes
id            int64
variable     object
value       float64
dtype: object

I'd like this to be category. I know, I can convert it easily by:

melted_df["variable"].astype("category")

But for large datasets, I'd like to avoid this overhead. In the documentation I didn't find such an option, but since the resulting column contains categorical data by definition, I presume there must be a possiblity.


回答1:


I don't think it's possible with melt, because when it creates that column it infers the dtype and 'category' is not a dtype that pandas currently infers. (Here's a related issue where it doesn't correctly infer Int32 dtypes Why is pandas.melt messing with my dtypes?).

stack will keep the categorical dtype if you first convert the columns. stack will result in a slightly different ordering than melt, but the data will be the same. stack is also a bit clunkier with naming the resulting columns.

df = df.set_index('id')
df.columns = df.columns.astype('category')

res = (df.stack()
         .rename_axis(['id', 'variable'])
         .rename('value')
         .reset_index())
#    id variable     value
#0    1        A  0.424781
#1    1        B -0.317107
#2    1        C  0.731121
#3    1        D  0.042642
#4    1        E  0.648352
#...
#13   3        D -0.889600
#14   3        E -1.822898

res.dtypes
#id             int64
#variable    category
#value        float64
#dtype: object


来源:https://stackoverflow.com/questions/64900604/categorical-column-after-melt-in-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!