Getting a tuple in a DataFrame into multiple rows

亡梦爱人 submitted on 2019-12-11 03:37:00

Question


I have a DataFrame with two columns (Customer, Transactions). Each value in the Transactions column is a tuple of all the transaction IDs of that customer.

Customer Transactions
1        (a,b,c)
2        (d,e)

I want to convert this into a DataFrame with one row per customer and transaction ID, like this:

Customer  Transactions
1         a
1         b
1         c
2         d
2         e

We can do it using loops, but is there a straightforward one- or two-line way of doing that?


Answer 1:


You can use the DataFrame constructor to spread each tuple across columns, indexed by Customer:

import pandas as pd

df = pd.DataFrame({'Customer':[1,2],
                   'Transactions':[('a','b','c'),('d','e')]})

print (df)
   Customer Transactions
0         1    (a, b, c)
1         2       (d, e)

df1 = pd.DataFrame(df.Transactions.values.tolist(), index=df.Customer)
print (df1)
          0  1     2
Customer            
1         a  b     c
2         d  e  None

Then reshape with stack, which turns the columns into rows and drops the missing values used as padding:

print (df1.stack().reset_index(drop=True, level=1).reset_index(name='Transactions'))
   Customer Transactions
0         1            a
1         1            b
2         1            c
3         2            d
4         2            e
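
As a possible alternative that is not part of the original answer: pandas 0.25 and later provide DataFrame.explode, which expands list-like cells (tuples included) into one row each. The snippet below is a minimal sketch on the sample data from the question, assuming a recent enough pandas; the name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Customer': [1, 2],
                   'Transactions': [('a', 'b', 'c'), ('d', 'e')]})

# explode() emits one row per tuple element and repeats the Customer value;
# reset_index(drop=True) replaces the duplicated original index with 0..n-1
result = df.explode('Transactions').reset_index(drop=True)
print(result)
```

One difference to keep in mind: explode keeps a customer whose tuple is empty as a row with a missing value, whereas the stack approach above drops it.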



Answer 2:


I think the following is faster:

import numpy as np
import random
import string
import pandas as pd
from itertools import chain

customer = np.unique(np.random.randint(0, 1000000, 100000))
transactions = [tuple(string.ascii_letters[:random.randint(3, 10)]) for _ in range(len(customer))]
df = pd.DataFrame({"customer":customer, "transactions":transactions})

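# Repeat each customer once per transaction, and flatten the tuples in the
# same row order so the two columns line up
df2 = pd.DataFrame({
        "customer": np.repeat(df.customer.values, df.transactions.str.len()),
        "transactions": list(chain.from_iterable(df.transactions))})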

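To make the repeat-and-flatten idea easier to check by eye, here is the same technique applied to the two-customer sample from the question. This is only an illustrative sketch; the names `small`, `lengths`, and `flat` are not from the original answer.

```python
import numpy as np
import pandas as pd
from itertools import chain

small = pd.DataFrame({"customer": [1, 2],
                      "transactions": [("a", "b", "c"), ("d", "e")]})

# Number of transactions per customer (3 for customer 1, 2 for customer 2)
lengths = small.transactions.str.len().values

flat = pd.DataFrame({
    # np.repeat copies each customer id once per transaction
    "customer": np.repeat(small.customer.values, lengths),
    # chain.from_iterable concatenates the tuples in row order
    "transactions": list(chain.from_iterable(small.transactions)),
})
print(flat)
```

Both columns are built in the same row order, so each flattened transaction stays aligned with its customer without any explicit loop over rows.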

Source: https://stackoverflow.com/questions/39790830/getting-a-tuple-in-a-dafaframe-into-multiple-rows
