Question
I am trying to concatenate all columns of a pandas dataframe so that I end up with 1 column that contains all the values from the dataframe. The following code does this:
df2 = pd.concat([df[0], df[1], df[2], df[3], df[4], df[5], df[6], df[7]])
But I would like to be able to do this with dataframes that have different numbers of columns. When I tried:
dfpr2 = pd.concat([df.columns])
I get the following error:
TypeError: cannot concatenate object of type '<class 'pandas.core.indexes.range.RangeIndex'>'; only Series and DataFrame objs are valid
Is there a way to get around this? I tried setting ignore_index=True, but that did not seem to help either. Thanks!
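(For what it's worth, one sketch of a fix for the error above, assuming the goal is to stack every column into one Series: iterate over the column labels and pass the resulting list of Series to pd.concat, rather than passing the column index itself.)

```python
import pandas as pd

# Small frame with default integer column labels, as in the question.
df = pd.DataFrame({0: [1, 2], 1: [3, 4], 2: [5, 6]})

# Build a list of Series (one per column) so concat works for any
# number of columns; ignore_index=True renumbers the result 0..n-1.
df2 = pd.concat([df[c] for c in df.columns], ignore_index=True)
print(df2.tolist())  # [1, 2, 3, 4, 5, 6]
```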
Answer 1:
IIUC, you want df.astype(str).sum(axis=1):
df = pd.DataFrame({'A' : ['A','B','C'],
'B' : [0,1,2],
'C' : ['2019-01-10','2020-01-10','2021-01-10']})
df['hash'] = df.astype(str).sum(axis=1)
print(df)
A B C hash
0 A 0 2019-01-10 A02019-01-10
1 B 1 2020-01-10 B12020-01-10
2 C 2 2021-01-10 C22021-01-10
If you need a custom delimiter, then use .agg:
df.astype(str).agg('|'.join,axis=1)
0 A|0|2019-01-10
1 B|1|2020-01-10
2 C|2|2021-01-10
Answer 2:
This is a simple way of concatenating column values
df1 = df['1st Column Name'] + df['2nd Column Name'] + ...
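Note that `+` only concatenates element-wise when both operands are strings, so non-string columns need an astype(str) cast first; a minimal sketch (the column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y'], 'B': [1, 2]})

# Cast the numeric column to str so '+' concatenates instead of
# raising a TypeError on mixed str/int operands.
combined = df['A'] + df['B'].astype(str)
print(combined.tolist())  # ['x1', 'y2']
```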
Answer 3:
Timing for different methods:
%timeit df.iloc[:,0].str.cat(df.iloc[:,1:].astype(str),',')
880 µs ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.astype(str).agg('|'.join,axis=1)
1.45 ms ± 39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.astype(str).sum(axis=1)
562 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [','.join(ent) for ent in df.astype(str).to_numpy()]
350 µs ± 6.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I think @cs95 has a Stack Overflow post discussing this: for string operations, computation done in plain Python (like the list comprehension above) is often much faster than the equivalent pandas method.
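For completeness, a runnable sketch of the fastest variant from the timings above, assigning the joined values back as a new column:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B'], 'B': [0, 1]})

# Stringify all values once, then join each row in plain Python;
# this was the quickest approach in the timings above.
df['joined'] = [','.join(row) for row in df.astype(str).to_numpy()]
print(df['joined'].tolist())  # ['A,0', 'B,1']
```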
Source: https://stackoverflow.com/questions/61780382/concatenating-all-columns-in-pandas-dataframe