I have the following data (2 columns, 4 rows):
Column 1: A, B, C, D
Column 2: E, F, G, H
I am attempting to combine the columns into one column.
The trick is to use stack()
df.stack().reset_index()
   level_0   level_1  0
0        0  Column 1  A
1        0  Column 2  E
2        1  Column 1  B
3        1  Column 2  F
4        2  Column 1  C
5        2  Column 2  G
6        3  Column 1  D
7        3  Column 2  H
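Note that stack() walks the frame row by row, so the values come out interleaved (A, E, B, F, ...). If a single column in the original column order is wanted instead, one possible sketch is to transpose before stacking. The variable names here are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Column 1": ["A", "B", "C", "D"],
                   "Column 2": ["E", "F", "G", "H"]})

# stack() goes row by row, so the two columns are interleaved.
row_major = df.stack().reset_index(drop=True)

# Transposing first makes stack() emit one original column at a time.
col_major = df.T.stack().reset_index(drop=True)

print(row_major.tolist())
print(col_major.tolist())
```

Dropping the index with reset_index(drop=True) leaves a plain Series numbered 0 to 7.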
You can flatten the values column by column using ravel with order 'F', which is much faster.
In [1238]: df
Out[1238]:
  Column 1 Column 2
0        A        E
1        B        F
2        C        G
3        D        H
In [1239]: pd.Series(df.values.ravel('F'))
Out[1239]:
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
dtype: object
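As a self-contained sketch of the same idea, the order argument decides whether the array is read down the columns ('F', column-major) or across the rows ('C', the default). This uses to_numpy(), which the pandas documentation recommends over .values; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Column 1": ["A", "B", "C", "D"],
                   "Column 2": ["E", "F", "G", "H"]})

values = df.to_numpy()

# 'F' (Fortran / column-major) order reads down each column in turn.
by_column = pd.Series(values.ravel(order="F"))

# 'C' (row-major, the default) order reads across each row in turn.
by_row = pd.Series(values.ravel(order="C"))

print(by_column.tolist())
print(by_row.tolist())
```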
Details
Medium
In [1245]: df.shape
Out[1245]: (4000, 2)
In [1246]: %timeit pd.Series(df.values.ravel('F'))
10000 loops, best of 3: 86.2 µs per loop
In [1247]: %timeit df['Column 1'].append(df['Column 2']).reset_index(drop=True)
1000 loops, best of 3: 816 µs per loop
Large
In [1249]: df.shape
Out[1249]: (40000, 2)
In [1250]: %timeit pd.Series(df.values.ravel('F'))
10000 loops, best of 3: 87.5 µs per loop
In [1251]: %timeit df['Column 1'].append(df['Column 2']).reset_index(drop=True)
100 loops, best of 3: 1.72 ms per loop
Update
pandas has a built-in method for this, stack, which does what you want; see the other answer.
This was my first answer, written many years ago before I knew about stack:
In [227]:
df = pd.DataFrame({'Column 1':['A', 'B', 'C', 'D'],'Column 2':['E', 'F', 'G', 'H']})
df
Out[227]:
  Column 1 Column 2
0        A        E
1        B        F
2        C        G
3        D        H
[4 rows x 2 columns]
In [228]:
df['Column 1'].append(df['Column 2']).reset_index(drop=True)
Out[228]:
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
dtype: object
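Series.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above raises an AttributeError on current versions. A minimal sketch of the equivalent using pd.concat, where ignore_index=True plays the role of reset_index(drop=True):

```python
import pandas as pd

df = pd.DataFrame({"Column 1": ["A", "B", "C", "D"],
                   "Column 2": ["E", "F", "G", "H"]})

# Concatenate the two columns end to end and renumber the index from 0.
combined = pd.concat([df["Column 1"], df["Column 2"]], ignore_index=True)

print(combined)
```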
What you appear to be asking for is simply help with creating another view of your data. If there is no reason for those data to be in two columns in the first place, then just create one column. If, however, you need to combine them for presentation in some other tool, you can do something like:
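import itertools as it
import pandas as pd
df = pd.DataFrame({1:['a','b','c','d'],2:['e','f','g','h']})
# chain(*df.values) yields the values row by row; sorted() then puts them in alphabetical order
sorted(it.chain(*df.values))
# -> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']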
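The sorted() call gives column order here only because the sample values happen to be alphabetical by column. A possible variant that keeps the original column order without sorting is to chain the columns themselves. The sample data below is deliberately unsorted and purely illustrative:

```python
from itertools import chain

import pandas as pd

# Illustrative data whose values are not in alphabetical order.
df = pd.DataFrame({1: ["d", "b"], 2: ["a", "c"]})

# Iterate over each column in turn and chain their values together.
flat = list(chain.from_iterable(df[col] for col in df.columns))

# For comparison: sorting the row-wise chain loses the column order.
sorted_flat = sorted(chain(*df.values))

print(flat)
print(sorted_flat)
```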