Is there an operation in pandas that does the same as flatMap in pyspark?
flatMap example:
>>> rdd = sc.parallelize([2, 3, 4])
>>> sort
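For readers without a Spark context, the behaviour being asked about can be sketched in plain Python: flatMap applies a function that returns an iterable to every element and concatenates the results into one flat sequence. The helper name `flat_map` and the mapping function `lambda x: range(1, x)` are assumptions chosen for illustration, not taken from the question.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and concatenate the resulting iterables."""
    return list(chain.from_iterable(func(item) for item in items))


# Hypothetical mapping function: each x expands to 1, 2, ..., x - 1.
result = flat_map(lambda x: range(1, x), [2, 3, 4])
print(sorted(result))  # [1, 1, 1, 2, 2, 3]
```

The key point is that one input element may produce zero, one, or many output elements, which is what a pandas equivalent has to reproduce.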
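There are three steps to solve this: expand each list into its own columns with apply(pd.Series), unstack the result into a long Series, then reset the index and drop the NaN padding.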
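import pandas as pd

df = pd.DataFrame({'x': [[1, 2], [3, 4, 5]]})
# One column per list element, then long form, then drop the NaN padding
df_new = df['x'].apply(pd.Series).unstack().reset_index().dropna()
# level_1 is the original row label, 0 holds the flattened values
df_new[['level_1', 0]]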
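As an alternative to the approach above, and assuming pandas 0.25 or later is available, `Series.explode` does the flattening in one call. This is a minimal sketch using a different technique, not the method given in the answer; the column name `source_row` is a hypothetical label introduced here. It also avoids the NaN padding, which would otherwise turn the values into floats.

```python
import pandas as pd

df = pd.DataFrame({'x': [[1, 2], [3, 4, 5]]})

# explode() emits one row per list element; the original row label
# is repeated in the index for every element that came from that row.
flat = df['x'].explode()

# Keep the source row as a column ('source_row' is a hypothetical name)
# and convert the object-dtype values back to integers.
df_flat = flat.reset_index().rename(columns={'index': 'source_row'})
df_flat['x'] = df_flat['x'].astype(int)
print(df_flat)
```

Unlike the unstack-based version, the rows come out in their original order (all elements of row 0, then all elements of row 1), so no extra sort is needed if order matters.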