I have a dataframe with results as below (a sample dataframe is shown; the actual one is much larger). I want to get a dictionary (or another structure if it will be faster) mapping each row index to the list of column names that are non-null in that row.
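For reference, a minimal sketch of a dataframe that reproduces the outputs shown in the answers below; the values are placeholders, only the column order and the NaN pattern matter.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'MSFT': [1.0, np.nan, np.nan, np.nan],
     'GOOG': [np.nan, 1.0, np.nan, np.nan],
     'AAPL': [np.nan, np.nan, 1.0, np.nan],
     'AMZN': [np.nan, 1.0, 1.0, np.nan],
     'FB':   [np.nan, np.nan, 1.0, 1.0]},
    index=[1, 2, 3, 4])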
You can do boolean indexing on the dataframe columns in a dictionary comprehension.
>>> {idx: df.columns[row].tolist() for idx, row in df.notnull().iterrows()}
{1: ['MSFT'], 2: ['GOOG', 'AMZN'], 3: ['AAPL', 'AMZN', 'FB'], 4: ['FB']}
You can take the dot product of the null mask and the column names and then use string operations, i.e.
df.notna().dot(df.columns+',').str.strip(',').str.split(',').to_dict()
{1: ['MSFT'], 2: ['GOOG', 'AMZN'], 3: ['AAPL', 'AMZN', 'FB'], 4: ['FB']}
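One caveat, as a sketch assuming the real data can contain rows that are entirely NaN: the dot product gives an empty string for such rows, so the split yields [''] instead of []; filtering out empty strings handles that.
out = (df.notna()
         .dot(df.columns + ',')      # concatenate column names where the mask is True
         .str.strip(',')
         .str.split(',')
         .map(lambda cols: [c for c in cols if c])  # drop '' left by all-NaN rows
         .to_dict())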
Another option is to stack the frame, which drops the NaN cells, and then group the column labels by the row index:
df.stack().reset_index(level=1).groupby(level=0).level_1.apply(list).to_dict()
{1: ['MSFT'], 2: ['GOOG', 'AMZN'], 3: ['AAPL', 'AMZN', 'FB'], 4: ['FB']}
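For readability, the same chain spelled out step by step (this assumes the column index is unnamed, so reset_index names the new column level_1):
stacked = df.stack()                 # NaN cells are dropped, leaving a MultiIndex of (row, column) pairs
flat = stacked.reset_index(level=1)  # the column labels become the 'level_1' column
result = flat.groupby(level=0)['level_1'].apply(list).to_dict()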
You can use .apply:
df.apply(lambda x: list(x.dropna().index), axis=1).to_dict() #Updated answer
# Or dict(df.apply(lambda x: list(x.index[~x.isnull()]), axis=1)) #Original answer
Output:
{1: ['MSFT'], 2: ['GOOG', 'AMZN'], 3: ['AAPL', 'AMZN', 'FB'], 4: ['FB']}
Maybe not the best in terms of performance, but you could use iterrows:
import numpy as np
results = {}
for i, row in df.iterrows():
    results[i] = list(df.columns[~np.isnan(row)])
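If the dataframe can hold non-numeric columns (an assumption here; np.isnan only accepts numeric data), pandas' own null check is a drop-in replacement:
results = {i: df.columns[row.notna()].tolist() for i, row in df.iterrows()}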