Question:
Suppose I have data of the form
    Name   h1   h2   h3   h4
    A       1  nan    2    3
    B     nan  nan    1    3
    C       1    3    2  nan
I want to move all non-nan cells to the left (or collect all non-nan data in new columns) while preserving the order from left to right, getting
    Name   h1   h2   h3   h4
    A       1    2    3  nan
    B       1    3  nan  nan
    C       1    3    2  nan
I can of course do this row by row, but I would like to know whether there is a better-performing way.
Answer 1:
Here's what I did:
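I stacked your dataframe into a longer format, then grouped by the Name level. Within each group, I drop the NaNs and relabel the remaining values with the leftmost column names, then reindex to the full h1 through h4 set, which re-creates your NaNs on the right. Unstacking at the end restores the wide layout.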
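    from io import StringIO
    import pandas

    def defragment(x):
        # Drop the NaNs, then label what is left with the leftmost column names.
        # Note: this reads the column names from the global df defined below.
        values = x.dropna().values
        return pandas.Series(values, index=df.columns[:len(values)])

    datastring = StringIO("""\
    Name    h1    h2    h3    h4
    A       1     nan   2     3
    B       nan   nan   1     3
    C       1     3     2     nan""")

    # Raw string so that \s is not treated as a string escape
    df = pandas.read_table(datastring, sep=r'\s+').set_index('Name')

    # Every (Name, column) pair, used to bring back the dropped positions as NaN
    long_index = pandas.MultiIndex.from_product([df.index, df.columns])

    print(
        df.stack()
          .groupby(level='Name')
          .apply(defragment)
          .reindex(long_index)
          .unstack()
    )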
And so I get:
       h1  h2   h3   h4
    A   1   2    3  NaN
    B   1   3  NaN  NaN
    C   1   3    2  NaN
Answer 2:
Here's how you could do it with a regex (possibly not recommended):
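    import re
    from io import StringIO
    import pandas as pd

    # Assumes df has Name as its index, so the CSV text starts with "Name,h1,..."
    # Collapsing each run of commas removes the empty (NaN) fields.
    pd.read_csv(StringIO(re.sub(',+', ',', df.to_csv())))

    Out[20]:
      Name  h1  h2   h3   h4
    0    A   1   2    3  NaN
    1    B   1   3  NaN  NaN
    2    C   1   3    2  NaN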
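To see why the regex trick works, it may help to look at the intermediate CSV text. The sketch below rebuilds the example frame by hand (the variable names `csv_text`, `collapsed`, and `result` are only illustrative) and assumes `Name` is the index and all values are floats. Every empty field that stood for a NaN disappears when runs of commas are collapsed, so each row becomes shorter and `read_csv` pads the missing trailing fields with NaN.

```python
import re
from io import StringIO

import numpy as np
import pandas as pd

# Rebuild the example frame by hand (assumption: Name is the index).
df = pd.DataFrame(
    {
        "h1": [1.0, np.nan, 1.0],
        "h2": [np.nan, np.nan, 3.0],
        "h3": [2.0, 1.0, 2.0],
        "h4": [3.0, 3.0, np.nan],
    },
    index=pd.Index(["A", "B", "C"], name="Name"),
)

csv_text = df.to_csv()                   # NaN cells are written as empty fields
collapsed = re.sub(",+", ",", csv_text)  # each run of commas becomes one comma
result = pd.read_csv(StringIO(collapsed))

for before, after in zip(csv_text.splitlines(), collapsed.splitlines()):
    print(f"{before!r:24} -> {after!r}")
print(result)
```

Because every value makes a round trip through text, this approach would likely misbehave if any cell contained a comma or an intentionally empty string, which may be why the answer calls it "possibly not recommended".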
Answer 3:
First, define a function.
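    import numpy as np

    def squeeze_nan(x):
        original_columns = x.index.tolist()
        # Keep only the non-NaN values, in their original order
        squeezed = x.dropna()
        # Relabel them with the first len(squeezed) labels
        squeezed.index = [original_columns[n] for n in range(squeezed.count())]
        # Restore the full set of labels; the unused ones become NaN
        return squeezed.reindex(original_columns, fill_value=np.nan)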
Second, apply the function.
df.apply(squeeze_nan, axis=1)
You can also use axis=0 and [::-1] to squeeze the NaNs in other directions.
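As a rough illustration of the other directions, the sketch below squeezes values to the right by reversing each row, squeezing left, and reversing back, and squeezes values upward by applying the same function with axis=0. The helper name `squeeze_nan_right` and the hand-built `df` are assumptions made for this example, and `.iloc[::-1]` is used here as the explicitly positional form of `[::-1]`.

```python
import numpy as np
import pandas as pd

def squeeze_nan(x):
    original_columns = x.index.tolist()
    squeezed = x.dropna()
    squeezed.index = [original_columns[n] for n in range(squeezed.count())]
    return squeezed.reindex(original_columns, fill_value=np.nan)

def squeeze_nan_right(x):
    # Hypothetical helper: reverse the row, squeeze left, then reverse back,
    # so the NaNs end up on the left and the values on the right.
    return squeeze_nan(x.iloc[::-1]).iloc[::-1]

# The example frame, rebuilt by hand with Name as the index (an assumption).
df = pd.DataFrame(
    {
        "h1": [1.0, np.nan, 1.0],
        "h2": [np.nan, np.nan, 3.0],
        "h3": [2.0, 1.0, 2.0],
        "h4": [3.0, 3.0, np.nan],
    },
    index=pd.Index(["A", "B", "C"], name="Name"),
)

left = df.apply(squeeze_nan, axis=1)         # values pushed to the left
right = df.apply(squeeze_nan_right, axis=1)  # values pushed to the right
up = df.apply(squeeze_nan, axis=0)           # values pushed to the top of each column

print(left, right, up, sep="\n\n")
```

In every case the relative order of the non-NaN values within a row (or column) is unchanged; only the position of the NaN padding differs.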
[EDIT]
@Mxracer888, is this what you want? Rows whose name is listed in hold are returned unchanged:
    import numpy as np

    def squeeze_nan(x, hold):
        if x.name not in hold:
            original_columns = x.index.tolist()
            squeezed = x.dropna()
            squeezed.index = [original_columns[n] for n in range(squeezed.count())]
            return squeezed.reindex(original_columns, fill_value=np.nan)
        else:
            # Row name is in hold: leave this row as it is
            return x

    df.apply(lambda x: squeeze_nan(x, ['B']), axis=1)
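Since the question asks about performance, here is one more sketch that is not taken from any of the answers above. It swaps the row-wise apply for a NumPy technique: a stable argsort of the NaN mask, which sorts the real values ahead of the NaNs in each row without disturbing their left-to-right order. The helper name `squeeze_left_vectorized` is hypothetical, and the sketch assumes every column is numeric with `Name` as the index. It avoids a Python-level loop over rows, which may help on large frames, though no timings are claimed here.

```python
import numpy as np
import pandas as pd

def squeeze_left_vectorized(frame):
    """Hypothetical helper: push the non-NaN values of each row to the left."""
    values = frame.to_numpy(dtype=float)
    # The mask is True where a cell is NaN. A stable sort of the mask puts
    # False (real values) first and keeps ties in their original order.
    order = np.argsort(np.isnan(values), axis=1, kind="stable")
    squeezed = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(squeezed, index=frame.index, columns=frame.columns)

# The example frame, rebuilt by hand with Name as the index (an assumption).
df = pd.DataFrame(
    {
        "h1": [1.0, np.nan, 1.0],
        "h2": [np.nan, np.nan, 3.0],
        "h3": [2.0, 1.0, 2.0],
        "h4": [3.0, 3.0, np.nan],
    },
    index=pd.Index(["A", "B", "C"], name="Name"),
)

result = squeeze_left_vectorized(df)
print(result)
```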