Question
Pretty much the title. I am attaching the spreadsheet here. I need to convert the "Input" sheet to the "Output" sheet. I know about pandas wide_to_long, but I haven't been able to get the desired output with it: the rows get scrambled.
import pandas as pd
df = pd.read_excel('../../Downloads/test.xlsx', sheet_name='Input', header=0)
newdf = pd.wide_to_long(df, [str(i) for i in range(2022,2028)], 'Hotel Name', 'value', sep='', suffix='.+')\
    .reset_index()\
    .sort_values('Hotel Name')\
    .drop('value', axis=1)
newdf
The output is
Answer 1:
I would move the hotel name into the index, then turn the columns into a MultiIndex (year, metric), and stack:
df = pd.read_csv('test.csv', sep=';').set_index('Hotel Name')
df.columns = pd.MultiIndex.from_tuples([name.split(None, 1) for name in df.columns])
resul = df.stack()
This directly gives:
                          2022     2023      2024    2025     2026     2027       2028
Hotel Name
Hotel A    Cost              0        0         0       0        0        0          0
           Cum. Profit       0    35478     94608  189216   307476   449388     626778
           Profit            0    35478     59130   94608   118260   141912     177390
           Revenue           0    35478     59130   94608   118260   141912     177390
Hotel B    Cost         -25000        0         0       0        0        0          0
           Cum. Profit  -25000   116036    351096  727192  1197312  1761456    2466636
           Profit       -25000   141036    235060  376096   470120   564144     705180
           Revenue           0   141036    235060  376096   470120   564144     705180
Hotel B2   Cost              0        0         0       0        0        0          0
           Cum. Profit       0  34711,5     92564  185128   300833   439679   613236,5
           Profit            0  34711,5   57852,5   92564   115705   138846   173557,5
           Revenue           0  34711,5   57852,5   92564   115705   138846   173557,5
Hotel A1   Cost         -25000        0         0       0        0        0          0
           Cum. Profit  -25000  68622,5    224660  474320   786395  1160885  1628997,5
           Profit       -25000  93622,5  156037,5  249660   312075   374490   468112,5
           Revenue           0  93622,5  156037,5  249660   312075   374490   468112,5
Hotel C    Cost         -25000        0         0       0        0        0          0
           Cum. Profit  -25000    54935    188160  401320   667770   987510    1387185
           Profit       -25000    79935    133225  213160   266450   319740     399675
           Revenue           0    79935    133225  213160   266450   319740     399675
It is always possible to sort a MultiIndex in a custom order by treating it as an iterable of tuples and using the standard sorted function with a key:
resul = resul.loc[sorted(resul.index, key=lambda x:
(x[0], ['Revenue', 'Cost', 'Profit', 'Cum. Profit'].index(x[1])))]
It then gives:
                          2022     2023      2024    2025     2026     2027       2028
Hotel Name
Hotel A    Revenue           0    35478     59130   94608   118260   141912     177390
           Cost              0        0         0       0        0        0          0
           Profit            0    35478     59130   94608   118260   141912     177390
           Cum. Profit       0    35478     94608  189216   307476   449388     626778
Hotel A1   Revenue           0  93622,5  156037,5  249660   312075   374490   468112,5
           Cost         -25000        0         0       0        0        0          0
           Profit       -25000  93622,5  156037,5  249660   312075   374490   468112,5
           Cum. Profit  -25000  68622,5    224660  474320   786395  1160885  1628997,5
Hotel B    Revenue           0   141036    235060  376096   470120   564144     705180
           Cost         -25000        0         0       0        0        0          0
           Profit       -25000   141036    235060  376096   470120   564144     705180
           Cum. Profit  -25000   116036    351096  727192  1197312  1761456    2466636
Hotel B2   Revenue           0  34711,5   57852,5   92564   115705   138846   173557,5
           Cost              0        0         0       0        0        0          0
           Profit            0  34711,5   57852,5   92564   115705   138846   173557,5
           Cum. Profit       0  34711,5     92564  185128   300833   439679   613236,5
Hotel C    Revenue           0    79935    133225  213160   266450   319740     399675
           Cost         -25000        0         0       0        0        0          0
           Profit       -25000    79935    133225  213160   266450   319740     399675
           Cum. Profit  -25000    54935    188160  401320   667770   987510    1387185
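Since the spreadsheet itself is not reproduced here, the reshape and the custom sort from this answer can be tried on a small stand-in frame. The sketch below assumes a made-up two-hotel, two-year table with only Revenue and Cost columns; the values and the names `order` and `result` are illustrative and not taken from the original file.

```python
import pandas as pd

# Hypothetical miniature of the "Input" sheet: one row per hotel,
# one column per "<year> <metric>" pair (values are made up).
df = pd.DataFrame({
    "Hotel Name": ["Hotel B", "Hotel A"],
    "2022 Revenue": [0, 0],
    "2022 Cost": [-25000, 0],
    "2023 Revenue": [100, 50],
    "2023 Cost": [0, 0],
}).set_index("Hotel Name")

# Split "2022 Revenue" into ("2022", "Revenue") to get two column levels.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split(None, 1)) for name in df.columns]
)

# Move the last column level (the metric) into the row index.
result = df.stack()

# Reorder rows: hotels alphabetically, metrics in a fixed custom order.
order = ["Revenue", "Cost"]
result = result.loc[
    sorted(result.index, key=lambda t: (t[0], order.index(t[1])))
]

print(result)
```

The sort key is a tuple, so the hotel name is compared first and the position of the metric in `order` only breaks ties within a hotel.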
Answer 2:
You can create an Index/MultiIndex from all columns that have no year in their names with DataFrame.set_index, then build a MultiIndex in the columns with Series.str.split (applied to the column names with expand=True), which makes it possible to reshape with DataFrame.stack. Finally, set the index names, convert the MultiIndex in the index to columns with DataFrame.reset_index, and convert the Val column to an ordered categorical that follows the order of the values in the columns, so that DataFrame.sort_values returns the correct order:
df = pd.read_excel('test.xlsx')
df = df.set_index(['Hotel Name'])
df.columns = df.columns.str.split(n=1, expand=True)
cats = df.columns.get_level_values(1).unique()
print (cats)
Index(['Revenue', 'Cost', 'Profit', 'Cum. Profit'], dtype='object')
df = (df.stack()
        .rename_axis(('Hotel Name','Val'))
        .reset_index()
        .assign(Val = lambda x: pd.Categorical(x.Val, ordered=True, categories=cats))
        .sort_values(['Hotel Name','Val'])
     )
print (df.head())
Hotel Name Val 2022 2023 2024 2025 2026 2027 \
3 Hotel A Revenue 0 35478.0 59130.0 94608 118260 141912
0 Hotel A Cost 0 0.0 0.0 0 0 0
2 Hotel A Profit 0 35478.0 59130.0 94608 118260 141912
1 Hotel A Cum. Profit 0 35478.0 94608.0 189216 307476 449388
15 Hotel A1 Revenue 0 93622.5 156037.5 249660 312075 374490
2028
3 177390.0
0 0.0
2 177390.0
1 626778.0
15 468112.5
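The step that makes the row order come out right is the conversion of Val to a categorical with explicit categories. A minimal sketch of just that step, assuming the four metric names from the output above and hypothetical variable names:

```python
import pandas as pd

# Metric labels in arbitrary order (illustrative input).
vals = pd.Series(["Profit", "Cost", "Revenue", "Cum. Profit"])

# Plain strings sort alphabetically.
alphabetical = vals.sort_values().tolist()

# With explicit categories, sorting follows the category order instead.
cats = ["Revenue", "Cost", "Profit", "Cum. Profit"]
as_cat = pd.Series(pd.Categorical(vals, categories=cats, ordered=True))
custom = as_cat.sort_values().tolist()

print(alphabetical)  # ['Cost', 'Cum. Profit', 'Profit', 'Revenue']
print(custom)        # ['Revenue', 'Cost', 'Profit', 'Cum. Profit']
```

In the answer's code the category list is not typed by hand but read from the second column level, so it reflects the order in which the metrics appear in the sheet.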
In your solution it is necessary to change the end of the range to 2029 so that the year 2028 is included, because range excludes its end value:
df = pd.read_excel('test.xlsx')
df = (pd.wide_to_long(df,
                      stubnames=[str(i) for i in range(2022,2029)],
                      i='Hotel Name',
                      j='value',
                      sep='',
                      suffix='.+')
        .reset_index()
        .sort_values('Hotel Name')
        .drop('value', axis=1))
print (df.head())
Hotel Name 2022 2023 2024 2025 2026 2027 2028
0 Hotel A 0 35478.0 59130.0 94608 118260 141912 177390.0
5 Hotel A 0 0.0 0.0 0 0 0 0.0
10 Hotel A 0 35478.0 59130.0 94608 118260 141912 177390.0
15 Hotel A 0 35478.0 94608.0 189216 307476 449388 626778.0
3 Hotel A1 0 93622.5 156037.5 249660 312075 374490 468112.5
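The effect of an end value that is one too small can be reproduced on a toy frame. This is a sketch under assumptions: the data is made up, only two years are used, and `reshape` is a hypothetical helper that wraps the same wide_to_long call as above.

```python
import pandas as pd

# Made-up two-year miniature of the sheet (values are illustrative only).
df = pd.DataFrame({
    "Hotel Name": ["Hotel A"],
    "2022 Revenue": [0], "2022 Cost": [0],
    "2023 Revenue": [50], "2023 Cost": [0],
})

def reshape(frame, first_year, end_year):
    # Hypothetical helper: range() stops before end_year.
    stubs = [str(y) for y in range(first_year, end_year)]
    return pd.wide_to_long(frame, stubnames=stubs, i="Hotel Name",
                           j="value", sep="", suffix=".+")

short = reshape(df, 2022, 2023)  # stubs: ['2022'], 2023 is left out
full = reshape(df, 2022, 2024)   # stubs: ['2022', '2023']

# In `short` the 2023 columns are not melted and stay in wide form.
print(list(short.columns))
# In `full` every year has become its own column.
print(list(full.columns))
```

Columns whose year is missing from stubnames are treated as ordinary id-like columns, so they are carried along unchanged instead of being reshaped.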
Source: https://stackoverflow.com/questions/60396686/python-pandas-how-to-convert-my-table-from-a-long-format-to-wide-format-specif