Python Pandas: How to convert my table from a long format to wide format (specific example below)?

Question

Pretty much the title. I am attaching the spreadsheet here. I need to convert the "Input" sheet into the "Output" sheet. I know about pandas wide_to_long, but I haven't been able to get the desired output with it: the rows come out scrambled.

import pandas as pd
df=pd.read_excel('../../Downloads/test.xlsx',sheet_name='Input', header=0)
newdf=pd.wide_to_long(df, [str(i) for i in range(2022,2028)], 'Hotel Name', 'value', sep='', suffix='.+')\
  .reset_index()\
  .sort_values('Hotel Name')\
  .drop('value', axis=1)
newdf

The output is

Answer 1:

I would move the hotel name into the index, then turn the columns into a MultiIndex, and stack:

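import pandas as pd

df = pd.read_csv('test.csv', sep=';').set_index('Hotel Name')
# split e.g. "2022 Revenue" into ['2022', 'Revenue'] to build a two-level column index
df.columns = pd.MultiIndex.from_tuples([name.split(None, 1) for name in df.columns])
resul = df.stack()  # moves the inner column level (the metric) into the row index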

This directly gives:

                         2022     2023      2024    2025     2026     2027       2028
Hotel Name                                                                           
Hotel A    Cost             0        0         0       0        0        0          0
           Cum. Profit      0    35478     94608  189216   307476   449388     626778
           Profit           0    35478     59130   94608   118260   141912     177390
           Revenue          0    35478     59130   94608   118260   141912     177390
Hotel B    Cost        -25000        0         0       0        0        0          0
           Cum. Profit -25000   116036    351096  727192  1197312  1761456    2466636
           Profit      -25000   141036    235060  376096   470120   564144     705180
           Revenue          0   141036    235060  376096   470120   564144     705180
Hotel B2   Cost             0        0         0       0        0        0          0
           Cum. Profit      0  34711,5     92564  185128   300833   439679   613236,5
           Profit           0  34711,5   57852,5   92564   115705   138846   173557,5
           Revenue          0  34711,5   57852,5   92564   115705   138846   173557,5
Hotel A1   Cost        -25000        0         0       0        0        0          0
           Cum. Profit -25000  68622,5    224660  474320   786395  1160885  1628997,5
           Profit      -25000  93622,5  156037,5  249660   312075   374490   468112,5
           Revenue          0  93622,5  156037,5  249660   312075   374490   468112,5
Hotel C    Cost        -25000        0         0       0        0        0          0
           Cum. Profit -25000    54935    188160  401320   667770   987510    1387185
           Profit      -25000    79935    133225  213160   266450   319740     399675
           Revenue          0    79935    133225  213160   266450   319740     399675
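
The same reshaping can be tried without the spreadsheet. The following is a minimal sketch on a made-up miniature of the "Input" sheet; the frame `wide`, its values, and the assumption that every column is named "<year> <metric>" with a single space are illustrative only and not taken from the real file.

```python
import pandas as pd

# Hypothetical miniature of the "Input" sheet: one row per hotel,
# one column per "<year> <metric>" combination.
wide = pd.DataFrame({
    "Hotel Name": ["Hotel A", "Hotel B"],
    "2022 Revenue": [0, 0],
    "2022 Cost": [0, -25000],
    "2023 Revenue": [35478, 141036],
    "2023 Cost": [0, 0],
}).set_index("Hotel Name")

# Split each label once on whitespace: "2022 Revenue" -> ("2022", "Revenue").
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split(None, 1)) for name in wide.columns]
)

# stack() moves the innermost column level (the metric) into the row index,
# leaving one column per year.
long = wide.stack()
print(long)
```

Each (hotel, metric) pair becomes one row. It is safer not to rely on the order in which the metrics appear after stacking; sort explicitly if the order matters.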

It is always possible to sort a MultiIndex using a custom order by handling it as an iterable of tuples, and using the standard sorted function with a key:

resul = resul.loc[sorted(resul.index, key=lambda x:
                         (x[0], ['Revenue', 'Cost', 'Profit', 'Cum. Profit'].index(x[1])))]

It then gives:

                         2022     2023      2024    2025     2026     2027       2028
Hotel Name                                                                           
Hotel A    Revenue          0    35478     59130   94608   118260   141912     177390
           Cost             0        0         0       0        0        0          0
           Profit           0    35478     59130   94608   118260   141912     177390
           Cum. Profit      0    35478     94608  189216   307476   449388     626778
Hotel A1   Revenue          0  93622,5  156037,5  249660   312075   374490   468112,5
           Cost        -25000        0         0       0        0        0          0
           Profit      -25000  93622,5  156037,5  249660   312075   374490   468112,5
           Cum. Profit -25000  68622,5    224660  474320   786395  1160885  1628997,5
Hotel B    Revenue          0   141036    235060  376096   470120   564144     705180
           Cost        -25000        0         0       0        0        0          0
           Profit      -25000   141036    235060  376096   470120   564144     705180
           Cum. Profit -25000   116036    351096  727192  1197312  1761456    2466636
Hotel B2   Revenue          0  34711,5   57852,5   92564   115705   138846   173557,5
           Cost             0        0         0       0        0        0          0
           Profit           0  34711,5   57852,5   92564   115705   138846   173557,5
           Cum. Profit      0  34711,5     92564  185128   300833   439679   613236,5
Hotel C    Revenue          0    79935    133225  213160   266450   319740     399675
           Cost        -25000        0         0       0        0        0          0
           Profit      -25000    79935    133225  213160   266450   319740     399675
           Cum. Profit -25000    54935    188160  401320   667770   987510    1387185
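
The custom sort can be checked in isolation. This is a small sketch on invented data (the frame `frame`, the list `order` and the helper dict `rank` are hypothetical names); it uses a dict lookup instead of `list.index`, which is only a minor variation on the same idea.

```python
import pandas as pd

# Desired order of the second index level (illustrative subset).
order = ["Revenue", "Cost", "Profit"]

idx = pd.MultiIndex.from_tuples(
    [("Hotel B", "Cost"), ("Hotel A", "Profit"), ("Hotel A", "Cost"),
     ("Hotel B", "Revenue"), ("Hotel A", "Revenue"), ("Hotel B", "Profit")],
    names=["Hotel Name", None],
)
frame = pd.DataFrame({"2022": [-25000, 0, 0, 0, 0, -25000]}, index=idx)

# Map each metric to its position so the sort key is (hotel, position).
rank = {name: pos for pos, name in enumerate(order)}
sorted_keys = sorted(frame.index, key=lambda key: (key[0], rank[key[1]]))

# Selecting with a list of index tuples returns the rows in that order.
result = frame.loc[sorted_keys]
print(result)
```

Hotels are sorted alphabetically, and within each hotel the metrics follow `order`.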

Answer 2:

You can move every column whose name does not contain a year into the index with DataFrame.set_index, then build a MultiIndex in the columns with str.split(expand=True), which makes it possible to reshape with DataFrame.stack. Finally, set the index names, convert the index levels back to columns with DataFrame.reset_index, and convert the Val column to an ordered categorical that follows the order of the original columns, so that DataFrame.sort_values produces the correct order:

df = pd.read_excel('test.xlsx')

df = df.set_index(['Hotel Name'])
df.columns = df.columns.str.split(n=1, expand=True)

cats = df.columns.get_level_values(1).unique()
print (cats)
Index(['Revenue', 'Cost', 'Profit', 'Cum. Profit'], dtype='object')

df = (df.stack()
        .rename_axis(('Hotel Name','Val'))
        .reset_index()
        .assign(Val = lambda x: pd.Categorical(x.Val, ordered=True, categories=cats))
        .sort_values(['Hotel Name','Val'])
        )
print (df.head())
   Hotel Name          Val  2022     2023      2024    2025    2026    2027  \
3     Hotel A      Revenue     0  35478.0   59130.0   94608  118260  141912   
0     Hotel A         Cost     0      0.0       0.0       0       0       0   
2     Hotel A       Profit     0  35478.0   59130.0   94608  118260  141912   
1     Hotel A  Cum. Profit     0  35478.0   94608.0  189216  307476  449388   
15   Hotel A1      Revenue     0  93622.5  156037.5  249660  312075  374490   

        2028  
3   177390.0  
0        0.0  
2   177390.0  
1   626778.0  
15  468112.5
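
The effect of the ordered categorical can be seen on a tiny hand-made frame. This sketch assumes an already reshaped table; the frame `long` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format frame with the metrics in alphabetical order.
long = pd.DataFrame({
    "Hotel Name": ["Hotel A"] * 4 + ["Hotel B"] * 4,
    "Val": ["Cost", "Cum. Profit", "Profit", "Revenue"] * 2,
    "2023": [0, 35478, 35478, 35478, 0, 116036, 141036, 141036],
})

# Order in which the metrics should appear within each hotel.
cats = ["Revenue", "Cost", "Profit", "Cum. Profit"]

# An ordered categorical sorts by category position, not alphabetically.
long["Val"] = pd.Categorical(long["Val"], categories=cats, ordered=True)
ordered = long.sort_values(["Hotel Name", "Val"])
print(ordered)
```

Without the conversion, sort_values would order the Val strings alphabetically, which puts Cost before Revenue.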

In your solution, it is necessary to change the upper bound of range to 2029 to include the year 2028, because range excludes its stop value:

df = pd.read_excel('test.xlsx')

df = (pd.wide_to_long(df,
                      stubnames=[str(i) for i in range(2022, 2029)],
                      i='Hotel Name',
                      j='value',
                      sep='',
                      suffix='.+')
        .reset_index()
        .sort_values('Hotel Name')
        .drop('value', axis=1))
print (df.head())
   Hotel Name  2022     2023      2024    2025    2026    2027      2028
0     Hotel A     0  35478.0   59130.0   94608  118260  141912  177390.0
5     Hotel A     0      0.0       0.0       0       0       0       0.0
10    Hotel A     0  35478.0   59130.0   94608  118260  141912  177390.0
15    Hotel A     0  35478.0   94608.0  189216  307476  449388  626778.0
3    Hotel A1     0  93622.5  156037.5  249660  312075  374490  468112.5
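
Dropping the value column removes the label that tells the rows apart, which is why every Hotel A row above looks anonymous. The following sketch keeps that label instead. It runs on invented data and assumes the columns are named "<year> <metric>" with one space, so sep=' ' is used here rather than the sep='' of the code above; the column name `Metric` is a hypothetical choice.

```python
import pandas as pd

# Hypothetical miniature of the "Input" sheet.
wide = pd.DataFrame({
    "Hotel Name": ["Hotel A", "Hotel B"],
    "2022 Revenue": [0, 0],
    "2022 Cost": [0, -25000],
    "2023 Revenue": [35478, 141036],
    "2023 Cost": [0, 0],
})

# range() excludes its stop value, so 2024 is needed to include 2023.
years = [str(y) for y in range(2022, 2024)]

# Each year is a stub; whatever follows "<year> " becomes the Metric label.
long = pd.wide_to_long(
    wide, stubnames=years, i="Hotel Name", j="Metric", sep=" ", suffix=".+"
).reset_index()
print(long)
```

With the Metric column kept, the rows can afterwards be put in any order, for example with an ordered categorical.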

Source: https://stackoverflow.com/questions/60396686/python-pandas-how-to-convert-my-table-from-a-long-format-to-wide-format-specif

Tags

python

pandas

dataframe