Reshape multiple value columns to wide format

前端 未结 5 456
难免孤独
难免孤独 2020-11-22 11:32

I have the following data frame and i want to use cast to create a \"pivot table\" with columns for two values (value and percent). Here is the data frame:

         


        
相关标签:
5条回答
  • 2020-11-22 11:59

    Your best option is to reshape your data to long format, using melt, and then to dcast:

    library(reshape2)
    
    meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
    dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)
    

    The first few lines of output:

                 expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
    1              Adjustment           442.37        0.124025031             2.00       0.0005064625
    2     Bank Service Charge           200.00        0.056072985           200.00       0.0506462461
    3                   Cable            21.33        0.005980184            36.33       0.0091998906
    4                 Charity             0.00        0.000000000             0.00       0.0000000000
    
    0 讨论(0)
  • 2020-11-22 12:07

    I prefer the tabulate function in package tables for this. It requires factors, but this is anyway a good idea with the type of data you have.

    library(tables)
    expensesByMonth$month= as.factor(expensesByMonth$month)
    expensesByMonth$expense_type= as.factor(expensesByMonth$expense_type)
    tabular(expense_type~(month)*(value+percent)*(sum),data=expensesByMonth)
    # Optional formatting
    tabular(expense_type~month*
       ((Format(digits=1))*value+(Format(digits=3))*percent)*sum,
       data=expensesByMonth)
    

    Partial output:

                           value      percent  value      percent  value      percent 
    expense_type           sum        sum      sum        sum      sum        sum     
    Adjustment              442       0.124025    2       0.000506   16       0.003573
    Bank Service Charge     200       0.056073  200       0.050646  200       0.043650
    Cable                    21       0.005980   36       0.009200    0       0.000000
    
    0 讨论(0)
  • 2020-11-22 12:23

    Reshaping from long to wide format with multiple value/measure columns is now possible with the new function pivot_wider() introduced in tidyr 1.0.0.

    This is superior to the previous tidyr strategy of gather() than spread(), because the attributes are no longer dropped (e.g., dates remain dates, strings remain strings).

    pivot_wider() (counterpart: pivot_longer()) works similar to spread(). However, it offers additional functionality such as multiple value columns. To this end, the argument values_from—that indicates from which column(s) the values are taken—may take more than one column name.

    NAs may be filled using the argument values_fill.

    library("tidyr")
    library("magrittr")
    
    pivot_wider(expensesByMonth, 
                id_cols = expense_type,
                names_from = month,
                values_from = c(value, percent))
    #> # A tibble: 23 x 13
    #>    expense_type `value_2012-02-~ `value_2012-03-~ `value_2012-04-~
    #>    <chr>                   <dbl>            <dbl>            <dbl>
    #>  1 Adjustment              442.               2               16.4
    #>  2 Bank Servic~            200              200              200  
    #>  3 Cable                    21.3             36.3             NA  
    #>  4 Clubbing                 75              207.             325. 
    #>  5 Dining                   22.5             74.5             80.5
    #>  6 Education              1800               NA               NA  
    #>  7 Gifts                    10               89              100  
    #>  8 Groceries               233.             373.             398. 
    #>  9 Lunch                   155.             384.             326. 
    #> 10 Personal Ca~             30               30               90  
    #> # ... with 13 more rows, and 9 more variables: `value_2012-05-01` <dbl>,
    #> #   `value_2012-06-01` <dbl>, `value_2012-07-01` <dbl>,
    #> #   `percent_2012-02-01` <dbl>, `percent_2012-03-01` <dbl>,
    #> #   `percent_2012-04-01` <dbl>, `percent_2012-05-01` <dbl>,
    #> #   `percent_2012-06-01` <dbl>, `percent_2012-07-01` <dbl>
    

    Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

    # see also ?build_wider_spec
    spec <- expensesByMonth %>%
        expand(month, .value = c("percent", "value")) %>%
        dplyr::mutate(.name = paste(.$month, .$.value, sep = "_"))
    pivot_wider_spec(expensesByMonth, spec = spec)
    

    Created on 2019-03-26 by the reprex package (v0.2.1)

    See also: https://tidyr.tidyverse.org/dev/articles/pivot.html

    0 讨论(0)
  • 2020-11-22 12:25

    data.table can cast on multiple value.var variables. This is quite direct (and efficient).

    Therefore:

    library(data.table) # v1.9.5+
    dcast(setDT(expensesByMonth), expense_type ~ month, value.var = c("value", "percent"))
    
    0 讨论(0)
  • 2020-11-22 12:25

    As this question is often visited, it deserves a complete base R answer too in my opinion. The reshape-function from base R is quite versatile and can easily be applied to this problem as well:

    expenses <- reshape(expensesByMonth, idvar = 'expense_type', direction = 'wide',
                        timevar = 'month', sep = '_')
    

    The cells with NA-values can be replaced with 0 with:

    expenses[is.na(expenses)] <- 0
    

    which gives (ordered by expense_type to make it easier to compare with the desired output):

    > expenses[order(expenses$expense_type),]
                 expense_type value_2012-02-01 percent_2012-02-01 value_2012-03-01 percent_2012-03-01 value_2012-04-01 percent_2012-04-01 value_2012-05-01 percent_2012-05-01 value_2012-06-01 percent_2012-06-01 value_2012-07-01 percent_2012-07-01
    1              Adjustment           442.37        0.124025031             2.00       0.0005064625            16.37        0.003572769             0.00        0.000000000            10.00        0.002490443             0.00        0.000000000
    2     Bank Service Charge           200.00        0.056072985           200.00       0.0506462461           200.00        0.043650205           200.00        0.049672410           200.00        0.049808859             0.00        0.000000000
    3                   Cable            21.33        0.005980184            36.33       0.0091998906             0.00        0.000000000            39.05        0.009698538            16.00        0.003984709             0.00        0.000000000
    67                Charity             0.00        0.000000000             0.00       0.0000000000             0.00        0.000000000             0.00        0.000000000            32.59        0.008116353             0.00        0.000000000
    30                Clothes             0.00        0.000000000             0.00       0.0000000000           806.90        0.176106751           237.00        0.058861806           149.81        0.037309325             0.00        0.000000000
    4                Clubbing            75.00        0.021027369           206.55       0.0523049107           324.81        0.070890115            40.00        0.009934482             0.00        0.000000000             0.00        0.000000000
    32               Computer             0.00        0.000000000             0.00       0.0000000000           756.00        0.164997774           283.83        0.070492601           100.00        0.024904429            10.54        0.003417573
    5                  Dining            22.50        0.006308211            74.50       0.0188657267            80.50        0.017569207           141.32        0.035098525            80.00        0.019923543             0.00        0.000000000
    6               Education          1800.00        0.504656861             0.00       0.0000000000             0.00        0.000000000             0.00        0.000000000            60.00        0.014942658             0.00        0.000000000
    52               Electric             0.00        0.000000000             0.00       0.0000000000             0.00        0.000000000            32.88        0.008166144            31.91        0.007947003             0.00        0.000000000
    7                   Gifts            10.00        0.002803649            89.00       0.0225375795           100.00        0.021825102            30.00        0.007450862            55.00        0.013697436            10.00        0.003242479
    8               Groceries           233.33        0.065417547           372.68       0.0943742150           398.37        0.086944660           424.40        0.105404855           397.25        0.098932845           342.11        0.110928451
    9                   Lunch           154.75        0.043386472           383.75       0.0971774847           326.25        0.071204396           412.00        0.102325166           486.40        0.121135144           291.00        0.094356141
    37            Maintenance             0.00        0.000000000             0.00       0.0000000000           151.00        0.032955905           142.75        0.035453683           115.60        0.028789520            76.50        0.024804965
    21       Medical Expenses             0.00        0.000000000           144.19       0.0365134111            29.95        0.006536618            86.55        0.021495736            47.08        0.011725005            66.80        0.021659760
    22          Miscellaneous             0.00        0.000000000           508.11       0.1286693205           101.00        0.022043353          1051.50        0.261152698          1000.00        0.249044293          1008.00        0.326841890
    10          Personal Care            30.00        0.008410948            30.00       0.0075969369            90.00        0.019642592            30.00        0.007450862           120.00        0.029885315            30.00        0.009727437
    24                  Phone             0.00        0.000000000            38.40       0.0097240793            38.45        0.008391752            38.90        0.009661284            41.11        0.010238211            41.11        0.013329831
    25             Recreation             0.00        0.000000000            81.75       0.0207016531            61.00        0.013313312            51.50        0.012790646           256.00        0.063755339           316.00        0.102462339
    11                   Rent           545.00        0.152798883          1746.70       0.4423189903           743.75        0.162324199           749.70        0.186197031           761.60        0.189672133           765.00        0.248049649
    95 Repair and Maintenance             0.00        0.000000000             0.00       0.0000000000             0.00        0.000000000             0.00        0.000000000             0.00        0.000000000            65.00        0.021076114
    12         Transportation            32.50        0.009111860            35.00       0.0088630931           129.00        0.028154382            35.00        0.008692672            55.00        0.013697436            62.00        0.020103370
    45                 Travel             0.00        0.000000000             0.00       0.0000000000           228.53        0.049876906             0.00        0.000000000             0.00        0.000000000             0.00        0.000000000
    

    You could also achieve this with the tidyverse:

    library(dplyr)
    library(tidyr)
    
    expensesByMonth %>% 
      gather(k, v, 3:4) %>% 
      unite(km, k, month) %>% 
      spread(km, v, fill = 0)
    
    0 讨论(0)
提交回复
热议问题