Tidying datasets with multiple sections/headers at variable positions

轻奢々 2021-01-22 16:33

Context

I am trying to read in and tidy an Excel file with multiple headers/sections placed at variable positions. The content of these headers needs to

4 answers
  •  盖世英雄少女心
    2021-01-22 17:09

    For completeness' sake, here's a base R solution that also depends on the expectation that you can make a vector of the elements of col1 that are not city names and use it for reference:

    # make your vector of non-city elements of col1 for reference
    types <- c("Diesel","Gasoline","LPG","Electric")
    
    # use that reference vector to flag city names
    df$city = ifelse(!df$col1 %in% types, 1, 0)
    # use cumsum with that flag to create a group id
    df$group = cumsum(df$city) 
    
    # use the split/apply/combine approach, splitting on that group id, restructuring
    # each element of the resulting list as desired through lapply, then recombining 
    # the results with do.call and rbind
    newdf <- do.call(rbind, lapply(split(df, df$group), function(x) {
    
      data.frame(city = x$col1[1], type = x$col1, value = x$col2, stringsAsFactors = FALSE)[-1,]
    
    }))
    

    Result:

    > newdf
           city     type value
    1.2 Seattle   Diesel    80
    1.3 Seattle Gasoline    NA
    1.4 Seattle      LPG    10
    1.5 Seattle Electric    10
    2.2  Boston   Diesel    65
    2.3  Boston Gasoline    25
    2.4  Boston Electric    10
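
To try the idea end to end, here is a minimal sketch with a hypothetical input. The original spreadsheet is not shown, so `df` below is reconstructed from the printed result, and the `NA` values in the city header rows are an assumption. It is a vectorised variant of the same flag-and-`cumsum` technique rather than the exact split/apply/combine code above, and it assumes the first row is a city header.

```r
# Hypothetical input, reconstructed from the printed result; the header rows'
# col2 values (NA) are assumed because the source data is not shown.
df <- data.frame(
  col1 = c("Seattle", "Diesel", "Gasoline", "LPG", "Electric",
           "Boston", "Diesel", "Gasoline", "Electric"),
  col2 = c(NA, 80, NA, 10, 10, NA, 65, 25, 10),
  stringsAsFactors = FALSE
)

# Reference vector of the col1 values that are NOT city names
types <- c("Diesel", "Gasoline", "LPG", "Electric")

# A row starts a new section when col1 is not a known fuel type
is_city <- !(df$col1 %in% types)

# cumsum() over the flag gives every row the index of its section
group <- cumsum(is_city)

# Look up each section's city name by index, then drop the header rows
tidy <- data.frame(
  city  = df$col1[is_city][group],
  type  = df$col1,
  value = df$col2,
  stringsAsFactors = FALSE
)[!is_city, ]
rownames(tidy) <- NULL
```

Indexing the city names by `group` avoids the per-group `lapply`, and resetting the row names removes the `1.2`, `1.3` labels that `rbind` produces in the result above.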
    
