Tidying datasets with multiple sections/headers at variable positions

轻奢々 2021-01-22 16:33

Context

I am trying to read in and tidy an Excel file with multiple headers/sections placed at variable positions. The content of these headers needs to

4 Answers
  •  天涯浪人
    2021-01-22 17:29

    Assuming you have a finite set of measures (Diesel, Electric, etc.), you can build a vector to check against. Any value of col1 that is not in that set is presumably a city. Copy those values into a new column (col1 is currently a factor, hence the as.character), fill them down, and then remove the heading rows.

    library(dplyr)
    
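    # Known measure names; anything else in col1 is treated as a city heading
    meas <- c("Diesel", "Gasoline", "LPG", "Electric")
    
    df %>%
      # Copy heading values into a new column, leaving NA on measure rows
      mutate(city = ifelse(!col1 %in% meas, as.character(col1), NA)) %>%
      # Carry each city down over the rows that belong to it
      tidyr::fill(city) %>%
      # Drop the heading rows themselves (where col1 equals the city)
      filter(col1 != city)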
    #>       col1 col2    city
    #> 1   Diesel   80 Seattle
    #> 2 Gasoline   NA Seattle
    #> 3      LPG   10 Seattle
    #> 4 Electric   10 Seattle
    #> 5   Diesel   65  Boston
    #> 6 Gasoline   25  Boston
    #> 7 Electric   10  Boston
    
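    To try the steps above end to end, here is a minimal sketch with a made-up input. The data frame below is an assumption reconstructed from the output shown above; in particular, the NA values in col2 on the city heading rows are a guess, and the result name tidy is only illustrative.

    ```r
    library(dplyr)

    # Hypothetical input reconstructed from the printed output above.
    # Assumption: the city heading rows carry NA in col2.
    df <- data.frame(
      col1 = factor(c("Seattle", "Diesel", "Gasoline", "LPG", "Electric",
                      "Boston", "Diesel", "Gasoline", "Electric")),
      col2 = c(NA, 80, NA, 10, 10, NA, 65, 25, 10)
    )

    # Known measure names; anything else in col1 is treated as a city heading
    meas <- c("Diesel", "Gasoline", "LPG", "Electric")

    tidy <- df %>%
      mutate(city = ifelse(!col1 %in% meas, as.character(col1), NA)) %>%
      tidyr::fill(city) %>%
      filter(col1 != city)

    print(tidy)
    ```

    Note that this only works if every section heading differs from every measure name; a heading that happens to match an entry in meas would be treated as a data row.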
