How to transform time codes into turn codes

Posted by 放肆的年华 on 2020-06-17 06:29:37


I want to transform time codes like these:


library(tibble)
library(lubridate)

df_time <- tibble(time = c(ymd_hms("2020_01_01 00:00:01"),
                           ymd_hms("2020_01_01 00:00:02"),
                           ymd_hms("2020_01_01 00:00:03"),
                           ymd_hms("2020_01_01 00:00:04"),
                           ymd_hms("2020_01_01 00:00:05"),
                           ymd_hms("2020_01_01 00:00:06")),
                  a = c(0, 1, 1, 1, 1, 0),
                  b = c(0, 0, 1, 1, 0, 0))

resulting in

# A tibble: 6 x 3
  time                    a     b
  <dttm>              <dbl> <dbl>
1 2020-01-01 00:00:01     0     0
2 2020-01-01 00:00:02     1     0
3 2020-01-01 00:00:03     1     1
4 2020-01-01 00:00:04     1     1
5 2020-01-01 00:00:05     1     0
6 2020-01-01 00:00:06     0     0

into turn codes (a.k.a. event codes or "start-stop data"). The result should look like the following data frame:

df_turn <- tibble(start = c(ymd_hms("2020_01_01 00:00:02"),
                            ymd_hms("2020_01_01 00:00:03")),
                  end = c(ymd_hms("2020_01_01 00:00:05"),
                          ymd_hms("2020_01_01 00:00:04")),
                  code = c("a", "b"))

> df_turn
# A tibble: 2 x 3
  start               end                 code 
  <dttm>              <dttm>              <chr>
1 2020-01-01 00:00:02 2020-01-01 00:00:05 a    
2 2020-01-01 00:00:03 2020-01-01 00:00:04 b  



One way is to convert your data frame to long format and filter out the 0s. After that, you only need the earliest and latest time per code, which you can pick out with slice once the data is grouped by code. The final step is to add a column that labels those two rows as start and end, and then convert the result back to wide format, i.e.


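library(dplyr)
library(tidyr)

df_time %>% 
 # long format: one row per (time, code) pair
 pivot_longer(cols = -1, names_to = 'code') %>% 
 # keep only the rows where the code is active
 filter(value != 0) %>% 
 group_by(code) %>%
 # first and last active timestamp per code
 slice(c(which.min(time), which.max(time))) %>% 
 select(-value) %>% 
 # label the two rows per code, then spread them into columns
 mutate(new = c('start', 'end')) %>% 
 pivot_wider(names_from = new, values_from = time)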

which gives,

# A tibble: 2 x 3
# Groups:   code [2]
  code   start               end                
  <chr> <dttm>              <dttm>             
1 a     2020-01-01 00:00:02 2020-01-01 00:00:05
2 b     2020-01-01 00:00:03 2020-01-01 00:00:04 
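
Taking the minimum and maximum time per code works when each code is active in one uninterrupted stretch, as in the example data. If a code can switch on and off several times, the same idea can be applied per run of consecutive 1s. The following is only a sketch under that assumption: `df_multi`, `df_turn_multi` and the helper column `run` are made-up names for illustration, and it assumes dplyr >= 1.0.0 (for `.groups`) and tidyr >= 1.0.0.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical input: code "a" is active twice (seconds 2-3 and 5-6),
# code "b" once (seconds 3-4).
df_multi <- tibble(
  time = ymd_hms("2020-01-01 00:00:00") + 1:7,
  a    = c(0, 1, 1, 0, 1, 1, 0),
  b    = c(0, 0, 1, 1, 0, 0, 0)
)

df_turn_multi <- df_multi %>%
  pivot_longer(cols = -time, names_to = "code") %>%
  arrange(code, time) %>%
  group_by(code) %>%
  # the run counter increases whenever value differs from the previous row
  mutate(run = cumsum(value != lag(value, default = first(value)))) %>%
  # keep only the active rows; each remaining (code, run) pair is one turn
  filter(value != 0) %>%
  group_by(code, run) %>%
  summarise(start = min(time), end = max(time), .groups = "drop") %>%
  select(start, end, code) %>%
  arrange(start)

df_turn_multi
```

With this input the result has three rows: two turns for a and one for b. Applied to df_time from the question, it should give the same two turns as the slice approach, since each code there has only one run.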

