dplyr table reconstructing/data wrangling

Submitted by ﹥>﹥吖頭↗ on 2019-12-25 00:18:02

Question


I'm trying to create a variable that defines true vs false searches. The original dataset is located here: https://github.com/wikimedia-research/Discovery-Hiring-Analyst-2016/blob/master/events_log.csv.gz

The basic scenario is that each user (identified by an ID, either session_id or uuid in the original dataset) performs searches and visits. A visit is always preceded by a search, but a search does not have to be followed by a visit. A true search is one that is followed by a visit; a false search is one that is not. The original dataset also has a time variable, timestamp, which I do not know how to replicate here but believe will be useful.

A rough sketch of the original structure:

ID  Action   Time
a   search    1
a   visit     2
a   search    3
a   visit     4
b   visit     2
b   visit     3
b   search    1
c   search    5
c   search    6
c   search    7
c   visit     8
d   search    3
d   search    4

The result should keep only the rows where Action is search, with a new variable marking whether each search led to a visit, in the following format:

Structure I'm trying to produce:

ID  Action ClickThrough
a   search    T
a   search    T
b   search    T
c   search    F
c   search    F
c   search    T
d   search    F
d   search    F

Answer 1:


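This produces the expected output using dplyr. Rows are sorted by Time within each ID, and a search is flagged as a click-through when the row immediately after it is a visit (df1 is the data frame sketched in the question):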

library(dplyr)
df1 %>%
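  arrange(ID, Time) %>%
  group_by(ID) %>%
  # diff() is nonzero where the next row switches between search and visit;
  # the trailing FALSE covers the last row of each ID, which has no next row
  mutate(ClickThrough = c(as.logical(diff(Action == "visit")), FALSE)) %>%
  filter(Action == "search")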

# # A tibble: 8 x 4
# # Groups:   ID [4]
#      ID Action  Time ClickThrough
#   <chr>  <chr> <int>        <lgl>
# 1     a search     1         TRUE
# 2     a search     3         TRUE
# 3     b search     1         TRUE
# 4     c search     5        FALSE
# 5     c search     6        FALSE
# 6     c search     7         TRUE
# 7     d search     3        FALSE
# 8     d search     4        FALSE
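
The same rule can also be written with dplyr's lead(), which some readers may find easier to follow than the diff() trick. The following is only a sketch, assuming the sample data from the question is stored in a data frame named df1 (the question never shows how it was built, so the construction here is an assumption), and `result` is a name introduced just for illustration:

```r
library(dplyr)

# Sample data transcribed from the question's sketch (construction assumed).
df1 <- data.frame(
  ID     = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "d", "d"),
  Action = c("search", "visit", "search", "visit", "visit", "visit", "search",
             "search", "search", "search", "visit", "search", "search"),
  Time   = c(1L, 2L, 3L, 4L, 2L, 3L, 1L, 5L, 6L, 7L, 8L, 3L, 4L),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  arrange(ID, Time) %>%
  group_by(ID) %>%
  # lead() looks at the next row within the same ID; the last row of each
  # ID has no next row, so the default "" makes the comparison FALSE there.
  mutate(ClickThrough = lead(Action, default = "") == "visit") %>%
  ungroup() %>%
  filter(Action == "search") %>%
  select(ID, Action, ClickThrough)

print(result)
# ClickThrough, in row order: TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
```

Unlike the diff() version, this compares the next action to "visit" directly, so the flag on a visit row is never set by a following search; after the filter the two approaches give the same result on this data.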


Source: https://stackoverflow.com/questions/48800381/dplyr-table-reconstructing-data-wrangling
