How to subtract a median only from integer value

我们两清 提交于 2019-12-25 00:07:32

问题


I have this dataset

    df=structure(list(Dt = structure(1:39, .Label = c("2018-02-20 00:00:00.000", 
"2018-02-21 00:00:00.000", "2018-02-22 00:00:00.000", "2018-02-23 00:00:00.000", 
"2018-02-24 00:00:00.000", "2018-02-25 00:00:00.000", "2018-02-26 00:00:00.000", 
"2018-02-27 00:00:00.000", "2018-02-28 00:00:00.000", "2018-03-01 00:00:00.000", 
"2018-03-02 00:00:00.000", "2018-03-03 00:00:00.000", "2018-03-04 00:00:00.000", 
"2018-03-05 00:00:00.000", "2018-03-06 00:00:00.000", "2018-03-07 00:00:00.000", 
"2018-03-08 00:00:00.000", "2018-03-09 00:00:00.000", "2018-03-10 00:00:00.000", 
"2018-03-11 00:00:00.000", "2018-03-12 00:00:00.000", "2018-03-13 00:00:00.000", 
"2018-03-14 00:00:00.000", "2018-03-15 00:00:00.000", "2018-03-16 00:00:00.000", 
"2018-03-17 00:00:00.000", "2018-03-18 00:00:00.000", "2018-03-19 00:00:00.000", 
"2018-03-20 00:00:00.000", "2018-03-21 00:00:00.000", "2018-03-22 00:00:00.000", 
"2018-03-23 00:00:00.000", "2018-03-24 00:00:00.000", "2018-03-25 00:00:00.000", 
"2018-03-26 00:00:00.000", "2018-03-27 00:00:00.000", "2018-03-28 00:00:00.000", 
"2018-03-29 00:00:00.000", "2018-03-30 00:00:00.000"), class = "factor"), 
    ItemRelation = c(158043L, 158043L, 158043L, 158043L, 158043L, 
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 158043L, 
    158043L, 158043L, 158043L, 158043L, 158043L, 158043L), stuff = c(200L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3600L, 0L, 0L, 0L, 0L, 
    700L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1000L, 
    2600L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 700L), num = c(1459L, 
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 
    1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 1459L, 
    1459L, 1459L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L), action = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)), .Names = c("Dt", "ItemRelation", 
"stuff", "num", "year", "action"), class = "data.frame", row.names = c(NA, 
-39L))

The action column has only two values 0 and 1. i must calculate median by stuff for 1 category of action, then median by stuff of zero category of action, using last five integer values before one category. I just take the last 5 observations, It is necessary to take the last 5 observations in the zero category of action, but only the integer values in our case this is

200
3600
700
1000
2600

then substract median of zero category from median of one category.

The number of observations by stuff in the zero category of action can vary from 0-10. If we have 10 integer values of zero category, we take last five. If there is only 1,2,3,4,5 values integer, we subtract median of real number of integer values. If we have only 0 without integer , we just substact 0.

How to do it?

edit expected output

Dt                  ItemRelation    DocumentNum DocumentYear    value
2018-03-30 00:00:00.000 158043        1459       2018          -300

*-300=(700-median(   200,
    3600,
    700,
    1000,
    2600)

note, if for 1 category of action by stuff there is one value(in our case it is 700) we don't calculate median, we work only with this value

if there are two values, we calculate median for 1 category by stuff


回答1:


df.0 <- df %>% filter(action == 0 & stuff != 0) %>% arrange(Dt) %>% top_n(5)
df.1 <- df %>% filter(action==1 & stuff!=0)

new.df <- rbind(df.0,df.1)


View(
  df %>% select (everything()) %>%  group_by(ItemRelation, num, year) %>%
    summarise(
      median.1 = median(stuff[action == 1 & stuff != 0], na.rm = T),
      median.0 = median(stuff[action == 0 &
                                stuff != 0], na.rm = T)
    ) %>%
    mutate(
      value = median.1 - median.0,
      DocumentNum = num,
      DocumentYear = year
    ) %>%
    select(ItemRelation, DocumentNum, DocumentYear, value)

Does that help?



来源:https://stackoverflow.com/questions/50908897/how-to-subtract-a-median-only-from-integer-value

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!