Replace NA in column with value in adjacent column

Posted by 无人久伴 on 2019-11-26 02:21:32

Question


This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace each NA with the value in the adjacent cell of the same row. The solution in that post did not do this: it was a conditional replacement with a fixed value, not with the value from the adjacent vector (i.e. the adjacent element in the data matrix). Below is a reproducible example of my problem:

UNIT <- c(NA, NA, 200, 200, 200, 200, 200, 300, 300, 300, 300)
STATUS <- c('ACTIVE', 'INACTIVE', 'ACTIVE', 'ACTIVE', 'INACTIVE', 'ACTIVE', 'INACTIVE', 'ACTIVE', 'ACTIVE',
            'ACTIVE', 'INACTIVE')
TERMINATED <- c('1999-07-06', '2008-12-05', '2000-08-18', '2000-08-18', '2000-08-18', '2008-08-18',
                '2008-08-18', '2006-09-19', '2006-09-19', '2006-09-19', '1999-03-15')
START <- c('2007-04-23', '2008-12-06', '2004-06-01', '2007-02-01', '2008-04-19', '2010-11-29', '2010-12-30',
           '2007-10-29', '2008-02-05', '2008-06-30', '2009-02-07')
STOP <- c('2008-12-05', '4712-12-31', '2007-01-31', '2008-04-18', '2010-11-28', '2010-12-29', '4712-12-31',
          '2008-02-04', '2008-06-29', '2009-02-06', '4712-12-31')

TEST <- data.frame(UNIT, STATUS, TERMINATED, START, STOP)
TEST

  UNIT   STATUS TERMINATED      START       STOP
1    NA   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2    NA INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

#using the syntax for a conditional replace and hoping it works :/          
TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST 

   UNIT   STATUS TERMINATED      START       STOP
1     1   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2     2 INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

The outcome should be:

      UNIT   STATUS TERMINATED      START       STOP
1   ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3      200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4      200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5      200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6      200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7      200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8      300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9      300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10     300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11     300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

Answer 1:


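It didn't work because STATUS was a factor (before R 4.0.0, data.frame() converted strings to factors by default). When a factor is assigned into a numeric vector, R keeps the vector numeric and inserts the factor's underlying integer codes, which is why you got 1 and 2 instead of ACTIVE and INACTIVE. Forcing STATUS to character gives the result you're after; note that UNIT then becomes a character vector: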

TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)])

##        UNIT   STATUS TERMINATED      START       STOP
## 1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
## 2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
## 3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
## 4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
## 5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
## 6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
## 7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
## 8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
## 9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
## 10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
## 11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
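
To make the factor behaviour concrete, here is a minimal sketch on a smaller example. The vectors `unit` and `status` are hypothetical stand-ins for the question's columns, not part of the original data, and `status` is created as a factor explicitly so the example behaves the same on any R version.

```r
# Hypothetical miniature of the question's columns (names are illustrative only).
unit   <- c(NA, NA, 200, 300)
status <- factor(c("ACTIVE", "INACTIVE", "ACTIVE", "INACTIVE"))

# A factor stores integer codes plus a table of labels.
codes  <- as.integer(status)    # the codes behind the labels
labels <- as.character(status)  # the labels themselves

# Assigning the factor directly keeps the vector numeric and inserts the codes.
by_code <- unit
by_code[is.na(unit)] <- status[is.na(unit)]

# Converting to character first inserts the labels; the vector becomes character.
by_label <- unit
by_label[is.na(unit)] <- as.character(status[is.na(unit)])

print(by_code)
print(by_label)
```

The trade-off is that the character version can no longer be used for arithmetic without converting it back, so this is only appropriate when a mixed label/number column is really what is wanted.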



Answer 2:


You have to do

TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)]

so that each NA is replaced with the value from the same row of the adjacent column. Otherwise the number of values to be replaced does not match the number of replacement values, and R fills the NA positions with the first elements of STATUS in order, regardless of which rows they come from. Your version only appeared to pick the right rows because the two NAs happen to be in the first two rows.
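
The row-order problem is easier to see when the NAs are not at the top. The following is a small sketch with made-up vectors (`unit` and `status` are assumptions for illustration, not the question's data), using character values so that factor coercion does not get in the way.

```r
# Hypothetical data with NAs in rows 2 and 4 rather than rows 1 and 2.
unit    <- c(100, NA, 200, NA)
status  <- c("A", "B", "C", "D")
na_rows <- is.na(unit)

# Unfiltered right-hand side: four values are offered for two slots, so R
# uses the first two in order. R also warns about the length mismatch; the
# warning is suppressed here only to keep the sketch quiet.
wrong <- unit
suppressWarnings(wrong[na_rows] <- status)

# Filtered right-hand side: each NA gets the value from its own row.
right <- unit
right[na_rows] <- status[na_rows]

print(wrong)
print(right)
```

In the unfiltered version rows 2 and 4 receive "A" and "B", the first two elements of `status`, whereas the filtered version gives them "B" and "D", the values that actually sit next to the NAs.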




Answer 3:


TEST$UNIT <- ifelse(is.na(TEST$UNIT), paste(TEST$STATUS), paste(TEST$UNIT))
TEST
       UNIT   STATUS TERMINATED      START       STOP
1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
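
If the same replacement is needed for several columns, the idea can be wrapped in a small helper. The function below is a sketch only: `fill_na_from` is a hypothetical name, not a base R function, and it assumes a character result is acceptable, as in the answers above.

```r
# Hypothetical helper: fill NAs in `target` with the same-row value of `source`.
# Both inputs are converted to character, so factors contribute their labels.
fill_na_from <- function(target, source) {
  stopifnot(length(target) == length(source))
  out <- as.character(target)
  missing <- is.na(target)   # test the original values, before conversion
  out[missing] <- as.character(source[missing])
  out
}

# Usage on made-up data; `source` is a factor here on purpose.
filled <- fill_na_from(c(NA, 200, NA, 300),
                       factor(c("ACTIVE", "INACTIVE", "INACTIVE", "ACTIVE")))
print(filled)
```

Testing `is.na()` on the original vector matters because `paste()` turns a missing value into the string "NA", which `is.na()` no longer detects.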


Source: https://stackoverflow.com/questions/15629885/replace-na-in-column-with-value-in-adjacent-column
