Replace a value in a data frame based on a conditional (`if`) statement

Question

In the R data frame created below, I would like to replace every occurrence of B with b.

junk <- data.frame(x <- rep(LETTERS[1:4], 3), y <- letters[1:12])
colnames(junk) <- c("nm", "val")

This produces:

   nm val
1   A   a
2   B   b
3   C   c
4   D   d
5   A   e
6   B   f
7   C   g
8   D   h
9   A   i
10  B   j
11  C   k
12  D   l

My initial attempt used `for` and `if` statements like so:

for(i in junk$nm) if(i %in% "B") junk$nm <- "b"

but as I am sure you can see, this replaces ALL of the values of junk$nm with b. I can see why it does this, but I can't seem to get it to replace only those cases of junk$nm where the original value was B.

NOTE: I managed to solve the problem with gsub, but in the interest of learning R I would still like to know how to get my original approach to work (if that is possible).

Answer 1:

It is easier to convert nm to character and then make the change:

junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

EDIT: And if you do need to keep nm as a factor, add this at the end:

junk$nm <- as.factor(junk$nm)
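A minimal sketch of why the conversion matters, assuming nm is a factor (the default for data.frame() before R 4.0.0; stringsAsFactors = TRUE is set explicitly here so the example behaves the same on newer versions). Assigning a value that is not an existing level yields NA with a warning, while the round trip through character works:

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

# Direct assignment on a copy of the factor: "b" is not a level, so R inserts NA
# (suppressWarnings() hides the "invalid factor level" warning).
direct <- junk$nm
suppressWarnings(direct[direct == "B"] <- "b")
sum(is.na(direct))            # 3

# Round trip through character, as in the answer above.
junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"
junk$nm <- as.factor(junk$nm)
levels(junk$nm)               # level order depends on the locale's sort order
```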

Answer 2:

Another useful way to replace values is revalue() from the plyr package:

library(plyr)
junk$nm <- revalue(junk$nm, c("B"="b"))
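If plyr is not available, a similar recode can be sketched in base R with a named lookup vector. This is a swapped-in technique, not how revalue() is implemented; the names `map` and `hit` are only for illustration, and nm is assumed to be a character column.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = FALSE)

map <- c(B = "b")                        # old value = new value (illustrative name)
hit <- junk$nm %in% names(map)           # rows whose value has a mapping
junk$nm[hit] <- unname(map[junk$nm[hit]])
junk$nm
```

Adding more pairs to `map` recodes several values in one pass.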

Answer 3:

The short answer is:

junk$nm[junk$nm %in% "B"] <- "b"

Take a look at the "Index vectors" section of An Introduction to R (if you haven't read it yet).

EDIT: As noted in the comments, this solution works for character vectors, so it fails on your data, where nm is a factor.

For a factor, the best way is to change the level:

levels(junk$nm)[levels(junk$nm)=="B"] <- "b"
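To tie this back to the loop in the question: `junk$nm <- "b"` assigns to the whole column, so the first B overwrites everything. A sketch of a loop that does work iterates over row positions and assigns to one element at a time (assuming nm is a character column). The vectorized `junk$nm[junk$nm %in% "B"] <- "b"` does the same job without the loop.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = FALSE)

# Loop over row indices, not over the values, so each assignment targets one element.
for (i in seq_along(junk$nm)) {
  if (junk$nm[i] == "B") junk$nm[i] <- "b"
}
junk$nm
```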

Answer 4:

As the data you show are factors, it complicates things a little bit. @diliop's answer approaches the problem by converting nm to a character variable. To get back to the original factor, a further step is required.

An alternative is to manipulate the levels of the factor in place.

> lev <- with(junk, levels(nm))
> lev[lev == "B"] <- "b"
> junk2 <- within(junk, levels(nm) <- lev)
> junk2
   nm val
1   A   a
2   b   b
3   C   c
4   D   d
5   A   e
6   b   f
7   C   g
8   D   h
9   A   i
10  b   j
11  C   k
12  D   l

That is quite simple and I often forget that there is a replacement function for levels().

Edit: As noted by @Seth in the comments, this can be done in a one-liner, without loss of clarity:

within(junk, levels(nm)[levels(nm) == "B"] <- "b")
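A short sketch of what renaming a level does under the hood, assuming nm is a factor: only the label changes, while the integer codes stored for each row stay the same. The name `codes_before` is just for illustration.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

codes_before <- as.integer(junk$nm)
junk2 <- within(junk, levels(nm)[levels(nm) == "B"] <- "b")

levels(junk2$nm)                                  # "A" "b" "C" "D"
identical(codes_before, as.integer(junk2$nm))     # TRUE
```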

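Answer 5:

The easiest way to do this in one command is to use which(). This works directly when nm is a character vector (the default in R 4.0.0 and later); if nm is a factor, "b" must already be one of its levels, otherwise the assignment yields NA with a warning: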

junk$nm[which(junk$nm=="B")]<-"b"
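As a rough illustration of where which() differs from a plain logical index, consider a vector with a missing value (the vector `x` is made up for this sketch). The difference shows up when extracting; for assignment with a single replacement value, both forms behave the same.

```r
x <- c("A", "B", NA, "B")

x[x == "B"]            # "B" NA  "B"  (the NA comparison leaks into the result)
x[which(x == "B")]     # "B" "B"

# For assignment with a single replacement value, both forms leave the NA alone.
y <- x; y[y == "B"] <- "b"
z <- x; z[which(z == "B")] <- "b"
identical(y, z)        # TRUE
```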

Answer 6:

You have created a factor variable in nm, so you either need to avoid doing so or add an additional level to the factor's attributes. You should also avoid using <- in the arguments to data.frame(); use = to name the columns instead.

Option 1:

junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12], stringsAsFactors = FALSE)
junk$nm[junk$nm == "B"] <- "b"

Option 2:

levels(junk$nm) <- c(levels(junk$nm), "b")
junk$nm[junk$nm == "B"] <- "b"
junk
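One side effect of Option 2 worth seeing in a sketch: the old level "B" stays attached to the factor even though no row uses it any more. If that matters, droplevels() removes unused levels. This assumes nm is a factor, so stringsAsFactors = TRUE is set explicitly.

```r
junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12],
                   stringsAsFactors = TRUE)

levels(junk$nm) <- c(levels(junk$nm), "b")
junk$nm[junk$nm == "B"] <- "b"

levels(junk$nm)               # "A" "B" "C" "D" "b"  ("B" is still a level)
table(junk$nm)[["B"]]         # 0

junk$nm <- droplevels(junk$nm)
levels(junk$nm)               # "A" "C" "D" "b"
```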

Answer 7:

If you are working with character variables (note that stringsAsFactors is FALSE here), you can use replace():

junk <- data.frame(nm = rep(LETTERS[1:4], 3), val = letters[1:12], stringsAsFactors = FALSE)

junk$nm <- replace(junk$nm, junk$nm == "B", "b")
junk
#    nm val
# 1   A   a
# 2   b   b
# 3   C   c
# 4   D   d
# ...
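Note that replace() returns a modified copy rather than changing its argument, which is why the result is assigned back to junk$nm above. A small sketch on a plain character vector (the names `nm` and `out` are illustrative):

```r
nm <- rep(LETTERS[1:4], 3)

out <- replace(nm, nm == "B", "b")   # returns a modified copy
nm[2]     # "B"  (original is untouched)
out[2]    # "b"
```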

Answer 8:

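# Stata-style replace: set replacevar to replacevalue in the rows where the
# condition string ifs (evaluated inside data) is TRUE.
stata.replace <- function(data, replacevar, replacevalue, ifs) {
  cond <- eval(parse(text = ifs), data, parent.frame())
  hit <- !is.na(cond) & as.logical(cond)  # rows where the condition is NA are left alone
  data[hit, replacevar] <- replacevalue
  print(noquote(paste0(sum(hit), " replacements made")))
  data
}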

Call the function like this:

d <- stata.replace(d, "under20", 1, "age < 20")
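For comparison, the effect of that call can be sketched with plain logical indexing. The data frame `d` below is hypothetical; only the column names age and under20 come from the call above.

```r
# Hypothetical data; under20 starts at 0 for every row.
d <- data.frame(age = c(15, 22, 19, 40), under20 = 0)

d$under20[d$age < 20] <- 1
d$under20      # 1 0 1 0
```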

Source: https://stackoverflow.com/questions/5824173/replace-a-value-in-a-data-frame-based-on-a-conditional-if-statement

Tags

recode