Question
I have successfully switched for loops to sapply loops before, and I know for a fact (system.time()) that they are faster.
BUT my mind still works in a for loop way...
Please help me to convert this for loop case:
names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie")
misspell.list <- c("Anna", "Albort", "Robb", "Tommie")
fix.list <- c("Ana", "Albert", "Rob", "Tommy")
for(i in 1:length(fix.list)) {
names.list[which(names.list == misspell.list[i])] <- fix.list[i]
}
names.list
to an sapply() call.
So far I have:
sapply(seq_along(fix.list), function(x)
names.list[which(names.list == misspell.list[x])] <- fix.list[x]
)
But names.list still comes back as the original vector.
Thanks!
EDIT 1:
The misspell.list and fix.list were created automatically with adist(), as shown below, and the original names.list has 665 elements. My for() solution returns length(unique(names.list)) = 653 elements.
# will do another sapply() substitution here soon
for(i in 1:(length(names.list)-1)) {
distancias[i] <- adist(names.list[i], names.list[i+1])
}
# fix list
misspell.list <- names.list[which(distancias < 2)]
fix.list <- names.list[which(distancias < 2) +1]
EDIT 2: Thanks to you, I'm now an sapply overlord, and I'm here just to show my other for-to-sapply substitution, used with adist():
nomes <- sort(unique(names.list))
distancias <- rep(10, length(nomes))
#adist() for finding misspelling
sapply(seq_along(nomes),
function(x) {
if(x<length(nomes)) {
distancias[x] <<- adist(nomes[x], nomes[x+1])
}
}
)
# fix list
# distancias was computed over nomes, so index nomes here
misspell.list <- nomes[which(distancias < 2)]
fix.list <- nomes[which(distancias < 2) + 1]
The other part you already know, thanks again!
Answer 1:
The solution using match is much better, but in terms of what you were trying to do, this will work. First, you don't need the which. You also need the <<- operator so that the anonymous function passed to sapply assigns to names.list in the enclosing (here, global) environment rather than in its own local one; otherwise it changes only a local copy, not names.list itself.
sapply(seq_along(fix.list), function(x)
names.list[names.list == misspell.list[x]] <<- fix.list[x]
)
names.list
[1] "Ana" "Ana" "Albert" "Albert" "Rob" "Rob" "Tommy" "Tommy"
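To make the scoping point concrete, here is a minimal sketch comparing the two assignment operators inside a function. The helpers fix_local and fix_global are hypothetical names introduced only for this illustration, and the vector is shortened for brevity.

```r
names.list <- c("Anna", "Ana", "Albert", "Albort")

# Hypothetical helper: `<-` modifies a copy that is local to the function
fix_local <- function() {
  names.list[names.list == "Anna"] <- "Ana"
  invisible(NULL)
}

# Hypothetical helper: `<<-` assigns to names.list in an enclosing environment
fix_global <- function() {
  names.list[names.list == "Anna"] <<- "Ana"
  invisible(NULL)
}

fix_local()
after_local <- names.list    # unchanged: still contains "Anna"

fix_global()
after_global <- names.list   # "Anna" has been replaced by "Ana"
```

Because `<<-` changes state outside the function, it is usually kept for small scripts like this one; the vectorised approaches avoid the side effect altogether.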
Answer 2:
If there is a one-to-one correspondence between misspell.list and fix.list, you can do away with loops by using the match function:
names.list[match(misspell.list,names.list)] <- fix.list
names.list
#[1] "Ana" "Ana" "Albert" "Albert" "Rob" "Rob" "Tommy" "Tommy"
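One caveat worth keeping in mind: match(misspell.list, names.list) gives only the first position of each misspelling, so the line above assumes every misspelled name occurs exactly once in names.list. A possible sketch for data with repeats looks the names up in the other direction; the variables idx and hit, and the modified example vector with a repeated "Anna", are assumptions made for this illustration.

```r
# Example data with a repeated misspelling ("Anna" appears twice)
names.list <- c("Anna", "Ana", "Albort", "Anna", "Rob", "Robb", "Tommie", "Tommy")
misspell.list <- c("Anna", "Albort", "Robb", "Tommie")
fix.list <- c("Ana", "Albert", "Rob", "Tommy")

# For every name, find its position in misspell.list (NA if it is not a misspelling)
idx <- match(names.list, misspell.list)
hit <- !is.na(idx)

# Replace only the names that were found, including repeated ones
names.list[hit] <- fix.list[idx[hit]]
names.list
```

Names that are not in misspell.list are left untouched, so this version also tolerates misspellings that never occur in the data.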
Answer 3:
I would propose a small change to your whole setup. When using indexes like you do, you need to be sure that the order is always the same. If you add or remove a name, the whole thing falls apart.
Using a named list and lapply or sapply, your code stays dynamic and you can potentially match multiple misspellings to one name.
misspell.list <- list(
'Anna' = 'Ana',
'Albort' = 'Albert',
'Robb' = 'Rob',
'Tommie' = 'Tommy'
)
names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie")
sapply(names.list, function(x) ifelse(x %in% names(misspell.list), misspell.list[[x]], x))
Anna Ana Albert Albort Rob Robb Tommy Tommie
"Ana" "Ana" "Albert" "Albert" "Rob" "Rob" "Tommy" "Tommy"
To illustrate what I mean, I'm using sample to shuffle your names.list vector and extend it to 20 names. This shows that order and length have no influence.
sapply(names.list[sample(1:length(names.list), 20, replace = TRUE)], function(x) ifelse(x %in% names(misspell.list), misspell.list[[x]], x))
Albert Tommie Rob Tommie Rob Tommy Ana Robb Tommie Ana Tommie Albort Ana Albert Albert Albort
"Albert" "Tommy" "Rob" "Tommy" "Rob" "Tommy" "Ana" "Rob" "Tommy" "Ana" "Tommy" "Albert" "Ana" "Albert" "Albert" "Albert"
Tommy Tommy Tommy Ana
"Tommy" "Tommy" "Tommy" "Ana"
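The same name-based lookup can also be sketched without sapply by storing the corrections in a named character vector and indexing it directly. This is an alternative arrangement rather than the code above; fixes, known, and fixed are hypothetical names introduced for the example.

```r
names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie")

# Named character vector: names are the misspellings, values are the corrections
fixes <- c(Anna = "Ana", Albort = "Albert", Robb = "Rob", Tommie = "Tommy")

# Flag the entries that appear as a known misspelling
known <- names.list %in% names(fixes)

# Index the lookup vector by name; unname() drops the misspelling labels
fixed <- names.list
fixed[known] <- unname(fixes[names.list[known]])
fixed
```

As with the named list, the result does not depend on the order or length of names.list, and several misspellings can point to the same correction.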
Source: https://stackoverflow.com/questions/44907912/r-sapply-loop-to-replace-for-loop