Question
I have data like this:
author_id  paper_id  confirmed  author_name1      author_affiliation1   author_name
826        25733     1          Emanuele Buratti  Genetic engineering   Emanuele Buratti
826        25733     1          Emanuele Buratti  International center  Emanuele Buratti
826        47276     1          Emanuele Buratti                        Emanuele Buratti
826        77012     1          Emanuele Buratti                        Emanuele Buratti
826        77012     1          Emanuele Buratti                        Emanuele Buratti
826        79468     1          Emanuele Buratti                        Emanuele Buratti
author_affiliation
Genetic enginereing
The International Centre for Genetic Engineering and Biotechnology, Padriciano 66, Trieste, Italy
International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34149 Trieste, Italy
Now, for each row, I have to compute the string distance between author_name and author_name1 (name_dist) and the string distance between author_affiliation and author_affiliation1 (aff_dist).
I am using:
name_dist <- vector()
aff_dist <- vector()
for (i in 1:nrow(mer1)) {
  name_dist[i] <- stringdist(mer1$author_name1[i], mer1$author_name[i], method = "lv")
  aff_dist[i] <- stringdist(mer1$author_affiliation1[i], mer1$author_affiliation[i], method = "lv")
}
But this takes a lot of time. How can this be done efficiently?
Thanks
Answer 1:
You can vectorize it directly, since stringdist accepts whole vectors and compares them element by element:
name_dist <- stringdist(mer1$author_name1, mer1$author_name, method = "lv")
aff_dist <- stringdist(mer1$author_affiliation1, mer1$author_affiliation, method = "lv")
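As a quick sanity check, the sketch below compares the loop from the question with a single vectorized call. It assumes the stringdist package is installed; the small mer1 data frame and the names loop_dist and vec_dist are made up for illustration and are not the asker's data.

```r
library(stringdist)  # assumption: the stringdist package is installed

# Hypothetical stand-in for mer1 with made-up rows
mer1 <- data.frame(
  author_name1 = c("Emanuele Buratti", "E. Buratti", "Emanuele Buratti"),
  author_name  = c("Emanuele Buratti", "Emanuele Buratti", "Emanuel Buratti"),
  stringsAsFactors = FALSE
)

# Row-by-row loop, as in the question (result vector preallocated)
loop_dist <- numeric(nrow(mer1))
for (i in seq_len(nrow(mer1))) {
  loop_dist[i] <- stringdist(mer1$author_name1[i], mer1$author_name[i], method = "lv")
}

# One vectorized call over the whole columns
vec_dist <- stringdist(mer1$author_name1, mer1$author_name, method = "lv")

# Both approaches should give the same distances
stopifnot(isTRUE(all.equal(loop_dist, vec_dist)))
```

Wrapping each variant in system.time() on the real data should show how much the per-row call overhead costs.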
Answer 2:
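You can use sapply (or some other apply-style function), like so:
a = letters[1:5] # your mer1$author_name1
b = LETTERS[1:5] # your mer1$author_name
dist_mat = sapply(a, stringdist, b, method="lv") # 5 x 5 matrix: every element of a against every element of b
name_dist = diag(dist_mat) # the row-wise distances sit on the diagonal
Note that sapply compares each element of a with all of b, so it computes far more distances than the row-wise comparison needs. For pairwise distances only, mapply(stringdist, a, b, MoreArgs = list(method = "lv")) or a plain stringdist(a, b, method = "lv") does less work.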
Answer 3:
Try:
res <- transform(mer1,
  name_dist = stringdist(author_name1, author_name, method = "lv"),
  aff_dist = stringdist(author_affiliation1, author_affiliation, method = "lv")
)
Since stringdist accepts vector input, calling it once per column pair should be more efficient than calling it once per row.
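To illustrate what the transform call returns, here is a minimal sketch on a two-row data frame. The rows are hypothetical, loosely modelled on the sample in the question, and include an empty author_affiliation1 like the later rows there. It assumes the stringdist package is installed.

```r
library(stringdist)  # assumption: the stringdist package is installed

# Hypothetical two-row stand-in for mer1
mer1 <- data.frame(
  author_name1        = c("Emanuele Buratti", "Emanuele Buratti"),
  author_name         = c("Emanuele Buratti", "Emanuele Buratti"),
  author_affiliation1 = c("Genetic engineering", ""),
  author_affiliation  = c("Genetic enginereing", "Trieste, Italy"),
  stringsAsFactors = FALSE
)

# transform() evaluates the expressions against the data frame's columns
# and appends the results as new columns
res <- transform(mer1,
  name_dist = stringdist(author_name1, author_name, method = "lv"),
  aff_dist  = stringdist(author_affiliation1, author_affiliation, method = "lv")
)

print(res[, c("name_dist", "aff_dist")])
```

With the Levenshtein method, an empty string is at a distance equal to the length of the other string, so rows with a blank affiliation get a large aff_dist rather than a missing value. If blanks should count as missing, converting them to NA beforehand is one option.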
Source: https://stackoverflow.com/questions/22609199/efficient-programming-in-r