Comparing 2 datasets in R

て烟熏妆下的殇ゞ submitted on 2019-12-24 00:36:00

Question


I have two data sets extracted from a data set called babies2009 (three columns: count, name, gender).

One is girls2009, containing all the girls, and the other is boys2009, containing all the boys. I want to find out which names appear in both.

I tried this

common.names = (boys2009$name %in% girls2009$name)

When I try

babies2009[common.names, ] [1:10, ]

all I get is the girls' names, not the common names.

I have confirmed that the two data sets contain only boys and only girls respectively by looking at the first 10 rows of each:

boys2009[1:10, ]
girls2009[1:10, ]

How else can I compare the two data sets and determine which values they share? Thanks.


Answer 1:


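common.names = (boys2009$name %in% girls2009$name) gives you a logical vector with one element per row of boys2009, that is, of length length(boys2009$name). When you use it to index the much longer data.frame babies2009, as in babies2009[common.names, ] [1:10, ], the TRUE/FALSE positions no longer line up with the rows they were computed for, so you wind up with nonsense.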

Solution: use that logical vector on the proper data.frame!

# Toy data; the column is called "name" to match the question
boys2009 <- data.frame(name = c("Billy", "Bob"), data = runif(2), gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), data = runif(3), gender = "F", stringsAsFactors = FALSE)
babies2009 <- rbind(boys2009, girls2009)

# TRUE for each row of boys2009 whose name also appears in girls2009
common.names <- (boys2009$name %in% girls2009$name)

> boys2009[common.names, ]$name
[1] "Billy"
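
To make the misalignment concrete, here is a minimal sketch on toy data. The data, and in particular the assumption that babies2009 lists the girls first, are made up for illustration; the real row order is not given in the question. It contrasts indexing babies2009 with the vector computed on boys2009 against indexing it with a vector computed on its own name column.

```r
# Hypothetical toy data; the row order of the real babies2009 is not known.
boys2009  <- data.frame(name = c("Billy", "Bob"), gender = "M",
                        stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Mae", "Sue"), gender = "F",
                        stringsAsFactors = FALSE)
babies2009 <- rbind(girls2009, boys2009)  # assumed order: girls first

# One element per row of boys2009, so length 2 rather than 5
common.names <- boys2009$name %in% girls2009$name

# Misaligned: the short logical vector is recycled over all 5 rows
wrong <- babies2009[common.names, ]

# Aligned: build the index from babies2009's own name column
shared <- intersect(boys2009$name, girls2009$name)
right <- babies2009[babies2009$name %in% shared, ]

print(wrong)
print(right)
```

With this toy data, wrong picks up rows unrelated to the shared names, while right holds the shared name once for each gender.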



Answer 2:


Since you asked for similar names and did not specify exact matches, you should consider agrep, which does approximate (fuzzy) string matching:

sapply(boys2009$name, agrep, girls2009$name, max.distance = 0.1)

You can adjust the max.distance argument to suit your needs.
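
As a rough sketch of how this might be used, the example below runs the same call on made-up names and adds value = TRUE so that agrep returns the matching strings rather than their positions. The names are hypothetical, and simplify = FALSE is only there to keep the result as a list with one entry per boy's name. Note that agrep looks for approximate matches within each element, so a short name can also match part of a longer one.

```r
# Hypothetical names, for illustration only
boys2009  <- data.frame(name = c("Billy", "Jon", "Bob"),
                        stringsAsFactors = FALSE)
girls2009 <- data.frame(name = c("Billy", "Joan", "Mae"),
                        stringsAsFactors = FALSE)

# For each boy's name, collect the girls' names that approximately match it.
# value = TRUE returns the matched strings instead of their indices.
fuzzy <- sapply(boys2009$name, agrep, girls2009$name,
                max.distance = 0.1, value = TRUE,
                simplify = FALSE)

print(fuzzy)
```

Names that are identical in both data sets always show up here, since an exact match has distance zero.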




Answer 3:


How about using set functions:

list(
    `only boys` = setdiff(boys2009$name, girls2009$name),
    `common` = intersect(boys2009$name, girls2009$name),
    `only girls` = setdiff(girls2009$name, boys2009$name)
)
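
Here is a small sketch of the set functions on toy data, with hypothetical counts. Because intersect and setdiff return only the unique names, the sketch also uses merge, which is a different technique from the set functions above, as one possible way to see the rows for shared names side by side.

```r
# Hypothetical counts, for illustration only
boys2009  <- data.frame(count = c(120, 300), name = c("Billy", "Bob"),
                        gender = "M", stringsAsFactors = FALSE)
girls2009 <- data.frame(count = c(15, 210, 180),
                        name = c("Billy", "Mae", "Sue"),
                        gender = "F", stringsAsFactors = FALSE)

# Unique names in each group
name.sets <- list(
    `only boys`  = setdiff(boys2009$name, girls2009$name),
    `common`     = intersect(boys2009$name, girls2009$name),
    `only girls` = setdiff(girls2009$name, boys2009$name)
)

# Inner join on name: one row per shared name, with both sets of columns
both <- merge(boys2009, girls2009, by = "name",
              suffixes = c(".boys", ".girls"))

print(name.sets)
print(both)
```

With this data, common is "Billy" and both has a single row carrying the boys' and girls' counts in separate columns.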


Source: https://stackoverflow.com/questions/7459138/comparing-2-datasets-in-r
