Question
This is an extension of the question that I asked here: Getting Factor Means into the dataset after calculation
Now that I have basically normalized all of the stats that I am interested in using, I want to search the data set for players that meet all of my criteria at once. I am searching the dataset like this:
base3[((base3$ScaledAVG>2)&(base3$ScaledOBP>2)&(base3$ScaledK.AB<.20)),]
looking for the players that have all three of those things true, yet when I run this it resets the Scaled K.AB value to either .5, 1 or 2 and then doesn't search using that parameter. Is there something wrong with searching the data set this way or is there a better way to find people in a dataset in this same vein?
Here is some sample data but it doesn't have the same problems as when I go out to the 4000 records I have:
AVG = c(.350,.400,.320,.220,.100,.250,.400,.450)
Conf = c("SEC","ACC","SEC","B12","P12","ACC","B12","P12")
OBP = c(.360,.420,.360,.260,.160,.260,.460,.410)
K.AB = c(.11,.10,.09,.25,.20,.19,.05,.09)
Conf=as.factor(Conf)
d<- data.frame(Conf, AVG,OBP,K.AB)
dd <- do.call(rbind, by(d, d$Conf, FUN=function(x) { x$Scaled <- scale(x$AVG); x}))
dd <- do.call(rbind, by(d, d$Conf, FUN=function(x) { x$Scaled <- scale(x$OBP); x}))
dd <- do.call(rbind, by(d, d$Conf, FUN=function(x) { x$Scaled <- scale(x$K.AB); x}))
dd[((dd$ScaledAVG>2)&(dd$ScaledOBP>2)&(dd$ScaledK.AB<.20)),]
Thank you!
Answer 1:
You may want to drop the do.call(rbind, by(...)) strategy in favor of a straight scale strategy. The scale function accepts a data.frame of numeric columns directly (it converts it to a matrix and scales each column):
> dd <- scale(d[ ,c("AVG", "OBP", "K.AB")])
> dd
             AVG        OBP       K.AB
[1,]  0.33566727  0.2348519 -0.3608439
[2,]  0.76878633  0.8281619 -0.5051815
[3,]  0.07579584  0.2348519 -0.6495191
[4,] -0.79044229 -0.7539981  1.6598820
[5,] -1.82992803 -1.7428481  0.9381942
[6,] -0.53057085 -0.7539981  0.7938566
[7,]  0.76878633  1.2237019 -1.2268693
[8,]  1.20190539  0.7292769 -0.6495191
attr(,"scaled:center")
    AVG     OBP    K.AB
0.31125 0.33625 0.13500
attr(,"scaled:scale")
       AVG        OBP       K.AB
0.11544170 0.10112757 0.06928203
> d[ dd[, 'AVG'] > 2 & dd[ ,'OBP'] >2 & dd[ ,'K.AB'] < 0.2 , ]
[1] Conf AVG OBP K.AB
<0 rows> (or 0-length row.names)
It should not be too surprising that you get no rows that meet all of those conditions since a scaled value of 2 is rather unlikely in a small dataset.
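To make "rather unlikely" concrete, here is a minimal sketch, assuming the sample standard deviation that scale() uses by default. For n values, no z-score can exceed (n - 1)/sqrt(n) in absolute value (a consequence of Samuelson's inequality), so a cutoff of 2 is only reachable when n is at least 6, and then only by an extreme outlier. The helper name max_abs_z is illustrative and not part of R.

```r
# Largest possible |z-score| among n values, using the sample sd (n - 1 divisor)
max_abs_z <- function(n) (n - 1) / sqrt(n)

max_abs_z(8)   # upper bound for the 8-row sample
max_abs_z(2)   # upper bound for a conference with only 2 players

# The bound is attained when one value differs and all the others are equal
x <- c(rep(0, 7), 1)
worst_case <- max(abs(scale(x)))
```

With 8 rows the bound is roughly 2.47, so a value above 2 would need one player far from everyone else; with 2 rows per conference the bound is roughly 0.707, so a cutoff of 2 can never be met.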
To apply scale within levels of Conf:
> dd <- lapply(d[ ,c("AVG", "OBP", "K.AB")], function(x) ave(x, d[,"Conf"] , FUN=scale) )
> dd
$AVG
[1] 0.7071068 0.7071068 -0.7071068 -0.7071068 -0.7071068 -0.7071068 0.7071068 0.7071068
$OBP
[1] NaN 0.7071068 NaN -0.7071068 -0.7071068 -0.7071068 0.7071068 0.7071068
$K.AB
[1] 0.7071068 -0.7071068 -0.7071068 0.7071068 0.7071068 0.7071068 -0.7071068 -0.7071068
> data.frame(dd)
         AVG        OBP       K.AB
1  0.7071068        NaN  0.7071068
2  0.7071068  0.7071068 -0.7071068
3 -0.7071068        NaN -0.7071068
4 -0.7071068 -0.7071068  0.7071068
5 -0.7071068 -0.7071068  0.7071068
6 -0.7071068 -0.7071068  0.7071068
7  0.7071068  0.7071068 -0.7071068
8  0.7071068  0.7071068 -0.7071068
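I do not think it works too well here because the offered test case is too small: with only two players per conference, every scaled value is either 0.7071068 or -0.7071068, or NaN when the two values are equal (as with OBP in the SEC).
Source: https://stackoverflow.com/questions/15593537/using-do-call-factor-to-scale-resetting-value-error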
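The question's filter expects columns named ScaledAVG, ScaledOBP and ScaledK.AB in the same data frame as the raw stats. A minimal sketch of one way to get there with ave, using the sample data from the question: the helper zscore and the 0 cutoffs are assumptions for illustration only (the question's cutoffs of 2 and .20 cannot be met with two players per conference).

```r
# Sample data from the question
d <- data.frame(
  Conf = factor(c("SEC", "ACC", "SEC", "B12", "P12", "ACC", "B12", "P12")),
  AVG  = c(.350, .400, .320, .220, .100, .250, .400, .450),
  OBP  = c(.360, .420, .360, .260, .160, .260, .460, .410),
  K.AB = c(.11, .10, .09, .25, .20, .19, .05, .09)
)

stats <- c("AVG", "OBP", "K.AB")

# scale() returns a one-column matrix; as.numeric() keeps the new columns plain vectors
zscore <- function(x) as.numeric(scale(x))

# One uniquely named column per stat, scaled within each level of Conf
for (s in stats) {
  d[[paste0("Scaled", s)]] <- ave(d[[s]], d$Conf, FUN = zscore)
}

# subset() drops rows where the condition is NA, so NaN scores cannot slip through
# (illustrative cutoffs: above the conference mean in AVG and OBP, below it in K.AB)
hits <- subset(d, ScaledAVG > 0 & ScaledOBP > 0 & ScaledK.AB < 0)
hits
```

Keeping each scaled stat in its own column of the original data frame means the row filter and the displayed rows always refer to the same object. On the full data the cutoffs from the question can be substituted for the 0 values.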