x <- iris[,1:4]
names(x) <- c("x1","x2","x3","x4")
aggregate(x1+x2+x3+x4~x1,FUN=sum,data=x)
Here is the output, and I wonder:
1. What does ~ do in aggregate()?
In aggregate(), ~ separates what is being aggregated (left side) from what is used to group the items (right side).
In your example, x1 + x2 + x3 + x4 is calculated for each row, and the results are then summed within each group of rows that share the same x1 value.
So the reason you get 8.5 is that the data being summed is:
x1 + x2 + x3 + x4 = sum(c(4.3, 3.0, 1.1, 0.1)) = 8.5
The line with x1 = 4.3 in your example is the 14th row: 14 4.3 3.0 1.1 0.1.
The values in each row are summed first; the resulting row sums are then grouped by x1 value, and each group is passed to FUN=sum to be summed.
Since there is only one row with x1 = 4.3, the value is simply 8.5, which is the sum of the entries in row 14.
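The two steps described above can be written out by hand as a check. This is only a sketch of the idea, not a description of what aggregate() does internally; the column name total and the variables by_hand and by_formula are made up for the illustration.

```r
x <- iris[, 1:4]
names(x) <- c("x1", "x2", "x3", "x4")

# Step 1: compute the left-hand side of the formula for every row.
# 'total' is a hypothetical helper column, not part of the original data.
x$total <- x$x1 + x$x2 + x$x3 + x$x4

# Step 2: group the row totals by x1 and sum each group.
by_hand <- aggregate(total ~ x1, data = x, FUN = sum)

# The one-line formula version from the question.
by_formula <- aggregate(x1 + x2 + x3 + x4 ~ x1, data = x, FUN = sum)

# Compare the group sums produced by the two approaches.
all.equal(by_hand$total, by_formula[[2]])

# The first group is x1 = 4.3, which contains only row 14.
by_hand[1, ]
```

If the two results agree, the formula's left side is just a per-row expression and the right side is the grouping variable.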
Regarding your second question, in the Iris data set there is only one row where the first column is 4.3. That row is:
(x[x[,1]==4.3,])
# x1 x2 x3 x4
# 14 4.3 3 1.1 0.1
# and 4.3 + 3.0 + 1.1 + 0.1 = 8.5.
sum(x[x[,1]==4.3,])
# [1] 8.5
# There are four rows where x1 = 6.9. Those rows are:
x[x[,1]==6.9,]
# x1 x2 x3 x4
# 53 6.9 3.1 4.9 1.5
# 121 6.9 3.2 5.7 2.3
# 140 6.9 3.1 5.4 2.1
# 142 6.9 3.1 5.1 2.3
# and
# 6.9 + 3.1 + 4.9 + 1.5 +
# 6.9 + 3.2 + 5.7 + 2.3 +
# 6.9 + 3.1 + 5.4 + 2.1 +
# 6.9 + 3.1 + 5.1 + 2.3 = 69.4
sum(x[x[,1]==6.9,])
# [1] 69.4
Regarding your new question, transform(x,x1=sort(x1)) sorts only the first column and leaves the other columns unchanged, so you are changing the data set: each sorted x1 value ends up next to a different row's x2, x3 and x4. The first row then sums to
4.3+3.5+1.4+0.2=9.4
# x1 x2 x3 x4
# 1 4.3 3.5 1.4 0.2
# 2 4.4 3.0 1.4 0.2
# 3 4.4 3.2 1.3 0.2
# 4 4.4 3.1 1.5 0.2
# 5 4.5 3.6 1.4 0.2
If you want to order the data set by increasing values of the first column without changing the data set, use:
x[order(x$x1),]
# x1 x2 x3 x4
# 14 4.3 3.0 1.1 0.1
# 9 4.4 2.9 1.4 0.2
# 39 4.4 3.0 1.3 0.2
# 43 4.4 3.2 1.3 0.2
# 42 4.5 2.3 1.3 0.3
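The difference between the two calls can be checked on the first row of each result. A short sketch; the names sorted_col and sorted_rows are chosen here for illustration only.

```r
x <- iris[, 1:4]
names(x) <- c("x1", "x2", "x3", "x4")

# sort() rearranges x1 on its own, so x1 no longer lines up with x2..x4.
sorted_col <- transform(x, x1 = sort(x1))
sorted_col[1, ]            # x1 is 4.3, but x2..x4 still come from original row 1
x[1, ]                     # the original row 1 has x1 = 5.1

# order() returns a row permutation, so whole rows move together.
sorted_rows <- x[order(x$x1), ]
rownames(sorted_rows)[1]   # the original row number of the smallest x1

sum(sorted_rows[1, ])      # 4.3 + 3.0 + 1.1 + 0.1
sum(sorted_col[1, ])       # 4.3 + 3.5 + 1.4 + 0.2
```

Indexing with order() keeps the original row names, which makes it easy to see where each row came from.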
The tilde operator creates a symbolic formula. Here's an excerpt from a blog post that explains it better than I could:
f = price ~ carat
[...]
We start by creating the formula f using the strange looking tilde operator. That tells the R interpreter that we're defining a symbolic formula, rather than an expression to be evaluated immediately. So, our definition of formula f says, "price is a function of carat".
The manual page on formulas has more to say about the tilde operator.
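To see that a formula is stored rather than evaluated, it can be inspected directly. A small sketch using the f from the excerpt; price and carat do not need to exist as variables for this to run, and g is just a name picked here for the formula from the aggregate() call.

```r
# Creating the formula does not look up 'price' or 'carat'.
f <- price ~ carat

class(f)       # "formula"
all.vars(f)    # "price" "carat"
f[[2]]         # left-hand side:  price
f[[3]]         # right-hand side: carat

# The formula from the aggregate() call keeps its left side as an unevaluated sum.
g <- x1 + x2 + x3 + x4 ~ x1
deparse(g[[2]])   # "x1 + x2 + x3 + x4"
```

A function such as aggregate() decides later how to evaluate each side against the data argument it is given.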