Question
I have a data set with two columns, r and f. As you can see in the graph, the data are very noisy. So I want to discretize column "r" into bins of width 5, assign each row to its corresponding bin, and then calculate the average of f for each bin.
> dr
r f
1 65.06919 21.796
2 62.36986 22.836
3 59.81639 22.980
4 57.42822 22.061
5 55.22681 21.012
6 53.23533 21.274
7 51.47815 21.594
8 49.98000 22.117
9 48.76474 20.366
10 47.85394 18.991
11 47.26521 20.920
12 47.01064 20.161
13 47.09565 22.328
14 47.51842 19.610
15 48.27007 18.615
16 49.33559 21.753
17 50.69517 22.754
18 52.32590 22.096
19 54.20332 22.020
20 56.30275 22.111
21 58.60034 21.395
22 61.07373 22.635
23 63.70243 22.128
24 66.46804 21.698
25 62.24147 21.879
26 59.41380 21.637
27 56.72742 21.991
28 54.20332 21.535
29 51.86521 21.093
30 49.73932 20.496
31 47.85394 21.737
32 46.23851 21.890
33 44.92215 21.236
34 43.93177 19.997
35 43.28972 19.661
36 43.01163 20.692
37 43.10452 19.663
38 43.56604 19.273
39 44.38468 20.743
40 45.54119 22.604
41 47.01064 22.167
42 48.76474 20.427
43 50.77401 21.543
44 53.00943 21.391
45 55.44367 21.313
46 58.05170 22.501
47 60.81118 22.414
48 63.70243 22.920
49 59.54830 21.571
50 56.58622 22.454
51 53.75872 22.643
52 51.08816 20.219
53 48.60041 20.300
54 46.32494 19.832
55 44.29447 20.284
56 42.54409 21.284
57 41.10961 21.350
58 40.02499 20.784
59 39.31921 20.383
60 39.01282 20.508
61 39.11521 19.413
62 39.62323 20.043
63 40.52160 18.583
64 41.78516 19.512
65 43.38202 20.849
66 45.27693 21.349
67 47.43416 20.734
68 49.81967 22.055
69 52.40229 22.108
70 55.15433 23.184
71 58.05170 23.147
72 61.07373 23.207
73 57.00877 21.467
74 53.90733 21.549
75 50.93133 23.035
76 48.10405 20.684
77 45.45327 20.189
78 43.01163 19.304
79 40.81666 19.739
80 38.91015 20.976
81 37.33631 21.305
82 36.13862 21.319
83 35.35534 20.133
84 35.01428 20.179
85 35.12834 20.634
86 35.69314 22.478
87 36.68787 21.608
88 38.07887 20.964
89 39.82462 18.409
90 41.88078 20.627
91 44.20407 20.980
92 46.75468 22.206
93 49.49747 21.828
94 52.40229 20.844
95 55.44367 21.619
96 58.60034 21.498
97 54.64430 19.433
98 51.40039 21.293
99 48.27007 20.687
100 45.27693 21.377
101 42.44997 21.282
102 39.82462 20.910
103 37.44329 18.810
104 35.35534 21.223
105 33.61547 20.197
106 32.28002 20.765
107 31.40064 19.781
108 31.01612 20.536
109 31.14482 21.245
110 31.78050 21.117
111 32.89377 20.303
112 34.43835 20.795
113 36.35932 20.754
114 38.60052 21.025
115 41.10961 20.924
116 43.84062 21.475
117 46.75468 21.435
118 49.81967 20.380
119 53.00943 21.590
120 56.30275 20.743
121 52.47857 20.600
122 49.09175 20.818
123 45.80393 21.514
124 42.63801 21.922
125 39.62323 21.469
126 36.79674 22.186
127 34.20526 19.625
128 31.90611 19.703
129 29.96665 18.793
130 28.46050 18.912
131 27.45906 19.239
132 27.01851 18.467
133 27.16616 18.974
134 27.89265 20.090
135 29.15476 19.155
136 30.88689 20.526
137 33.01515 20.273
138 35.46830 19.956
139 38.18377 21.547
140 41.10961 21.260
141 44.20407 20.802
142 47.43416 19.719
143 50.77401 21.645
144 54.20332 18.957
145 50.53712 21.410
146 47.01064 20.536
147 43.56604 20.963
148 40.22437 20.775
149 37.01351 22.257
150 33.97058 21.868
151 31.14482 18.907
152 28.60070 19.644
153 26.41969 17.694
154 24.69818 17.883
155 23.53720 17.975
156 23.02173 18.778
157 23.19483 18.896
158 24.04163 19.561
159 25.49510 20.137
160 27.45906 19.922
161 29.83287 19.574
162 32.52691 19.029
163 35.46830 20.356
164 38.60052 20.330
165 41.88078 20.005
166 45.27693 20.006
167 48.76474 21.056
168 52.32590 20.143
169 48.84670 22.094
170 45.18849 21.252
171 41.59327 22.023
172 38.07887 21.563
173 34.66987 21.408
174 31.40064 21.334
175 28.31960 19.855
176 25.49510 18.648
177 23.02173 17.397
178 21.02380 17.311
179 19.64688 16.714
180 19.02630 18.152
181 19.23538 18.187
182 20.24846 19.910
183 21.95450 20.451
184 24.20744 19.820
185 26.87006 19.862
186 29.83287 19.987
187 33.01515 19.363
188 36.35932 19.498
189 39.82462 19.121
190 43.38202 20.479
191 47.01064 20.311
192 50.69517 21.666
193 47.43416 21.995
194 43.65776 23.158
195 39.92493 24.632
196 36.24914 23.273
197 32.64966 22.535
198 29.15476 19.933
199 25.80698 18.277
200 22.67157 16.169
So, to walk through the procedure: going through the rows in order, row 1 would be assigned to bin [65-70], row 2 to bin [60-65], and so on.
Then, for the final result, I want the midpoint of each bin and the average of its f values. With that, I can draw a line for f as a function of r.
Answer 1:
Alternatively, you can use the wonderful plyr package.
library(plyr)
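# df is the data frame printed as dr in the question.
# Note: cut(df$r, 5) would make 5 equal-width bins; explicit breaks give bins of width 5.
ddply(df, .(cut(df$r, seq(15, 70, by = 5))), colwise(mean))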
However, if you have to ask a question like the above, you are just fine with the tapply solution.
Answer 2:
As @Fernando already mentioned in his comment, you could try cut (binning) and tapply:
tapply(df$f, cut(df$r, seq(15, 70, by=5)), mean)
#  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]
# 17.68433 18.55918 19.28683 20.49000 20.87942 20.65430 20.96155 21.35146 21.92259 22.57414 21.74700
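The question also asks for the midpoint of each bin so that the averaged f can be drawn as a line over r. A minimal sketch of that step follows, assuming the same breaks as above. The sample `df` holds only the first six rows of the question's data, and the names `width`, `breaks`, `mid`, and `binned` are illustrative, not part of either answer.

```r
# Illustrative sample: the first six rows of the question's data.
df <- data.frame(
  r = c(65.06919, 62.36986, 59.81639, 57.42822, 55.22681, 53.23533),
  f = c(21.796, 22.836, 22.980, 22.061, 21.012, 21.274)
)

width  <- 5
breaks <- seq(15, 70, by = width)   # bin edges 15, 20, ..., 70
bin    <- cut(df$r, breaks)         # intervals are (lo, hi] by default

# Mean of f per bin; bins with no rows come back as NA.
f_mean <- tapply(df$f, bin, mean)

# Midpoint of each bin, in the same order as levels(bin).
mid <- head(breaks, -1) + width / 2

binned <- data.frame(mid = mid, f_mean = as.numeric(f_mean))
binned <- binned[!is.na(binned$f_mean), ]   # drop empty bins

# Raw points in grey, binned averages as a connected line.
plot(df$r, df$f, col = "grey", xlab = "r", ylab = "f")
lines(binned$mid, binned$f_mean, type = "b")
```

If left-closed intervals such as [65,70) are preferred, `cut(df$r, breaks, right = FALSE)` switches the convention.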
Source: https://stackoverflow.com/questions/18364679/r-calculate-the-average-of-one-column-corresponding-to-each-bin-of-another-colum