问题
I have the following sample data:
Hostname Date-Time hdisk86 hdisk88 hdisk90 hdisk89 hdisk91 hdisk92 hdisk93 hdisk94 hdisk96 hdisk95
1: hostname1 2015-01-26 00:15:22 0 0 0 0 0 0 0 0 0 0
2: hostname1 2015-01-26 00:30:24 0 0 0 0 0 0 0 0 0 0
3: hostname1 2015-01-26 00:45:25 0 0 0 0 0 0 0 0 0 0
4: hostname1 2015-01-26 01:00:25 0 0 0 0 0 0 0 0 0 0
5: hostname1 2015-01-26 01:15:28 0 0 0 0 0 0 0 0 0 0
6: hostname1 2015-01-26 01:30:29 0 0 0 0 0 0 0 0 0 0
hdisk98 hdisk97 hdisk99 hdisk100 hdisk101 hdisk102 hdisk103 hdisk108 hdisk107 hdisk104 hdisk105 hdisk109 hdisk110
1: 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk112 hdisk111 hdisk113 hdisk114 hdisk115 hdisk116 hdisk117 hdisk87 hdisk118 hdisk120 hdisk119 hdisk122
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk123 hdisk124 hdisk125 hdisk121 hdisk127 hdisk126 hdisk2 hdisk3 hdisk5 hdisk4 hdisk6 hdisk10 hdisk11 hdisk8
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk12 hdisk9 hdisk18 hdisk14 hdisk15 hdisk17 hdisk16 hdisk13 hdisk106 hdisk19 hdisk20 hdisk7 hdisk21 hdisk28
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk33 hdisk32 hdisk27 hdisk30 hdisk23 hdisk35 hdisk40 hdisk25 hdisk41 hdisk39 hdisk38 hdisk43 hdisk22 hdisk36
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk31 hdisk45 hdisk29 hdisk44 hdisk34 hdisk37 hdisk48 hdisk24 hdisk47 hdisk42 hdisk46 hdisk49 hdisk53 hdisk50
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk56 hdisk55 hdisk54 hdisk52 hdisk59 hdisk62 hdisk58 hdisk64 hdisk61 hdisk65 hdisk60 hdisk67 hdisk66 hdisk57
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk51 hdisk69 hdisk63 hdisk74 hdisk70 hdisk72 hdisk75 hdisk68 hdisk73 hdisk76 hdisk71 hdisk78 hdisk85 hdisk81
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk80 hdisk83 hdisk79 hdisk82 hdisk77 hdisk84 hdisk26 hdisk0 hdisk1 hdisk128 hdisk129 hdisk130 hdisk131 hdisk132
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk133 hdisk134 hdisk135 hdisk136 hdisk137 hdisk138 hdisk139 hdisk140 hdisk141 hdisk142 hdisk143 hdisk144
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk145 hdisk146 hdisk147 hdisk148 hdisk149
1: 0 0 0 0 0
2: 0 0 0 0 0
3: 0 0 0 0 0
4: 0 0 0 0 0
5: 0 0 0 0 0
6: 0 0 0 0 0
What I'm trying to do is to take the mean, weighted.mean, and max values of each hdisk column, transpose this data to then sort by weighted.mean, max and mean. Then transpose back to plot in a bar chart. Here we go... First taking the summary info (mean, weighted.mean, and max):
# Creating summary of I/O data (avg, wavg, max)...
c <- grep( "hdisk", names(DISKAVGRIO))
b <- c("Avg", "WAvg", "Max")
wavg = function(x) {
wavg.return <- weighted.mean(x, x)
if (is.nan(wavg.return)) {
return(0)
} else {
return(wavg.return)
}
}
my.summary = function(x) list(avg = mean(x), wavg = wavg(x), max = as.numeric(max(x)))
DT <- DISKAVGRIO[, lapply(.SD, my.summary), .SDcols=c]
DT[, `summary` := list("Avg", "WAvg", "Max")]
setcolorder(DT, c("summary", setdiff(names(DT), "summary")))
Them I have the following data table:
summary hdisk86 hdisk88 hdisk90 hdisk89 hdisk91 hdisk92 hdisk93 hdisk94 hdisk96 hdisk95 hdisk98 hdisk97 hdisk99
1: Avg 0 0 0 0 0 0 0 0 0 0 0 0 0
2: WAvg 0 0 0 0 0 0 0 0 0 0 0 0 0
3: Max 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk100 hdisk101 hdisk102 hdisk103 hdisk108 hdisk107 hdisk104 hdisk105 hdisk109 hdisk110 hdisk112 hdisk111
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk113 hdisk114 hdisk115 hdisk116 hdisk117 hdisk87 hdisk118 hdisk120 hdisk119 hdisk122 hdisk123 hdisk124
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk125 hdisk121 hdisk127 hdisk126 hdisk2 hdisk3 hdisk5 hdisk4 hdisk6 hdisk10 hdisk11 hdisk8 hdisk12 hdisk9
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk18 hdisk14 hdisk15 hdisk17 hdisk16 hdisk13 hdisk106 hdisk19 hdisk20 hdisk7 hdisk21 hdisk28 hdisk33 hdisk32
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk27 hdisk30 hdisk23 hdisk35 hdisk40 hdisk25 hdisk41 hdisk39 hdisk38 hdisk43 hdisk22 hdisk36 hdisk31 hdisk45
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk29 hdisk44 hdisk34 hdisk37 hdisk48 hdisk24 hdisk47 hdisk42 hdisk46 hdisk49 hdisk53 hdisk50 hdisk56 hdisk55
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk54 hdisk52 hdisk59 hdisk62 hdisk58 hdisk64 hdisk61 hdisk65 hdisk60 hdisk67 hdisk66 hdisk57 hdisk51 hdisk69
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk63 hdisk74 hdisk70 hdisk72 hdisk75 hdisk68 hdisk73 hdisk76 hdisk71 hdisk78 hdisk85 hdisk81 hdisk80 hdisk83
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk79 hdisk82 hdisk77 hdisk84 hdisk26 hdisk0 hdisk1 hdisk128 hdisk129 hdisk130 hdisk131 hdisk132 hdisk133
1: 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk134 hdisk135 hdisk136 hdisk137 hdisk138 hdisk139 hdisk140 hdisk141 hdisk142 hdisk143 hdisk144 hdisk145
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk146 hdisk147 hdisk148 hdisk149
1: 0 0 0 0
2: 0 0 0 0
3: 0 0 0 0
Then I transform from wide to long:
# Converting from wide to long...
d <- grep("hdisk", names(DT), value = T)
DT_mdf <- melt(DT,
id.vars="summary",
measure.vars=d,
variable.name="hdisks",
value.name="percentage")
And get the following data table:
summary hdisks percentage
1: Avg hdisk86 0
2: WAvg hdisk86 0
3: Max hdisk86 0
4: Avg hdisk88 0
5: WAvg hdisk88 0
---
446: WAvg hdisk148 0
447: Max hdisk148 0
448: Avg hdisk149 0
449: WAvg hdisk149 0
450: Max hdisk149 0
Then, I try to transpose:
# Transpose to sort by wavg...
DT3 <- dcast(DT_mdf, summary ~ hdisks)
And I get the error message:
Using percentage as value column: use value.var to override.
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
If I try to set value.var = percentage I get the following error message:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Why this is not working? Aparently it suppose to work. Somebody has any idea?
回答1:
Your function returns a list
, and using lapply()
on each column therefore results in each cell of the aggregated result as a list as well. You should be able to check this by looking at the class of all the columns. dcast()
is looking for atomic type.
It's much more straightforward to get to your final result by using c()
instead of list()
in this case (note tested due to lack of MRE):
summary.funs = c("mean", "wavg", "max")
my.summary = function(x) c(mean(x), wavg(x), as.numeric(max(x)))
DT <- DISKAVGRIO[, lapply(.SD, my.summary), .SDcols=c][, summary := summary.funs]
should get you the result in the final format.
The Introduction to data.table vignette explains how to efficiently use j
to get the data in the format you desire.
Also of use might be the Efficient reshaping using data.tables vignette.
For updates on vignettes, bookmark/check the Getting started page on project wiki. Also keep an eye on issue #944 and the CRAN data.table page for vignettes corresponding the current version.
来源:https://stackoverflow.com/questions/32937776/how-to-reshape-data-table-after-applying-multiple-functions-to-multiple-variable