Question
I have a large data set that I am trying to discretise and create a 3d surface plot with:
  rowColFoVCell wpbCount Feret
1  001001001001        1  0.58
2  001001001001        1  1.30
3  001001001001        1  0.58
4  001001001001        1  0.23
5  001001001001        2  0.23
6  001001001001        2  0.58
There are currently 695302 rows in this data set. I am trying to discretise the third column, 'Feret', based on the second column: that is, for each value of 'wpbCount' I want to bin the 'Feret' values and count how many fall into each bin.
I think the solution will involve using cut but I am not sure how to go about this. I would like to end up with a data frame something like this:
  wpbCount     Feret Count
1        1 [0.0,0.2]     3
2        1 [0.2,0.4]     5
3        1 [0.4,0.6]     6
4        1 [0.8,0.8]     9
5        2 [0.0,0.2]     6
6        2 [0.4,0.6]    23
Answer 1:
This is to answer the first part:
Create Some data
DF <- data.frame(wpbCount = sample(1:1000, 1000),
Feret = sample(seq(0, 1, 0.001), 1000))
1) Discretize. Use cut with right = FALSE so the intervals are closed on the left, i.e. [a, b). I normally find this more useful than the default.
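# include.lowest = TRUE closes the last interval, so Feret == 1 does not become NA
DF$cut_it <- cut(DF$Feret, right = FALSE, include.lowest = TRUE,
breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1))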
2) Aggregate
TABLE <- data.frame(table(DF$cut_it))
EDIT: Another attempt, using data.table
library(data.table)
DT <- data.table(DF)
DT <- DT[, list(wpbCount = length(wpbCount),
Feret = length(Feret)
), by=cut_it]
Perhaps you are just trying to discretize and not aggregate. Try this:
DF2 <- data.frame(wpbCount = sample(1:3, 1000, replace = TRUE),
Feret = sample(seq(0, 1, 0.001), 1000))
# Bin DF2's own Feret column; the top break of 1.1 keeps Feret == 1 inside the last interval
DF2$Feret2 <- cut(DF2$Feret, right = FALSE,
breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1.1))
DF2 <- DF2[, c(1, 3)]
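To get one count per wpbCount and Feret bin, as in the table the question asks for, the two steps can be combined in base R. This is only a minimal sketch on simulated data: the names sim and counts and the 0.2-wide breaks are assumptions made for illustration, not the asker's actual data.

```r
# Simulated stand-in for the real data (assumption: Feret lies in (0, 1))
set.seed(1)
sim <- data.frame(wpbCount = sample(1:3, 1000, replace = TRUE),
                  Feret    = runif(1000, 0, 1))

# Bin Feret into left-closed intervals [0,0.2), [0.2,0.4), ...
sim$bin <- cut(sim$Feret, breaks = seq(0, 1, by = 0.2), right = FALSE)

# Cross-tabulate, then flatten to one row per (wpbCount, bin) pair
counts <- as.data.frame(table(wpbCount = sim$wpbCount, Feret = sim$bin),
                        responseName = "Count")
head(counts)
```

Note that table() keeps empty combinations as rows with a Count of 0, and that wpbCount comes back as a factor.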
Answer 2:
Thanks very much for your help. I used the following functions in R:
x$bin <- cut(x$Feret, right = FALSE, breaks = seq(0,max(wpbFeatures$Feret), by=0.1))
y <-aggregate(x$bin, by = x[c('wpbCount', 'bin')], length)
From your suggestions I have been able to get the data frame that I require:
wpbCount       bin   x
       1 [0.2,0.3)  72
       2 [0.2,0.3) 142
       3 [0.2,0.3) 224
       4 [0.2,0.3) 299
       5 [0.2,0.3) 421
       6 [0.2,0.3) 479
Now I need to plot this in 3D, and I am not sure how to do so with a non-numeric column, i.e. the bin column, which is a factor.
Does anyone know how I can plot these three columns against each other?
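One way to get a numeric axis out of the bins is to replace each factor level with the midpoint of its interval. A small sketch, assuming the bins came from cut() with known breaks; brks, mids, toy and bin_mid are illustrative names, and the toy data frame only mimics the layout above.

```r
# Breaks assumed to be the same ones that were passed to cut()
brks <- seq(0, 1, by = 0.1)
mids <- head(brks, -1) + diff(brks) / 2   # midpoint of each interval

# Toy stand-in for the aggregated data frame (columns wpbCount, bin, x)
toy <- data.frame(wpbCount = 1:3,
                  bin = cut(c(0.25, 0.25, 0.75), breaks = brks, right = FALSE),
                  x = c(72, 142, 224))

# as.integer() on a factor returns the level index, which indexes into mids
toy$bin_mid <- mids[as.integer(toy$bin)]
toy
```

This relies on the factor levels being in the same order as the breaks, which is how cut() builds them.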
Answer 3:
Check out this link. There are some 3d plots. However, 3d plots aren't the greatest tool for analyzing data. If you insist on the 3d approach, try stat_contour() from the ggplot2 package.
However, a probably better approach is to do a few plots in 2d, or use facet_grid(). Take a look at the current ggplot2 documentation as well.
Try this based on your last answer (not tested):
library(ggplot2)
# DF: the aggregated data frame with columns wpbCount, bin and x
ggplot(DF, aes(wpbCount, x)) +
geom_point() +
facet_grid(. ~ bin)
The idea is to use the factor variable (in this case, bin) to facet the plot.
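If a surface is still wanted, base R's persp() can draw one from a matrix of counts, with wpbCount on one axis and the bin midpoints on the other. This is a rough sketch on simulated data: every name here, and the 0.1-wide breaks, is an assumption rather than the asker's real data.

```r
# Simulated stand-in for the real data
set.seed(42)
sim <- data.frame(wpbCount = sample(1:6, 5000, replace = TRUE),
                  Feret    = runif(5000, 0, 1))

brks <- seq(0, 1, by = 0.1)
sim$bin <- cut(sim$Feret, breaks = brks, right = FALSE)

# Rows = wpbCount values, columns = Feret bins
tab <- table(sim$wpbCount, sim$bin)
z <- matrix(as.numeric(tab), nrow = nrow(tab))   # plain numeric matrix of counts

x_vals <- as.numeric(rownames(tab))              # wpbCount values, increasing
y_vals <- head(brks, -1) + diff(brks) / 2        # bin midpoints, increasing

# persp() needs increasing x and y and a z matrix of length(x) by length(y)
persp(x_vals, y_vals, z, theta = 40, phi = 25, ticktype = "detailed",
      xlab = "wpbCount", ylab = "Feret (bin midpoint)", zlab = "Count")
```

Using midpoints rather than the factor itself is what gives persp() the numeric, increasing axis it requires.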
Source: https://stackoverflow.com/questions/20851362/r-binning-dataset-and-surface-plot