R binning dataset and surface plot

Submitted by 与世无争的帅哥 on 2019-12-24 13:52:33

Question


I have a large data set that I am trying to discretise and create a 3d surface plot with:

  rowColFoVCell wpbCount Feret
1  001001001001        1  0.58
2  001001001001        1  1.30
3  001001001001        1  0.58
4  001001001001        1  0.23
5  001001001001        2  0.23
6  001001001001        2  0.58

There are currently 695302 rows in this data set. I am trying to discretise the third column, 'Feret', based on the second column: that is, for each 'wpbCount' value, bin the 'Feret' column and count the rows in each bin.

I think the solution will involve using cut but I am not sure how to go about this. I would like to end up with a data frame something like this:

  wpbCount Feret     Count
1        1 [0.0,0.2]     3
2        1 [0.2,0.4]     5
3        1 [0.4,0.6]     6
4        1 [0.6,0.8]     9
5        2 [0.0,0.2]     6
6        2 [0.4,0.6]    23

Answer 1:


This is to answer the first part:

Create Some data

DF <- data.frame(wpbCount = sample(1:1000, 1000),
                 Feret = sample(seq(0, 1, 0.001), 1000))

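1) Discretize. Use cut with right = FALSE so the intervals are left-closed, [a, b). I normally find this more useful than the default. Note that with these breaks a Feret value of exactly 1 falls outside the last interval [0.8, 1) and becomes NA.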

DF$cut_it <- cut(DF$Feret, right = FALSE,
                 breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1))

2) Aggregate

TABLE <- data.frame(table(DF$cut_it))
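As a small illustration of how the interval boundaries behave in step 1, here is a minimal sketch. The vector v is made up for the example; include.lowest = TRUE is a standard argument of cut which, combined with right = FALSE, closes the last interval on the right so the maximum value is not lost.

```r
# Made-up values sitting on and between the break points
v <- c(0, 0.2, 0.5, 1)
brk <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Left-closed intervals [a, b): 0.2 goes into [0.2,0.4), and 1 matches no bin
b <- cut(v, breaks = brk, right = FALSE)
as.character(b)

# include.lowest = TRUE closes the last interval, giving [0.8,1]
b2 <- cut(v, breaks = brk, right = FALSE, include.lowest = TRUE)
as.character(b2)
```

This avoids having to push the last break past the maximum (for example to 1.1) just to keep the largest value.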

EDIT: Another attempt, counting the rows in each bin with data.table:

library(data.table)
DT <- data.table(DF)
DT <- DT[, list(wpbCount = length(wpbCount),
                Feret = length(Feret)
                ), by=cut_it]

Perhaps you are just trying to discretize and not aggregate. Try this:

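DF2 <- data.frame(wpbCount = sample(1:3, 1000, replace = TRUE),
                  Feret = sample(seq(0, 1, 0.001), 1000))

# Bin DF2's own Feret column; the upper break of 1.1 keeps Feret == 1 in the last bin
DF2$Feret2 <- cut(DF2$Feret, right = FALSE,
                  breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1.1))

# Keep only wpbCount and the binned column
DF2 <- DF2[, c(1, 3)]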
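To get closer to the layout asked for in the question (one row per wpbCount value and Feret bin, with a count), one possible sketch in base R is to cross-tabulate the two columns and convert the table to a data frame. The data here is simulated and the names res and Count are chosen only for this example.

```r
# Simulated stand-in for the real data (assumption: Feret lies in [0, 1))
set.seed(42)
DF <- data.frame(wpbCount = sample(1:3, 1000, replace = TRUE),
                 Feret    = runif(1000))

# Bin Feret into left-closed intervals of width 0.2
DF$bin <- cut(DF$Feret, breaks = seq(0, 1, by = 0.2), right = FALSE)

# Count the rows for every (wpbCount, bin) combination;
# note that wpbCount comes back as a factor in the result
res <- as.data.frame(table(wpbCount = DF$wpbCount, Feret = DF$bin),
                     responseName = "Count")

# Drop combinations that never occur
res <- res[res$Count > 0, ]
head(res)
```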



Answer 2:


Thanks very much for your help. I used the following functions in R:

x$bin <- cut(x$Feret, right = FALSE,
             breaks = seq(0, max(wpbFeatures$Feret), by = 0.1))

y <- aggregate(x$bin, by = x[c('wpbCount', 'bin')], length)

From your suggestions I have been able to get the data frame that I require:

wpbCount | bin       | x
1        | [0.2,0.3) | 72
2        | [0.2,0.3) | 142
3        | [0.2,0.3) | 224
4        | [0.2,0.3) | 299
5        | [0.2,0.3) | 421
6        | [0.2,0.3) | 479

Now I need to plot this in 3D, and I am not sure how to do so with a non-numeric column, i.e. the bin column, which is a factor.

Does anyone know how I can plot these three columns against each other?
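One way to approach this, sketched here under assumptions rather than taken from the thread, is to replace each bin by its numeric midpoint and lay the counts out as a matrix, which is the input that base R's persp() expects. The data below is simulated, and the names counts, wpb, mids and z are made up for the example.

```r
# Simulated stand-in for the real data: wpbCount in 1..6, Feret in (0, 1.5)
set.seed(1)
x <- data.frame(wpbCount = sample(1:6, 500, replace = TRUE),
                Feret    = runif(500, min = 0, max = 1.5))

# Bin Feret into 0.1-wide, left-closed intervals
breaks <- seq(0, 1.5, by = 0.1)
x$bin  <- cut(x$Feret, breaks = breaks, right = FALSE)

# Cross-tabulate: rows are wpbCount values, columns are bins, cells are counts
counts <- table(x$wpbCount, x$bin)

# Numeric axes: the wpbCount values and the midpoint of each bin
wpb  <- as.numeric(rownames(counts))
mids <- head(breaks, -1) + diff(breaks) / 2

# persp() needs increasing x and y vectors and a matrix z with
# nrow(z) == length(x) and ncol(z) == length(y)
z <- matrix(as.numeric(counts), nrow = nrow(counts))
persp(wpb, mids, z, theta = 40, phi = 25, ticktype = "detailed",
      xlab = "wpbCount", ylab = "Feret (bin midpoint)", zlab = "Count")
```

Using table() rather than the aggregated data frame keeps the empty combinations as zeros, so the matrix has no holes.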




Answer 3:


Check out this link. There are some 3D plots. However, 3D plots aren't the greatest tool for analyzing data. If you insist on the 3D approach, try stat_contour() from the ggplot2 package.

However, a probably better approach is to do a few plots in 2D, or use facet_grid(). Also take a look at the current ggplot2 documentation.

Try this based on your last answer (not tested):

library(ggplot2)
# DF stands for the aggregated data frame (y in the previous answer)
ggplot(DF, aes(wpbCount, x)) +
  geom_point() +
  facet_grid(. ~ bin)

The idea is to use the factor variable (in this case, bin) to facet the plot.



Source: https://stackoverflow.com/questions/20851362/r-binning-dataset-and-surface-plot
