How to Split Dataset and plot in R

梦想的初衷 提交于 2020-06-26 04:46:04

问题


I am using a data set like:

1  48434  14566
1  56711  6289
1  58826  4174
2  56626  6374
2  58888  4112
2  59549  3451
2  60020  2980
2  60468  2532
3  56586  6414
3  58691  4309
3  59360  3640
3  59941  3059
.
.
.
10  56757  6243
10  58895  4105
10  59565  3435
10  60120  2880
10  60634  2366

I need a plot in R of 3rd column for each value of first column i.e. for above data there would be 10 different plots of (each group 1-10) of values of 3rd column. x-axis is number of Iterations and Y-axis is the values with max 63000. I also need to connect the dots with a line in color red. I am new to R and have been reading documentation but that confused me more. could any body plz help.

EDIT: I actually want line graph of V3 values. the number of rows of v3 column would be on x-axis and v3 values on y-axis. And I want different graphs each for a group indicated by v1. Chase's solution works except that I want the axis shifted, the V3 values should be on y-axis.here is example alt text

EDIT2: @Roman, Here is the code I am executing.

library(lattice)
d <- read.delim("c:\\proj58\\positions23.txt",sep="")
d <- do.call(rbind, lapply(split(d, d$V1), function(x) {
    x$iterations <- order(x$V3, decreasing=TRUE)
    x
}))
xyplot(V3 ~ iterations | V1, type="l", data=d)

This is the error I get,

    > 
>  source("C:\\proj58\\plots2.R")
> d
       V1    V2    V3 iterations
1.1     1 48434 14566          1
1.2     1 56711  6289          2
1.3     1 58826  4174          3
1.4     1 59528  3472          4

I am not getting any plot?? what am I missing OK: Got It. don't know what was wrong. Here it is,

alt text

2 more things, how to change V1 labels on the boxes to actual numbers like 1,2,... secondly I have files that contain 100 groups, I tried one and it made all graphs on a single page (unreadable obviously), can I make these on more than one windows?


回答1:


Well, first you need to create a variable with the row number, for each subset of the first variable separately. Here's one way to do it, by splitting the data set by the first variable, making a new variable that has the row number, and recombining.

You also probably want V1 to be a factor (a categorical variable).

d <- do.call(rbind, lapply(split(d, d$V1), function(x) {
    x$iterations <- 1:nrow(x)
    x
}))
d$V1 <- factor(d$V1)

Then using the lattice library, you'd do something like

xyplot(V3 ~ iterations | V1, type="l", data=d)

To make the plots appear on more than one page, limit the number of plots on a page using the layout option. You'll need to save the plot to a file that supports multi-page output to do that. For example, for 5 rows and 5 columns:

trellis.device("pdf", file="myplot.pdf")
p <- xyplot(V3 ~ iterations | V1, type="l", data=d, layout=c(5,5))
plot(p)
dev.off()

Also, to make the plot appear when running the code using source, you need to specifically plot the output from the xyplot command, like

p <- xyplot(...)
plot(p)

When running at the console, this is not necessary as the plot (well, actually, the print function) is called on it by default.




回答2:


Like Chase said, please clarify on your question so that we can envision better what you're trying to achieve. To add to the heap of confusion, here's a lattice ballpark solution of what I think you may be after.

library(lattice)
fdt <- data.frame(col1 = seq(from = 1, to = 10, each = 10),
        col2 = round(56 * rnorm(100, mean = 30, sd = 5)),
        col3 = round(20 * rnorm(100, mean = 11,)))
xyplot(col3 ~ 1:100 | col1, data = fdt)

alt text




回答3:


I'm not exactly following what it is that you want to plot, but here's an approach that should get your down the right path and you can fill in the appropriate plotting command...or clarify your question and explain what the final result of your plot should look like in more detail.

We are going to take advantage of two packages: plyr and ggplot2. We will use plyr to split up your data into the appropriate groups and then use ggplot2 for the actual plotting. We'll take advantage of the pdf() function and put a different plot on each page.

library(ggplot2)
library(psych)    #For copying in data, not needed beyond that.

df <- read.clipboard(header = F)

pdf("test.pdf")
    d_ply(df, "V1", function(x)     #Split on the first column
        print(qplot(x$V3))          #Your plotting command should go here. This plots histograms.
    )
dev.off()                           #Close the plotting device.

This will generate an n page PDF where n represents the number of groups in V1 (your splitting column). If you'd rather have JPEG outputs, look at ?jpeg or the other graphics options for making other outputs.

EDIT: As you can see, people interpreted your question in a few ways. If @Roman's solution is more what you want, here's roughly the same ggplot code

qplot(col2, col3, data = fdt, geom = "point") + facet_wrap(~ col1 , nrow = 2)


来源:https://stackoverflow.com/questions/4751651/how-to-split-dataset-and-plot-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!