问题
This is my first time posting here, I hope my question is clear and appropriate. I have a set of data the head of which looks like this:
wl ex421 wl ex309 wl ex284 wl ex347
1 431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
2 432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
3 433 0.6583593 323 0.4665072 303 0.05327245 363 0.13134187
4 434 0.6832814 324 0.3576555 304 0.00000000 364 0.32432432
5 435 0.6427830 325 0.2194976 305 0.12328767 365 0.50308203
6 436 0.7393562 326 0.1866029 306 0.08675799 366 0.34660977
and so on. The 'wl' columns represent wavelength, and there are four different ranges. The other four columns represent measurements (normalized) taken over the 'wl' ranges. The ranges are of different lengths, too. All of them partially overlap somewhere in the middle of the dataset. What I need to achieve is a plot showing all four sets of 'ex###' data on the same set of axis and plotted over their respective ranges. The x-axis needs to accommodate all four 'wl' ranges. However, I haven't yet succeeded.
When I had to plot multiple sets of data like this in the past I just melted the data and it always worked. Something like this:
df_melt <- melt(df, id.var = 'wl')
And then I'd plot it like this:
fluor_plt <- ggplot(fluor_ref2_melt, aes(x=wl,y=value,color=variable)) +
geom_point(shape = 1, fill = NA) + geom_path(data = fluor_ref2_melt,size = 1) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
scale_colour_manual(values = colvec)
However, because I have multiple columns with the name 'wl', which also have different ranges, what happens is that R only takes the first 'wl' column and discards all the other ones. It then basically shifts all the 'ex###' values into that range by using the row index... so I get a plot of the frame below:
wl ex421 ex309 ex284 ex347
1 431 0.6168224 0.1267943 0.06392694 0.15220484
2 432 0.6687435 0.2416268 0.05631659 0.08961593
3 433 0.6583593 0.4665072 0.05327245 0.13134187
4 434 0.6832814 0.3576555 0.00000000 0.32432432
5 435 0.6427830 0.2194976 0.12328767 0.50308203
6 436 0.7393562 0.1866029 0.08675799 0.34660977
Needless to say, this is entirely wrong... So one way I tried to circumvent the issue is going into Excel and manually moving columns up and down, so that in the dataframe each row corresponds to one 'wl' value, whether there are any measured values associated with it or not. This got rid of the values being 'shifted', but R still discards the 'wl' columns after the first one. Instead of getting an entirely wrong plot, I get a section of the right one. The first set of observations (ex421) is plotted over its entire range; pieces of the other ones are seen where ranges overlap. I've looked at some similar cases which were asked about here in the past, like this - Reshape data frame from wide to long with re-occuring column names in R. But I'm new to R and I don't think I could fully understand the proposed solutions. I didn't succeed in reshaping my data in the way I want it to be reshaped (keeping different 'wl' ranges for different sets) and I had no idea which arguments to give to ggplot afterwards. I've tried using data.table, but then I don't know what to give it for value.name and variable.name. To reiterate, what I want to achieve is what one would get from plotting the four datasets in the spreadsheet by making a single Scatter plot in Excel and adding four different series to it.
Any input would be greatly appreciated!
回答1:
Here I load a data frame with your data, making sure to allow repeated names with check.names = F
, otherwise it would rename the wl
columns to be distinct:
df <- read.table(
header = T, check.names = F,
stringsAsFactors = F,
text = " wl ex421 wl ex309 wl ex284 wl ex347
431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
433 0.6583593 323 0.4665072 303 0.05327245 363 0.13134187
434 0.6832814 324 0.3576555 304 0.00000000 364 0.32432432
435 0.6427830 325 0.2194976 305 0.12328767 365 0.50308203
436 0.7393562 326 0.1866029 306 0.08675799 366 0.34660977")
Then here's a way to reshape, by just stacking subsets of the data. Since there weren't too many column pairs, I thought a semi-manual method would be ok. It preserves the distinct column headers so we can gather
those into long form and map to color like in your plot.
library(tidyverse)
df2 <- bind_rows(
df[1:2],
df[3:4],
df[5:6],
df[7:8]
) %>%
gather(variable, value, -wl) %>%
drop_na()
ggplot(df2, aes(x=wl,y=value,color=variable)) +
geom_point(shape = 1, fill = NA) +
geom_path(size = 1) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
回答2:
I can think of this solution:
# data:
dt <- structure(list(wl = 431:436,
ex421 = c(0.6168224, 0.6687435, 0.6583593, 0.6832814, 0.642783, 0.7393562),
wl = 321:326,
ex309 = c(0.1267943, 0.2416268, 0.4665072, 0.3576555, 0.2194976, 0.1866029),
wl = 301:306,
ex284 = c(0.06392694, 0.05631659, 0.05327245, 0, 0.12328767, 0.08675799),
wl = 361:366,
ex347 = c(0.15220484, 0.08961593, 0.13134187, 0.32432432, 0.50308203, 0.34660977)),
row.names = c(NA, -6L),
class = c("data.table", "data.frame"))
# get vectors with wl names
wls <- grep("wl", names(dt))
# get vectors with ex_numbers names
exs <- grep("ex", names(dt))
# reformat the data:
newDt <- cbind(stack(dt, select = wls), stack(dt, select = exs))
# Assign reasonable names:
names(newDt) <- c("wlNumber", "wlInd", "exValue", "exNumber")
Now the data is ready to be plotted with any command:
ggplot(newDt, aes(x = wlNumber, y = exValue, color = exNumber))+geom_point()+geom_line()
The main advantage of this approach is that you can have the table spread into many columns. It doesn't matter, as long as their name has "wl" on it ("ex" for the other variables).
来源:https://stackoverflow.com/questions/56605354/how-to-melt-and-plot-multiple-datasets-over-different-ranges-on-the-same-set-of