Question
I tried using the reshape package to reshape a data frame, but doing so changes numbers in the data frame that should stay the same.
The data frame contains several variables, each measured multiple times: there are 6 rows per person, one for each time that person was measured. I want to reshape the data frame so that there is only one row per person instead of 6, which means every variable should appear 6 times (once per measurement). This should be easy to do with the following code:
melteddata <- melt(daten, id=(c("IDParticipant", "looporder")))
datenrestrukturiert <- dcast(melteddata, IDParticipant~looporder+variable)
Here "daten" is the original data frame and "looporder" is the variable that reflects the time of measurement (1-6). Here is an example (unfortunately I could not figure out how to post tables):
https://www.dropbox.com/s/8c9dm4rttedbzw1/daten.jpg?dl=0
or maybe this is fine:
structure(list(IDParticipant = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), looporder = c(1L, 2L, 3L, 5L, 6L, 2L, 3L,
5L, 6L, 1L, 2L, 3L), pc_mean_1 = c(NA, 3.22222222222222, NA,
3.22222222222222, 3.22222222222222, 3.66666666666667, 3.66666666666667,
3.66666666666667, 3.66666666666667, 3.25, NA, 3.25), bd_mean_1 = c(NA,
2.88888888888889, NA, 2.88888888888889, 2.88888888888889, 2.75,
2.75, 2.75, 2.75, 4.08333333333333, NA, 4.08333333333333), sm = c(999,
4, 999, 3.66666666666667, 1, 4, 4, 5, 5, 5, 999, 5), cm = c(999,
1.33333333333333, 999, 2.33333333333333, 1, 2, 2, 2.33333333333333,
1, 3, 999, 1.66666666666667)), .Names = c("IDParticipant", "looporder",
"pc_mean_1", "bd_mean_1", "sm", "cm"), row.names = c(NA, 12L), class = "data.frame")
datenrestrukturiert looks like this:
https://www.dropbox.com/s/al93lnj76y1j266/datenrestrukturiert.jpg?dl=0
I do not want to aggregate anything, which is why I tried adding fun.aggregate = NULL, but that changed nothing. I also always get the following message:
"Aggregation function missing: defaulting to length"
So far everything works, with one problem: when using dcast (or cast), some values are changed, mostly to "0" or "1", where there should be other numbers such as "3.44" or "4.77".
Does anybody have a hint as to why this happens?
Some more information that might help: when I import the dataset via read.csv2, the first variable always gets a strange name with extra symbols in front that are not shown in Excel: "ï..IDParticipant". I rename it to "IDParticipant". Could that have anything to do with it?
Another side note: with the sample data frame I provided, everything works fine. The original data frame has 1404 rows and 353 variables. Could it be too big for R?
Answer 1:
If you have duplicated combinations of your LHS and RHS variables, then you either need to (1) create a secondary level of IDs, or (2) perform some form of aggregation.
You can test for duplicates by using any(duplicated(...)).
Here's an example, using your existing sample of "daten" (which does not contain duplicates):
library(reshape2)
idvars <- c("IDParticipant", "looporder")
any(duplicated(daten[idvars]))
# [1] FALSE
melteddata <- melt(daten, id=idvars)
datenrestrukturiert <- dcast(melteddata, IDParticipant ~ looporder + variable)
datenrestrukturiert
# IDParticipant 1_pc_mean_1 1_bd_mean_1 1_sm 1_cm 2_pc_mean_1 2_bd_mean_1 2_sm 2_cm 3_pc_mean_1
# 1 1 NA NA 999 999 3.222222 2.888889 4 1.333333 NA
# 2 2 NA NA NA NA 3.666667 2.750000 4 2.000000 3.666667
# 3 3 3.25 4.083333 5 3 NA NA 999 999.000000 3.250000
# 3_bd_mean_1 3_sm 3_cm 5_pc_mean_1 5_bd_mean_1 5_sm 5_cm 6_pc_mean_1 6_bd_mean_1 6_sm
# 1 NA 999 999.000000 3.222222 2.888889 3.666667 2.333333 3.222222 2.888889 1
# 2 2.750000 4 2.000000 3.666667 2.750000 5.000000 2.333333 3.666667 2.750000 5
# 3 4.083333 5 1.666667 NA NA NA NA NA NA NA
# 6_cm
# 1 1
# 2 1
# 3 NA
However, since any(duplicated(...)) is giving you TRUE, you are likely to have something more similar to:
daten2 <- rbind(daten, daten[c(1, 4, 6), ])
any(duplicated(daten2[idvars]))
# [1] TRUE
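To see why duplicated ids turn values such as "3.44" into "0" or "1", here is a minimal sketch. The data frame "toy" and the names "dups", "molten" and "counts" are made up for illustration and are not part of the original data. When an id combination occurs more than once and no aggregation function is given, dcast falls back to length, so each cell holds the number of matching rows rather than the measured value.

```r
library(reshape2)

# Hypothetical toy data: participant 1 has two rows for looporder 1
toy <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L),
  looporder     = c(1L, 1L, 2L, 1L),
  sm            = c(3.44, 4.77, 4.00, 5.00)
)
idvars <- c("IDParticipant", "looporder")

# List every row that belongs to a duplicated id combination
key  <- toy[idvars]
dups <- toy[duplicated(key) | duplicated(key, fromLast = TRUE), ]
dups

# The ids are not unique and no fun.aggregate is given,
# so dcast() counts rows per cell instead of keeping the values
molten <- melt(toy, id.vars = idvars)
counts <- dcast(molten, IDParticipant ~ looporder + variable)
counts
```

In "counts", the cell for participant 1 at looporder 1 is a row count (2), and the combination that does not occur at all (participant 2 at looporder 2) shows up as 0.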
In this case, you can consider using getanID from my "splitstackshape" package to conveniently add a secondary "id" to your dataset.
library(splitstackshape)
melteddata2 <- melt(getanID(daten2, idvars), c(".id", idvars))
datenrestrukturiert2 <- dcast.data.table(
melteddata2, .id + IDParticipant ~ looporder + variable)
datenrestrukturiert2
# .id IDParticipant 1_pc_mean_1 1_bd_mean_1 1_sm 1_cm 2_pc_mean_1 2_bd_mean_1 2_sm
# 1: 1 1 NA NA 999 999 3.222222 2.888889 4
# 2: 1 2 NA NA NA NA 3.666667 2.750000 4
# 3: 1 3 3.25 4.083333 5 3 NA NA 999
# 4: 2 1 NA NA 999 999 NA NA NA
# 5: 2 2 NA NA NA NA 3.666667 2.750000 4
# 2_cm 3_pc_mean_1 3_bd_mean_1 3_sm 3_cm 5_pc_mean_1 5_bd_mean_1 5_sm
# 1: 1.333333 NA NA 999 999.000000 3.222222 2.888889 3.666667
# 2: 2.000000 3.666667 2.750000 4 2.000000 3.666667 2.750000 5.000000
# 3: 999.000000 3.250000 4.083333 5 1.666667 NA NA NA
# 4: NA NA NA NA NA 3.222222 2.888889 3.666667
# 5: 2.000000 NA NA NA NA NA NA NA
# 5_cm 6_pc_mean_1 6_bd_mean_1 6_sm 6_cm
# 1: 2.333333 3.222222 2.888889 1 1
# 2: 2.333333 3.666667 2.750000 5 1
# 3: NA NA NA NA NA
# 4: 2.333333 NA NA NA NA
# 5: NA NA NA NA NA
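If you would rather not add another package, the same idea of a secondary id can be sketched in base R with ave(). This is an alternative to getanID, not what getanID does internally, and the data frame "toy" and the column name "dup_id" are hypothetical names chosen for this example.

```r
library(reshape2)

# Hypothetical toy data with one duplicated id combination (participant 1, looporder 1)
toy <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L),
  looporder     = c(1L, 1L, 2L, 1L),
  sm            = c(3.44, 4.77, 4.00, 5.00)
)
idvars <- c("IDParticipant", "looporder")

# Number the rows within each IDParticipant/looporder combination: 1, 2, ...
toy$dup_id <- ave(seq_len(nrow(toy)), toy$IDParticipant, toy$looporder,
                  FUN = seq_along)

# With dup_id included, every id combination is unique, so nothing is aggregated
molten <- melt(toy, id.vars = c("dup_id", idvars))
wide   <- dcast(molten, dup_id + IDParticipant ~ looporder + variable)
wide
```

Each duplicated measurement ends up in its own row of "wide", and the original values are kept instead of being replaced by counts.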
Answer 2:
Here is my solution based on Ananda's suggestions (thank you very much for that).
The data frame is "daten", containing many variables, e.g. "IDParticipant", "looporder" and "sm".
First we need to create an object containing the id variables for later use with the melt and cast functions:
idvars <- c("IDParticipant", "looporder")
As it turns out, there were duplicates in the data frame, i.e. rows with the same values in both "IDParticipant" and "looporder", so we need to add another id variable when melting it. This can be done with "getanID" from the splitstackshape package:
melteddata <- melt(getanID(daten, idvars), c(".id", idvars))
After adding the extra id variable, we can finally cast the data frame using the extra id variable and the other variables:
datenrestrukturiert <- dcast(melteddata, .id + IDParticipant ~ variable + looporder)
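On the side question about the "ï..IDParticipant" column name: that prefix is typical of a UTF-8 byte order mark at the start of the file being read as ordinary characters, and it appears to be unrelated to the changed values. A minimal sketch, assuming the CSV was saved as UTF-8 with a byte order mark (the temporary file and its contents are made up for illustration):

```r
# Hypothetical file: a semicolon-separated CSV that starts with a UTF-8 byte order mark
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("IDParticipant;sm\n1;3,44\n2;4,77\n")), tmp)

# fileEncoding = "UTF-8-BOM" asks R to drop the byte order mark while reading
clean <- read.csv2(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
```

With this setting the first column should come in as "IDParticipant" directly, so the manual rename is no longer needed.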
Source: https://stackoverflow.com/questions/32244915/dcast-changes-content-of-dataframe