Question
   Groups Names COL1  COL2  COL3        COL4
1      G1   SP1    1 0.400 0.500   Sequence1
2      G1   SP1    1 0.004 0.005   Sequence2
3      G1   SP1    0 0.004 0.005   Sequence3
4      G1   SP2    0 0.400 0.005 Sequence123
5      G1   SP2    0 0.004 0.500  Sequence14
6      G1   SP3    0 0.005 0.006  Sequence15
7      G1   SP5    1 0.400 0.006  Sequence16
8      G1   SP6    1 0.008 0.002  Sequence20
10     G2   Sp1    0 0.004 0.005  Sequence17
11     G2   SP1    0 0.050 0.600  Sequence18
12     G2   SP1    0 0.400 0.600   Sequence3
13     G2   SP2    0 0.004 0.005  Sequence22
14     G2   SP2    0 0.004 0.005  Sequence23
15     G2   SP5    0 0.004 0.005  Sequence16
16     G2   SP6    0 0.003 0.002  Sequence21
17     G2   SP7    0 0.560 0.760  Sequence67
Here is the dput:
dput(test_df)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("G1", "G2"), class = "factor"),
Names = structure(c(2L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 1L, 2L,
2L, 3L, 3L, 5L, 6L, 7L), .Label = c("Sp1", "SP1", "SP2",
"SP3", "SP5", "SP6", "SP7"), class = "factor"), COL1 = c(1L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), COL2 = c(0.4, 0.004, 0.004, 0.4, 0.004, 0.005, 0.4, 0.008,
0.004, 0.05, 0.4, 0.004, 0.004, 0.004, 0.003, 0.56), COL3 = c(0.5,
0.005, 0.005, 0.005, 0.5, 0.006, 0.006, 0.002, 0.005, 0.6,
0.6, 0.005, 0.005, 0.005, 0.002, 0.76), COL4 = structure(c(1L,
8L, 13L, 2L, 3L, 4L, 5L, 9L, 6L, 7L, 13L, 11L, 12L, 5L, 10L,
14L), .Label = c("Sequence1", "Sequence123", "Sequence14",
"Sequence15", "Sequence16", "Sequence17", "Sequence18", "Sequence2",
"Sequence20", "Sequence21", "Sequence22", "Sequence23", "Sequence3",
"Sequence67"), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "10", "11", "12", "13", "14",
"15", "16", "17"))
From this dataframe I want to get another dataframe such as:
    G1 G2
SP1 A  B
SP2 x  x
SP3 x  NA
SP4 NA NA
SP5 A  X
SP6 a  x
SP7 NA b
The idea is, for each Group, to list the Names present in its rows and to fill the cells with the letters A, B, X or NA. Upper or lower case depends on whether an identical COL4 value is found for the same Name in another Group.
- A is put when any row for the Name has COL1 > 0 AND at least one row has the same COL4 content as the same Name in a different Group
- a is put when any row for the Name has COL1 > 0 AND no row has the same COL4 content as the same Name in a different Group
- B is put when any row for the Name has COL1 = 0 AND COL2 AND COL3 > 0.05 AND at least one row has the same COL4 content as the same Name in a different Group
- b is put when any row for the Name has COL1 = 0 AND COL2 AND COL3 > 0.05 AND no row has the same COL4 content as the same Name in a different Group
- X is put when all rows for the Name have COL1 = 0 AND COL2 OR COL3 > 0.05 AND at least one row has the same COL4 content as the same Name in a different Group
- x is put when all rows for the Name have COL1 = 0 AND COL2 OR COL3 > 0.05 AND no row has the same COL4 content as the same Name in a different Group
- NA is put when the Name is not present in the Group
Let's take 4 examples:
1) For G1-SP1, row 1 has COL1 > 0, so it will have a letter A or a in the new dataframe. To know whether it will be A or a, we have to look at COL4: in row 12, Sequence3 is also present in G2 for SP1, so it will be A.
2) For G2-SP1, row 12 has COL2 and COL3 > 0.05, so it will have a letter B or b in the new dataframe. It will be B because Sequence3 is also present for SP1 in G1 (row 3).
3) For G2-SP2, no row has COL1 > 0 or COL2 and COL3 > 0.05, so it will have a letter X or x in the new dataframe. It will be x because no other SP2 in other Groups has the same sequence (Sequence22, Sequence23 or Sequence24).
4) For G1-SP6, row 8 has COL1 > 0, so it will have a letter A or a in the new dataframe. It will be a because no other SP6 in other Groups has the same sequence (Sequence20).
For that I tried:
Env_table <- as.data.frame(test_df) %>%
  group_by(Groups, Names) %>%
  mutate(Env_variable = replace_na(COL1, "."),
         Env_variable = ifelse(any(COL1 >= 1), "A", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 > 0.05) && all(COL3 > 0.05), "B", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 < 0.05) && all(COL3 < 0.05), "X", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 < 0.05) && all(COL3 > 0.05), "X", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 > 0.05) && all(COL3 < 0.05), "X", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(!is.na(COL1)) && all(COL2 > 0.05) && all(COL3 > 0.05), "*", Env_variable)) %>%
  slice(1) %>%
  pivot_wider(id_col = Names, names_from = Groups, values_from = Env_variable) %>%
  arrange(as.integer(str_extract(Names, "\\d+")))
where Env_variable is just an empty column that will store the A, B, X or NA values.
Thanks for your help.
Answer 1:
Your question is not crystal-clear, but here is an attempt to answer:
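library(dplyr)
library(tidyr)

test_df %>%
  group_by(Groups, Names) %>%
  summarise(
    # case_when() returns the letter of the first condition that holds
    x = case_when(
      any(COL1 >= 1, na.rm = TRUE) ~ "A",
      any(COL1 == 0 & (COL2 > 0.05 & COL3 > 0.05), na.rm = TRUE) ~ "B",
      any(COL1 == 0 & (COL2 < 0.05 | COL3 < 0.05), na.rm = TRUE) ~ "X",
      TRUE ~ NA_character_
    )
  ) %>%
  # one row per Name, one column per Group
  pivot_wider(names_from = Groups, values_from = x)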
This would give this output:
  Names G1    G2
  <fct> <chr> <chr>
1 SP1   A     B
2 SP2   X     X
3 SP3   X     NA
4 SP5   A     X
5 SP6   A     X
6 Sp1   NA    X
7 SP7   NA    B
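The pipeline above does not distinguish upper from lower case. One possible way to add that step is sketched here, under two assumptions that the question does not state outright: a Group-Name pair counts as "shared" when any of its COL4 values also occurs for the same Name in another Group, and X covers every pair where neither A nor B applies (as in example 3 of the question). The column names `shared_row`, `shared` and `letter` are made up for this illustration, and the data frame is rebuilt with plain character columns instead of factors.

```r
library(dplyr)
library(tidyr)

# Same values as the dput in the question, as character columns (assumption:
# factors are not needed for this step).
test_df <- data.frame(
  Groups = rep(c("G1", "G2"), each = 8),
  Names  = c("SP1", "SP1", "SP1", "SP2", "SP2", "SP3", "SP5", "SP6",
             "Sp1", "SP1", "SP1", "SP2", "SP2", "SP5", "SP6", "SP7"),
  COL1   = c(1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  COL2   = c(0.4, 0.004, 0.004, 0.4, 0.004, 0.005, 0.4, 0.008,
             0.004, 0.05, 0.4, 0.004, 0.004, 0.004, 0.003, 0.56),
  COL3   = c(0.5, 0.005, 0.005, 0.005, 0.5, 0.006, 0.006, 0.002,
             0.005, 0.6, 0.6, 0.005, 0.005, 0.005, 0.002, 0.76),
  COL4   = paste0("Sequence", c(1, 2, 3, 123, 14, 15, 16, 20,
                                17, 18, 3, 22, 23, 16, 21, 67)),
  stringsAsFactors = FALSE
)

result <- test_df %>%
  # A row is "shared" if its Name/COL4 combination occurs in more than one Group.
  group_by(Names, COL4) %>%
  mutate(shared_row = n_distinct(Groups) > 1) %>%
  # Regroup by Group and Name to pick the letter.
  group_by(Groups, Names) %>%
  summarise(
    letter = case_when(
      any(COL1 > 0) ~ "A",
      any(COL1 == 0 & COL2 > 0.05 & COL3 > 0.05) ~ "B",
      TRUE ~ "X"  # assumption: everything else is X
    ),
    shared = any(shared_row),
    .groups = "drop"
  ) %>%
  # Lower case when no COL4 value is shared with another Group.
  mutate(letter = ifelse(shared, letter, tolower(letter))) %>%
  select(-shared) %>%
  pivot_wider(names_from = Groups, values_from = letter)

print(result)
```

Names that never occur in the data, such as SP4 in the desired output, do not get a row here, and "Sp1" stays separate from "SP1" because the comparison is case-sensitive.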
Source: https://stackoverflow.com/questions/61595711/using-dplyr-to-create-new-dataframe-depending-on-thresholds