Question
I'd like to know which observations are present (>=1) in all samples (columns) and which are unique to each subset of samples ("Continent" or "Country").
For example:
- df_all = would contain Obs5 (>=1 in all samples)
- df_Europe = would contain Obs1 (>=1 in Europe and =0 in Africa)
- df_Italy = would contain Obs2 (>=1 in Italy and 0 in the rest)
- etc...
For the first one, I can use:
row_sub = apply(df, 1, function(row) all(row !=0 ))
dff <- df[row_sub,]
BUT is there a way to code this as a loop, using dplyr, reshape, or anything else, so that I do not have to do it manually for each subset?
df <- data.frame(column=c("Continent", "Country", "Obs1", "Obs2", "Obs3", "Obs4", "Obs5"), `1`=c("Europe", "Italy", 1,2,2,0,1), `2`=c("Europe", "Portugal", 2,0,0,0,2), `3`=c("Africa", "Nigeria", 0,0,1,2,3), `4`=c("Africa", "Nigeria", 0,0,1,3,4),check.names=FALSE)
df
     column      1        2       3       4
1 Continent Europe   Europe  Africa  Africa
2   Country  Italy Portugal Nigeria Nigeria
3      Obs1      1        2       0       0
4      Obs2      2        0       0       0
5      Obs3      2        0       1       1
6      Obs4      0        0       2       3
7      Obs5      1        2       3       4
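To make the desired output concrete, here is a minimal base R sketch of one way the group-wise check could be looped. It assumes that "present in a group" means >=1 in every sample of that group and 0 in every sample outside it; the helper `unique_to` and the names `counts`, `present`, `groups` and `result` are hypothetical, not part of the original attempt.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  column = c("Continent", "Country", "Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
  `1` = c("Europe", "Italy", 1, 2, 2, 0, 1),
  `2` = c("Europe", "Portugal", 2, 0, 0, 0, 2),
  `3` = c("Africa", "Nigeria", 0, 0, 1, 2, 3),
  `4` = c("Africa", "Nigeria", 0, 0, 1, 3, 4),
  check.names = FALSE
)

# The first two rows hold sample metadata, the remaining rows hold counts
continent <- as.character(unlist(df[1, -1]))
country   <- as.character(unlist(df[2, -1]))

# Convert the count rows to a numeric matrix (all columns are stored as text)
counts <- sapply(df[-(1:2), -1], function(x) as.numeric(as.character(x)))
rownames(counts) <- as.character(df$column[-(1:2)])
present <- counts >= 1  # logical presence/absence matrix

# Hypothetical helper: observations present in every sample of the group
# and absent from every sample outside it (assumed definition of "unique")
unique_to <- function(in_group) {
  in_all  <- rowSums(present[, in_group, drop = FALSE]) == sum(in_group)
  in_none <- rowSums(present[, !in_group, drop = FALSE]) == 0
  rownames(present)[in_all & in_none]
}

# One logical column mask per group: all samples, each continent, each country
groups <- c(
  list(all = rep(TRUE, ncol(present))),
  sapply(unique(continent), function(g) continent == g, simplify = FALSE),
  sapply(unique(country), function(g) country == g, simplify = FALSE)
)

# Loop over every group instead of writing each subset by hand
result <- lapply(groups, unique_to)

# Turn the observation names back into subsets of the original data frame
dfs <- lapply(result, function(obs) df[df$column %in% obs, ])
```

Under these assumptions, `result$all` holds Obs5, `result$Europe` holds Obs1 and `result$Italy` holds Obs2, matching the examples listed above; `dfs$Europe` would then play the role of df_Europe.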
I have seen something similar in "Subsetting data by levels of granularity and applying a function to each data frame in R", but I do not think it gives exactly the output I want.
Here is the same data in long format, if it helps:
library(tidyverse)
df_long <- tibble::tibble(
ID = 1:20, #use this to sort
Sample = rep(1:4,5),
Continent = rep(c(rep("Europe",2),rep("Africa",2)),5),
Country = rep(c("Italy", "Portugal", rep("Nigeria",2)),5),
value = c(1, 2,0,0,2,0,0,0,2,0,1,1,0,0,2,3,1,2,3,4))
data_Europe <- df_long %>%
count(Continent, Sample, wt = value) %>%
filter(n > 0)
# stuck here: count(Europe) %>% select...?
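For the long format, a possible dplyr direction is sketched below. It assumes the long table also carries an `Obs` column identifying the observation (the table above does not have one, so it is added here as an assumption), and it uses the same assumed definition as before: present in every sample of exactly one group and absent elsewhere. The helper `unique_by` is hypothetical.

```r
library(dplyr)

# Long table from the question plus an assumed Obs identifier column
df_long <- tibble::tibble(
  Obs = rep(paste0("Obs", 1:5), each = 4),  # assumption: not in the original table
  Sample = rep(1:4, 5),
  Continent = rep(c("Europe", "Europe", "Africa", "Africa"), 5),
  Country = rep(c("Italy", "Portugal", "Nigeria", "Nigeria"), 5),
  value = c(1, 2, 0, 0, 2, 0, 0, 0, 2, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 4)
)

# Observations present (>=1) in every sample
in_all <- df_long %>%
  group_by(Obs) %>%
  summarise(all_present = all(value >= 1), .groups = "drop") %>%
  filter(all_present)

# Hypothetical helper: for one grouping column, keep observations that are
# present in every sample of exactly one group and absent from all other groups
unique_by <- function(data, group_col) {
  data %>%
    group_by(Obs, group = .data[[group_col]]) %>%
    summarise(
      all_present = all(value >= 1),
      any_present = any(value >= 1),
      .groups = "drop"
    ) %>%
    group_by(Obs) %>%
    filter(sum(any_present) == 1) %>%  # observation seen in only one group
    ungroup() %>%
    filter(all_present) %>%            # and seen in all samples of that group
    select(group, Obs)
}

# Apply the same check to each level of granularity
res <- lapply(
  c(Continent = "Continent", Country = "Country"),
  function(g) unique_by(df_long, g)
)
```

Here `res$Continent` and `res$Country` are tibbles with one row per (group, observation) pair, which can be split further with `split(res$Continent$Obs, res$Continent$group)` if one object per group is preferred.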
Source: https://stackoverflow.com/questions/64806211/subset-data-based-on-presence-absence-on-unique-samples-and-sample-groups-in-r