Subset data based on presence/absence on unique samples and sample groups in R

Submitted by 你说的曾经没有我的故事 on 2021-01-26 02:07:30

Question


I'd like to know which observations are present (>=1) in all samples (columns) and which are unique to each subset of samples ("Continent" or "Country").

For example:

  • df_all would contain Obs5 (>=1 in all samples)
  • df_Europe would contain Obs1 (>=1 in Europe and 0 in Africa)
  • df_Italy would contain Obs2 (>=1 in Italy and 0 in the rest)
  • etc...

For the first one, I can use:

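# Rows where every cell is non-zero. Note: on the df below this also keeps the
# Continent and Country rows, because their labels are never equal to "0".
row_sub <- apply(df, 1, function(row) all(row != 0))
dff <- df[row_sub, ]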
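
A variant of the same check that looks only at the observation rows might be sketched as follows. It rebuilds the toy data frame from this question so that it runs on its own. The names `is_obs`, `counts`, and `present_in_all` are illustrative, and treating every row other than Continent and Country as an observation is an assumption.

```r
# Toy data from the question; every column is character because it mixes
# labels (Continent, Country) with counts.
df <- data.frame(
  column = c("Continent", "Country", "Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
  `1` = c("Europe", "Italy", 1, 2, 2, 0, 1),
  `2` = c("Europe", "Portugal", 2, 0, 0, 0, 2),
  `3` = c("Africa", "Nigeria", 0, 0, 1, 2, 3),
  `4` = c("Africa", "Nigeria", 0, 0, 1, 3, 4),
  check.names = FALSE
)

# Assumption: every row that is not Continent or Country is an observation.
is_obs <- !(df$column %in% c("Continent", "Country"))

# Convert the counts of the observation rows to a numeric matrix.
counts <- as.matrix(df[is_obs, -1])
storage.mode(counts) <- "double"
rownames(counts) <- df$column[is_obs]

# Observations with a count of at least 1 in every sample (column).
present_in_all <- rownames(counts)[rowSums(counts >= 1) == ncol(counts)]
df_all <- df[df$column %in% present_in_all, ]
```

Working on a numeric matrix avoids comparing character strings with 0 and keeps the label rows out of the result.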

But is there a way to code this as a loop, or with dplyr, reshape, or any other package, so that I do not have to do it manually for each group?

df <- data.frame(
  column = c("Continent", "Country", "Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
  `1` = c("Europe", "Italy", 1, 2, 2, 0, 1),
  `2` = c("Europe", "Portugal", 2, 0, 0, 0, 2),
  `3` = c("Africa", "Nigeria", 0, 0, 1, 2, 3),
  `4` = c("Africa", "Nigeria", 0, 0, 1, 3, 4),
  check.names = FALSE
)

df
  column    1       2        3       4 
1 Continent Europe  Europe   Africa  Africa
2 Country   Italy   Portugal Nigeria Nigeria
3 Obs1      1       2        0       0
4 Obs2      2       0        0       0
5 Obs3      2       0        1       1
6 Obs4      0       0        2       3
7 Obs5      1       2        3       4
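
One possible way to loop over the grouping rows of this wide table in base R is sketched below. It is only a sketch under a stated assumption: an observation counts as unique to a group when it is >=1 in every sample of that group and 0 in every other sample, which reproduces the df_Europe and df_Italy examples above. The helper `unique_to` and the `result` list are hypothetical names, not part of the original question.

```r
df <- data.frame(
  column = c("Continent", "Country", "Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
  `1` = c("Europe", "Italy", 1, 2, 2, 0, 1),
  `2` = c("Europe", "Portugal", 2, 0, 0, 0, 2),
  `3` = c("Africa", "Nigeria", 0, 0, 1, 2, 3),
  `4` = c("Africa", "Nigeria", 0, 0, 1, 3, 4),
  check.names = FALSE
)

# Split the wide table into sample metadata and a numeric count matrix.
group_rows <- c("Continent", "Country")
meta <- df[df$column %in% group_rows, ]
obs <- df[!(df$column %in% group_rows), ]
counts <- sapply(obs[-1], as.numeric)
rownames(counts) <- obs$column

# Hypothetical helper. Assumption: "unique to a group" means >= 1 in every
# sample of the group and 0 in every sample outside it.
unique_to <- function(in_group) {
  inside <- counts[, in_group, drop = FALSE]
  outside <- counts[, !in_group, drop = FALSE]
  keep <- rowSums(inside >= 1) == ncol(inside) & rowSums(outside != 0) == 0
  rownames(counts)[keep]
}

# Loop over each grouping row (Continent, Country) and each level within it.
result <- list()
for (i in seq_len(nrow(meta))) {
  labels <- unlist(meta[i, -1])
  for (lev in unique(labels)) {
    result[[paste0("df_", lev)]] <- unique_to(labels == lev)
  }
}

# Observations to keep for one group, as rows of the original table.
df_Europe <- df[df$column %in% result$df_Europe, ]
```

Groups with no unique observation (Portugal in this data) end up with an empty character vector in `result`.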

I have seen something similar in "Subsetting data by levels of granularity and applying a function to each data frame in R", but I do not think it gives exactly the output I want.

Here is the same data as a long table, if it helps:

library(tidyverse)
df_long <- tibble::tibble(
  ID = 1:20, # use this to sort; IDs 1-4 are Obs1, 5-8 are Obs2, and so on
  Sample = rep(1:4,5),
  Continent = rep(c(rep("Europe",2),rep("Africa",2)),5),
  Country = rep(c("Italy", "Portugal", rep("Nigeria",2)),5),
  value = c(1, 2,0,0,2,0,0,0,2,0,1,1,0,0,2,3,1,2,3,4))
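
With the long layout, a dplyr version could look roughly like the sketch below. The long table above has no observation label, so the `Obs` column here is an assumption added for illustration (rows come in blocks of four samples per observation). `unique_by` is a hypothetical helper. As before, "unique to a group" is assumed to mean >=1 in every sample of that group and 0 in all other samples.

```r
library(dplyr)

df_long <- tibble(
  Obs = rep(paste0("Obs", 1:5), each = 4),  # assumed label, not in the original
  Sample = rep(1:4, 5),
  Continent = rep(c("Europe", "Europe", "Africa", "Africa"), 5),
  Country = rep(c("Italy", "Portugal", "Nigeria", "Nigeria"), 5),
  value = c(1, 2, 0, 0, 2, 0, 0, 0, 2, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 4)
)

# Observations that are >= 1 in every sample.
present_in_all <- df_long %>%
  group_by(Obs) %>%
  summarise(ok = all(value >= 1), .groups = "drop") %>%
  filter(ok) %>%
  pull(Obs)

# Hypothetical helper: for one grouping column, return the observations that
# are >= 1 in all samples of exactly one group and 0 in every other group.
unique_by <- function(data, group_col) {
  data %>%
    group_by(Obs, group = .data[[group_col]]) %>%
    summarise(all_present = all(value >= 1),
              any_present = any(value != 0),
              .groups = "drop") %>%
    group_by(Obs) %>%
    filter(sum(any_present) == 1) %>%  # non-zero in one group only
    ungroup() %>%
    filter(all_present) %>%            # and present in all of its samples
    select(group, Obs)
}

by_continent <- unique_by(df_long, "Continent")
by_country <- unique_by(df_long, "Country")
```

Calling `unique_by` once per grouping column avoids writing a separate filter for every continent and country.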


# Incomplete attempt: I am not sure how to continue after the filter step.
data_Europe <- df_long %>%
  count(Continent, Sample, wt = value) %>%
  filter(n > 0) %>%
  count(Europe) %>% select...?

Source: https://stackoverflow.com/questions/64806211/subset-data-based-on-presence-absence-on-unique-samples-and-sample-groups-in-r
