Loop through dataset to calculate diversity

心不动则不痛 submitted on 2021-02-10 20:01:35

Question


I have a dataset like so:

 set.seed(1345)
 df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)), 
           species=sample(LETTERS[1:10], 30, replace= TRUE))
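Both indices depend only on how many individuals of each species occur in a month, so it can help to look at those counts directly. The sketch below uses base R only; counts is an illustrative name. It cross-tabulates month against species, and each row sums to 10 because every month has 10 observations.

```r
set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

# Rows are months, columns are species, cells are numbers of individuals
counts <- table(df$month, df$species)
counts

# Every month contributes 10 individuals
rowSums(counts)
```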

I would like to loop through each month and calculate species diversity. I am aware of functions like diversity() in library("vegan") and know how to solve this that way (code provided below). As an exercise in writing loops, however, I am trying to create a for loop or function that shows the explicit calculations for Shannon's diversity and Simpson's diversity, so that the calculation behind each index is not mysterious. They are calculated using the following formulas, where p_i is the proportion of individuals in a month that belong to species i:

Shannon: H' = -Σ p_i ln(p_i)
Simpson (inverse form): D = 1 / Σ p_i²
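As a quick hand check of the Shannon and inverse Simpson formulas, here they are applied to a small hypothetical community of 10 individuals (5 of species A, 3 of B, 2 of C; the counts are made up for illustration). The proportions are 0.5, 0.3 and 0.2, so Σ p_i² = 0.38 and D = 1/0.38 ≈ 2.632, while H' ≈ 1.030.

```r
# Hypothetical counts for a single month (illustration only)
counts <- c(A = 5, B = 3, C = 2)
p <- counts / sum(counts)        # proportions p_i: 0.5, 0.3, 0.2

shannon <- -sum(p * log(p))      # H' = -sum(p_i * ln(p_i)); log() is the natural log
simpson <- 1 / sum(p^2)          # D = 1 / sum(p_i^2)

round(shannon, 3)   # 1.03
round(simpson, 3)   # 2.632
```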

So far I have tried the following for Simpson's index:

df <- 
  df %>% 
  group_by(month, species) %>% 
  summarise(freq = n()) 

div <- NA
for (i in length(unique(df$month))) {
  sum <- sum(df$freq)
  for (i in unique(df$freq)) {
    p <- df$freq / sum
    p.sqrd <- p * p
    div[i] <- 1 / sum(p.sqrd)
  }
}
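The Simpson attempt has two indexing problems. length(unique(df$month)) is a single number (3), so for (i in length(unique(df$month))) runs only once; seq_along() gives one iteration per month. The inner loop also never restricts freq to the current month, so sum(df$freq) is the total across all months, and div[i] is indexed by a frequency value instead of by month. A minimal corrected sketch, assuming dplyr 1.0 or later (for .groups = "drop"), keeps the counting step outside the loop and subsets the counts for each month inside it; freq_df, months and simpson are illustrative names.

```r
library(dplyr)

set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

# Count individuals per species within each month
freq_df <- df %>%
  group_by(month, species) %>%
  summarise(freq = n(), .groups = "drop")

months <- unique(freq_df$month)
simpson <- numeric(length(months))   # one result per month

for (m in seq_along(months)) {
  counts <- freq_df$freq[freq_df$month == months[m]]  # counts for this month only
  p <- counts / sum(counts)                            # proportions p_i
  simpson[m] <- 1 / sum(p^2)                           # inverse Simpson: 1 / sum(p_i^2)
}

data.frame(month = months, simpson = simpson)
```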

And the following for Shannon's index:

df <- 
  df %>% 
  group_by(month, species) %>% 
  summarise(freq = n()) 

div <- NA
for (i in length(unique(df$month))) {
  sum <- sum(df$freq)
  for (i in unique(df$freq)) {
    p <- df$freq / sum
    log.p <- ln(p)
    div[i] <- sum(p[i] * ln(p[i]))
  }
}
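The Shannon attempt has the same indexing problems, plus two more: base R has no ln() function (log() is already the natural logarithm), and the index needs a leading minus sign. The sketch below also does the counting inside the loop with table(), so no separate dplyr step is needed, and it accumulates the sum one species at a time so each term of -Σ p_i ln(p_i) is visible. The names months, shannon, counts, N, H, n_i and p_i are illustrative.

```r
set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

months <- unique(df$month)
shannon <- numeric(length(months))   # one result per month

for (m in seq_along(months)) {
  counts <- table(df$species[df$month == months[m]])  # individuals per species this month
  counts <- counts[counts > 0]        # drop absent species (0 * log(0) is NaN in R)
  N <- sum(counts)                    # total individuals this month
  H <- 0
  for (n_i in counts) {               # one term of the sum per species
    p_i <- n_i / N
    H <- H - p_i * log(p_i)           # subtract p_i * ln(p_i)
  }
  shannon[m] <- H
}

data.frame(month = months, shannon = shannon)
```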

I am not getting a working loop. I would like help indexing the loop correctly, ideally in a version that is as efficient as possible (e.g., one that incorporates df <- df %>% group_by(month, species) %>% summarise(freq = n()) into the loop) and that clearly illustrates the equation inside the loop.

Using the diversity() function, here are the answers for Simpson's diversity:

library("tidyverse")
df <- 
  df %>% 
  group_by(month, species) %>% 
  summarise(freq = n(), .groups = "drop") 

# Cast the per-month species counts into a month x species table
library("reshape2")
ph_mat <- dcast(df, month ~ species, value.var = "freq")
ph_mat[is.na(ph_mat)] <- 0 # species absent in a month get a count of 0

library("vegan")
# Drop the month column (column 1) so it is not treated as a species
df <- data.frame(div = diversity(ph_mat[, -1], index = "simpson"), 
                 month = unique(ph_mat$month))

And for Shannon's diversity:

library("vegan")
# Drop the month column (column 1) so it is not treated as a species
df <- data.frame(div = diversity(ph_mat[, -1], index = "shannon"), 
                 month = unique(ph_mat$month))
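One caution when comparing hand-written loops against vegan: diversity(x, index = "simpson") returns the Gini-Simpson form 1 - Σ p_i², whereas 1 / Σ p_i² corresponds to index = "invsimpson". The base-R sketch below (no vegan required; counts, p, sum_p2, gini_simpson and inv_simpson are illustrative names) computes both forms from a month-by-species count table so the relationship between them is easy to see.

```r
set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

# Month x species count table (same information as the dcast() output, without the month column)
counts <- table(df$month, df$species)
p <- counts / rowSums(counts)      # row-wise proportions: each row sums to 1

sum_p2 <- rowSums(p^2)             # sum of p_i^2 for each month
gini_simpson <- 1 - sum_p2         # what vegan calls index = "simpson"
inv_simpson  <- 1 / sum_p2         # what vegan calls index = "invsimpson"

data.frame(month = rownames(counts), gini_simpson, inv_simpson)
```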

Answer 1:


Here is a solution that does not use for loops; instead, I define a small function for each index and comment every step, so there is no mystery in the calculation. It computes each diversity metric for each month using the group_by() and summarize() functions from dplyr.

library(dplyr)

set.seed(1345)
df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)), 
               species=sample(LETTERS[1:10], 30, replace= TRUE))

calc_shannon <- function(community) {
  p <- table(community)/length(community) # Find proportions
  p <- p[p > 0] # Get rid of zero proportions (log zero is undefined)
  -sum(p * log(p)) # Calculate index
}

calc_simpson <- function(community) {
  p <- table(community)/length(community) # Find proportions
  1 / sum(p^2) # Calculate index
}

diversity_metrics <- 
  df %>% 
  group_by(month) %>% 
  summarize(shannon = calc_shannon(species),
            simpson = calc_simpson(species))
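A quick way to confirm the two helper functions behave as intended is to call them on a tiny vector whose answer is easy to work out by hand: for c("A", "A", "B"), p = (2/3, 1/3), so 1 / Σ p_i² = 1 / (4/9 + 1/9) = 1.8 and H' ≈ 0.6365. The same functions also work without dplyr via split() and sapply(), shown here as a base-R alternative to group_by()/summarize(); by_month is an illustrative name.

```r
set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

calc_shannon <- function(community) {
  p <- table(community) / length(community)  # proportions of each species
  p <- p[p > 0]                              # drop zero proportions (log(0) is -Inf)
  -sum(p * log(p))
}

calc_simpson <- function(community) {
  p <- table(community) / length(community)  # proportions of each species
  1 / sum(p^2)
}

# Hand-checkable example: p = (2/3, 1/3)
calc_simpson(c("A", "A", "B"))   # 1.8
calc_shannon(c("A", "A", "B"))   # about 0.6365

# Base-R equivalent of group_by(month) %>% summarize(...)
by_month <- split(df$species, df$month)   # list of species vectors, one per month
data.frame(month   = names(by_month),
           shannon = sapply(by_month, calc_shannon),
           simpson = sapply(by_month, calc_simpson))
```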


Source: https://stackoverflow.com/questions/53546514/loop-through-dataset-to-calculate-diveristy
