Loop through dataset to calculate diversity

Question

I have a dataset like so:

 set.seed(1345)
 df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)), 
           species=sample(LETTERS[1:10], 30, replace= TRUE))

I would like to loop through each month and calculate species diversity. I am aware of functions such as diversity() in library("vegan") and know how to solve this that way (code provided below). As an exercise with loops, however, I am trying to write a for loop or function that spells out the calculations for Shannon's diversity and Simpson's diversity, so that neither index is mysterious. They are calculated using the following formulas:
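Assuming the standard definitions (they match the code used in this thread, with Simpson's index in its inverse form), where p_i is the proportion of individuals belonging to species i and S is the number of species observed:

```latex
H' = -\sum_{i=1}^{S} p_i \ln p_i
\qquad
D = \frac{1}{\sum_{i=1}^{S} p_i^{2}}
```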

Thus far I have tried the following for Simpson's:

df <- 
 df %>% 
  group_by(month, species) %>% 
  summarise(freq = n()) 

div<-NA
 for (i in length(unique(df$month))) {
 sum<- sum(df$freq)
 for (i in unique (df$freq)){
 p<- df$freq /sum
 p.sqrd<-p*p
 div[i]<-1/sum(p.sqrd)
   }}

And the following for Shannon's:

df <- 
 df %>% 
  group_by(month, species) %>% 
  summarise(freq = n()) 

div<-NA
 for (i in length(unique(df$month))) {
 sum<- sum(df$freq)
 for (i in unique (df$freq)){
 p<- df$freq /sum
 log.p<-ln(p)
 div[i]<- sum(p[i]*ln(p[i]))
   }}

My loops do not work. I would like help indexing them correctly and making them efficient (i.e. incorporating df <- df %>% group_by(month, species) %>% summarise(freq = n()) into the loop), ideally as a for loop that clearly illustrates the equation.

Using the diversity function, here are the answers for Simpson's diversity:

library("tidyverse")
df <- 
 df %>% 
 group_by(month, species) %>% 
 summarise(freq = n()) 

# Cast dataframe of interaction frequencies into a matrix
library("reshape2")
ph_mat<- dcast(df,  month~ species, value.var = "freq") # name the value column explicitly
ph_mat[is.na(ph_mat)] <- 0 # replace NA (species absent in a month) with 0

library("vegan")
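# ph_mat[, -1] drops the month column so it is not counted as a species.
# Note: index="simpson" returns 1 - sum(p^2); index="invsimpson" returns 1/sum(p^2).
df<- data.frame(div=diversity(ph_mat[, -1], index="simpson"), 
               month=unique(ph_mat$month))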

And for Shannon's:

library("vegan")
# ph_mat[, -1] drops the month column so it is not counted as a species.
df<- data.frame(div=diversity(ph_mat[, -1], index="shannon"), 
               month=unique(ph_mat$month))
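To make the relationship between these indices concrete, here is a minimal base-R sketch that computes them row by row from a count matrix, without vegan. The matrix `counts` is a made-up example (not the data from the question). As far as I know, vegan reports 1 - sum(p^2) for index="simpson" and 1/sum(p^2) for index="invsimpson", so the loop in the question corresponds to the latter.

```r
# Hypothetical site-by-species count matrix (illustration only).
counts <- matrix(c(5, 3, 2, 0,
                   1, 1, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("site1", "site2"), LETTERS[1:4]))

# Proportion of each species within each row.
p <- counts / rowSums(counts)

# Shannon: -sum(p * log(p)), treating 0 * log(0) as 0.
shannon <- -rowSums(ifelse(p > 0, p * log(p), 0))

# Two common forms of Simpson's index.
gini_simpson <- 1 - rowSums(p^2)   # form reported for index = "simpson"
inv_simpson  <- 1 / rowSums(p^2)   # form reported for index = "invsimpson"
```

For a perfectly even community such as site2, Shannon equals log(S) and inverse Simpson equals S, which is a quick sanity check on any hand-written implementation.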

Answer 1:

Here is a solution that does not use for loops. Instead, I define and explain a function to calculate each index (no mystery!) and apply it to each month using the group_by() and summarize() functions from dplyr.

library(dplyr) # provides %>%, group_by() and summarize()

set.seed(1345)
df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)), 
               species=sample(LETTERS[1:10], 30, replace= TRUE))

calc_shannon <- function(community) {
  p <- table(community)/length(community) # Find proportions
  p <- p[p > 0] # Get rid of zero proportions (log zero is undefined)
  -sum(p * log(p)) # Calculate index
}

calc_simpson <- function(community) {
  p <- table(community)/length(community) # Find proportions
  1 / sum(p^2) # Calculate index (inverse Simpson form)
}

diversity_metrics <- 
  df %>% 
  group_by(month) %>% 
  summarize(shannon = calc_shannon(species),
            simpson = calc_simpson(species))
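Since the question asked specifically for a for loop, the same calculation can also be sketched with explicit loops in base R. This is one possible arrangement, not the only one: the outer loop walks over months, and the inner loop accumulates the two sums term by term so that each step of the equations is visible. The names `months`, `result`, `shannon_sum` and `simpson_sum` are my own. Note that R's natural logarithm is log(); there is no ln() in base R.

```r
set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

months <- unique(df$month)
result <- data.frame(month = months, shannon = NA_real_, simpson = NA_real_)

for (i in seq_along(months)) {
  # Species recorded in this month only.
  community <- df$species[df$month == months[i]]

  freq <- table(community)            # count per species
  freq <- freq[freq > 0]              # drop species absent this month
  p <- as.numeric(freq) / sum(freq)   # proportions p_i

  shannon_sum <- 0
  simpson_sum <- 0
  for (p_i in p) {
    shannon_sum <- shannon_sum + p_i * log(p_i)  # term of sum(p_i * ln p_i)
    simpson_sum <- simpson_sum + p_i^2           # term of sum(p_i^2)
  }

  result$shannon[i] <- -shannon_sum     # H' = -sum(p_i * ln p_i)
  result$simpson[i] <- 1 / simpson_sum  # D  = 1 / sum(p_i^2)
}

result
```

Indexing with seq_along(months) is what makes the outer loop visit every month; for (i in length(x)) runs only once, with i equal to the length.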

Source: https://stackoverflow.com/questions/53546514/loop-through-dataset-to-calculate-diveristy

Tags

function

for-loop