Question
I have a dataset like so:
set.seed(1345)
df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)),
species=sample(LETTERS[1:10], 30, replace= TRUE))
I would like to loop through each month and calculate species diversity. I am aware of functions like diversity in library("vegan"), and know solutions to my question using that route (code provided below), but as an exercise for myself with loops I am trying to create a for loop or function that shows the specific calculations for Shannon's diversity and Simpson's diversity, so that the calculations for each index are not mysterious. They are calculated using the following formulas:
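Judging from the code in this post, the two indices are presumably the inverse form of Simpson's index and the Shannon index. In the sketch below, p_i = n_i / N is the proportion of individuals belonging to species i, and S is the number of species observed in a month; these symbols are assumptions chosen for illustration, not taken from the original post.

```latex
\[
D = \frac{1}{\sum_{i=1}^{S} p_i^{2}}
\qquad \text{(inverse Simpson)}
\]
\[
H' = -\sum_{i=1}^{S} p_i \ln p_i
\qquad \text{(Shannon)}
\]
```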
Thus far I have tried the following for Simpson's:
df <-
df %>%
group_by(month, species) %>%
summarise(freq = n())
div<-NA
for (i in length(unique(df$month))) {
sum<- sum(df$freq)
for (i in unique (df$freq)){
p<- df$freq /sum
p.sqrd<-p*p
div[i]<-1/sum(p.sqrd)
}}
And the following for Shannon's:
df <-
df %>%
group_by(month, species) %>%
summarise(freq = n())
div<-NA
for (i in length(unique(df$month))) {
sum<- sum(df$freq)
for (i in unique (df$freq)){
p<- df$freq /sum
log.p<-ln(p)
div[i]<- sum(p[i]*ln(p[i]))
}}
My loop does not work. I would like help indexing it correctly and making it efficient (i.e. incorporating df <- df %>% group_by(month, species) %>% summarise(freq = n()) into the loop), so that the for loop clearly illustrates the equation it computes.
Using the diversity function, here are the answers for Simpson's diversity:
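library("tidyverse")
df <-
df %>%
group_by(month, species) %>%
summarise(freq = n(), .groups = "drop")
# Cast the data frame of species frequencies into a month-by-species matrix
library("reshape2")
ph_mat <- dcast(df, month ~ species, value.var = "freq", fill = 0) # species absent in a month get 0
library("vegan")
# Drop the month column so that only species counts are passed to diversity()
# Note: index = "simpson" returns 1 - sum(p^2); index = "invsimpson" returns 1 / sum(p^2)
df <- data.frame(div = diversity(ph_mat[, -1], index = "simpson"),
month = unique(ph_mat$month))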
And for Shannon's:
library("vegan")
# As above, drop the month column before calling diversity()
df <- data.frame(div = diversity(ph_mat[, -1], index = "shannon"),
month = unique(ph_mat$month))
Answer 1:
I have a solution here that does not use for loops. Instead, I define and explain a function to calculate each index (no mystery!), and each diversity metric is calculated for each month using the group_by() and summarize() functions from dplyr.
library(dplyr) # provides %>%, group_by() and summarize()
set.seed(1345)
df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)),
species=sample(LETTERS[1:10], 30, replace= TRUE))
calc_shannon <- function(community) {
p <- table(community)/length(community) # Find proportions
p <- p[p > 0] # Get rid of zero proportions (log zero is undefined)
-sum(p * log(p)) # Calculate index
}
calc_simpson <- function(community) {
p <- table(community)/length(community) # Find proportions
1 / sum(p^2) # Calculate index
}
diversity_metrics <-
df %>%
group_by(month) %>%
summarize(shannon = calc_shannon(species),
simpson = calc_simpson(species))
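Since the question asked specifically for a for loop, the same calculation can also be written as an explicit loop over months. The following is only a minimal base-R sketch: the names months, community, freq and div are illustrative choices, and it assumes the inverse form of Simpson's index (1 / sum(p^2)) used in the functions above.

```r
# Same toy data as in the question
set.seed(1345)
df <- data.frame(month = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

months <- unique(df$month)

# Pre-allocate one result row per month (names are illustrative)
div <- data.frame(month = months, shannon = NA_real_, simpson = NA_real_)

for (i in seq_along(months)) {
  # Species records belonging to the i-th month
  community <- df$species[df$month == months[i]]

  # Count individuals per species, then convert counts to proportions
  freq <- table(community)
  p <- as.numeric(freq) / sum(freq)
  p <- p[p > 0]  # drop absent species (log of zero is undefined)

  div$shannon[i] <- -sum(p * log(p))  # Shannon: -sum(p_i * ln(p_i))
  div$simpson[i] <- 1 / sum(p^2)      # inverse Simpson: 1 / sum(p_i^2)
}

div
```

Note that the loop iterates over seq_along(months) rather than length(months); the latter is a single number, so a loop over it would run only once.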
Source: https://stackoverflow.com/questions/53546514/loop-through-dataset-to-calculate-diveristy