I have a main data frame with 415526 entries (rows), which represent the proteins in 51 microbial species. As each protein has a different number of protein d