How to compare clusters?


既然无缘 2021-01-18 13:22

Hopefully this can be done with Python! I ran two clustering programs on the same data and now have a cluster file from each. I reformatted the files so that they look like

5 answers

[愿得一人] (original poster)

2021-01-18 13:33
You will have to write some code to parse the file. If you ignore the cluster, you should be able to tell family, genus and species apart by their indentation.

The easiest way is to define a named tuple:
```
import collections
Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])
```
You can make an instance of this object like this:
```
b = Bacterium('Brucellaceae', 'Brucella', 'canis')
```
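As a small illustration of why a named tuple is convenient here: fields can be read by name, and two instances with the same values compare equal and hash the same, so they can be stored in sets. The variable names `same`, `label` and `unique` are only for this sketch.

```python
import collections

Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])

b = Bacterium('Brucellaceae', 'Brucella', 'canis')
same = Bacterium('Brucellaceae', 'Brucella', 'canis')

# Fields are available by name as well as by position.
label = f"{b.genera} {b.species}"

# Named tuples compare by value and are hashable,
# so equal entries collapse into one when placed in a set.
unique = {b, same}
```

That value-based equality is what makes it practical to compare the contents of two parsed files later on.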
Your parser should read the file line by line and keep track of the current family and genus. Whenever it then finds a species, it should add a Bacterium to a list:
```
with open('cluster0.txt', 'r') as infile:
    lines = infile.readlines()
family = None
genera = None
bacteria = []
for line in lines:
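    # Skeleton: the tests below depend on your file layout.
    # If the line names a family or genus, update `family` or `genera`.
    # If the line names a species, store it in `species` and record it: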
    bacteria.append(Bacterium(family, genera, species))
```
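One possible way to fill in that skeleton is sketched below. The file layout is not shown in the question, so this assumes that family lines are unindented, genus lines are indented, and species lines are indented further than their genus line. The function name `parse_bacteria` and the sample lines are made up for illustration.

```python
import collections

Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])

def parse_bacteria(lines):
    """Collect Bacterium records from indented lines (assumed layout)."""
    family = None
    genera = None
    genus_indent = 0
    bacteria = []
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        # Number of leading whitespace characters on this line.
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # Unindented line: a new family starts, no genus seen yet.
            family, genera = name, None
        elif genera is None or indent <= genus_indent:
            # First indented line in a family, or same depth as the last genus.
            genera, genus_indent = name, indent
        else:
            # Deeper than the genus line: treat it as a species.
            bacteria.append(Bacterium(family, genera, name))
    return bacteria

# Made-up sample in the assumed layout.
sample = [
    "Brucellaceae\n",
    "    Brucella\n",
    "        canis\n",
    "        abortus\n",
    "Streptomycetaceae\n",
    "    Streptomyces\n",
    "        griseus\n",
]
bacteria = parse_bacteria(sample)
```

Because the function only iterates over lines, an open file object can be passed in directly in place of `sample`. If your files mark the levels differently, only the three branches need to change.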
Once you have a list of all bacteria in each file or cluster, you can select from all the bacteria like this:
```
s = [b for b in bacteria if b.family == 'Streptomycetaceae']
```
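Since the question is about comparing the output of two programs, one possible next step is set arithmetic on the parsed lists. This is only a sketch: it assumes each file has already been parsed into a list of `Bacterium` tuples, the two lists below are made-up sample data, and the Jaccard index is just one overlap measure you might pick.

```python
import collections

Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])

# Made-up parse results for a cluster as reported by two programs.
cluster_a = [
    Bacterium('Brucellaceae', 'Brucella', 'canis'),
    Bacterium('Brucellaceae', 'Brucella', 'abortus'),
    Bacterium('Streptomycetaceae', 'Streptomyces', 'griseus'),
]
cluster_b = [
    Bacterium('Brucellaceae', 'Brucella', 'canis'),
    Bacterium('Streptomycetaceae', 'Streptomyces', 'griseus'),
    Bacterium('Streptomycetaceae', 'Streptomyces', 'coelicolor'),
]

set_a, set_b = set(cluster_a), set(cluster_b)

shared = set_a & set_b   # bacteria both programs put in this cluster
only_a = set_a - set_b   # reported only by the first program
only_b = set_b - set_a   # reported only by the second program

# Jaccard index: size of the overlap divided by size of the union.
jaccard = len(shared) / len(set_a | set_b)
```

A value of `jaccard` close to 1 would mean the two programs largely agree on that cluster, while `only_a` and `only_b` list the entries they disagree on.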