How to compare clusters?

既然无缘 2021-01-18 13:22

Hopefully this can be done with Python! I used two clustering programs on the same data and now have a cluster file from each. I reformatted the files so that they look like

5 Answers
  •  [愿得一人]
    2021-01-18 13:33

    You have to write some code to parse the file. If you ignore the cluster, you should be able to distinguish between family, genera and species based on indentation.

    The easiest way is to define a named tuple:

    import collections
    Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])
    

    You can make an instance of this object like this:

    b = Bacterium('Brucellaceae', 'Brucella', 'canis')
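
A named tuple behaves like an ordinary tuple with named fields, which is what makes it convenient here. The short sketch below only illustrates that behaviour; the variable `same` is a made-up name for the example.

```python
import collections

Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])

b = Bacterium('Brucellaceae', 'Brucella', 'canis')
same = Bacterium('Brucellaceae', 'Brucella', 'canis')

print(b.family)        # fields can be read by name
print(b[2])            # or by position, like a plain tuple
print(b == same)       # equality compares field by field
print(len({b, same}))  # instances are hashable, so duplicates collapse in a set
```

Because instances compare and hash by value, lists of them can later be turned into sets and compared directly.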
    

    Your parser should read a file line by line, and set the family and genera. If it then finds a species, it should add a Bacterium to a list:

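    family = None
    genera = None
    bacteria = []
    with open('cluster0.txt', 'r') as infile:
        for line in infile:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            # Assumed layout (adjust to your file, and skip any cluster
            # header lines first): family at indent 0, genera at 2, species deeper.
            indent = len(line) - len(line.lstrip())
            if indent == 0:
                family = name
            elif indent == 2:
                genera = name
            else:
                bacteria.append(Bacterium(family, genera, name))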
    

    Once you have a list of all bacteria in each file or cluster, you can select from all the bacteria like this:

    s = [b for b in bacteria if b.family == 'Streptomycetaceae']
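
To actually compare the output of the two programs, one possible approach is to turn each list into a set and use set operations. This is only a sketch: the function name `compare` and the sample entries are hypothetical stand-ins for whatever your parser produces from the two files.

```python
import collections

Bacterium = collections.namedtuple('Bacterium', ['family', 'genera', 'species'])

def compare(bacteria_a, bacteria_b):
    """Return (shared, only_a, only_b) as sets of Bacterium tuples."""
    set_a, set_b = set(bacteria_a), set(bacteria_b)
    return set_a & set_b, set_a - set_b, set_b - set_a

# Hypothetical data standing in for the parsed contents of the two files.
cluster_a = [
    Bacterium('Brucellaceae', 'Brucella', 'canis'),
    Bacterium('Brucellaceae', 'Brucella', 'suis'),
]
cluster_b = [
    Bacterium('Brucellaceae', 'Brucella', 'canis'),
    Bacterium('Streptomycetaceae', 'Streptomyces', 'griseus'),
]

shared, only_a, only_b = compare(cluster_a, cluster_b)
print('in both:', sorted(shared))
print('only in first:', sorted(only_a))
print('only in second:', sorted(only_b))
```

Sets work here because named tuples are hashable and compare by value, so the same bacterium parsed from two different files counts as one element.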
    
