How to compare clusters?

既然无缘 2021-01-18 13:22

Hopefully this can be done with Python! I used two clustering programs on the same data and now have a cluster file from each. I reformatted the files so that they look like this:

5 answers
  •  终归单人心
    2021-01-18 13:36

    Given:

    file1 = '''Cluster 0:
     giant(2)
      red(2)
       brick(1)
       apple(1)
    Cluster 1:
     tiny(3)
      green(1)
       dot(1)
      blue(2)
       flower(1)
       candy(1)'''.split('\n')
    file2 = '''Cluster 18:
     giant(2)
      red(2)
       brick(1)
       tomato(1)
    Cluster 19:
     tiny(2)
      blue(2)
       flower(1)
       candy(1)'''.split('\n')
    

    Is this what you need?

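    def parse_file(open_file):
        """Flatten each 3-level entry into a 'top.middle.leaf' path."""
        result = []
        levels = ['', '', '']
    
        for line in open_file:
            if not line.strip():
                continue  # skip blank lines
            indent_level = len(line) - len(line.lstrip())
            if indent_level == 0:
                levels = ['', '', '']  # a "Cluster N:" header starts a new cluster
                continue
            # Keep the name, drop the "(count)" suffix and any trailing newline
            levels[indent_level - 1] = line.strip().split('(', 1)[0]
            if indent_level == 3:
                result.append('.'.join(levels))
        return result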
    
    data1 = set(parse_file(file1))
    data2 = set(parse_file(file2))
    
    differences = [
        ('common elements', data1 & data2),
        ('missing from file2', data1 - data2),
        ('missing from file1', data2 - data1) ]
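parse_file only needs an iterable of lines, so an open file object works as well as the split string. The sketch below (a variant of the function above that skips blank lines) feeds it an io.StringIO, which stands in for a real file such as the hypothetical clusters1.txt, and shows the flattened paths it builds: each leaf becomes one 'top.middle.leaf' string, with the "(count)" suffixes and the "Cluster N:" headers dropped.

```python
import io

def parse_file(open_file):
    """Flatten each 3-level entry into a 'top.middle.leaf' path."""
    result = []
    levels = ['', '', '']
    for line in open_file:
        if not line.strip():
            continue  # skip blank lines
        indent_level = len(line) - len(line.lstrip())
        if indent_level == 0:
            levels = ['', '', '']  # a "Cluster N:" header starts a new cluster
            continue
        # Keep the name, drop the "(count)" suffix and the trailing newline
        levels[indent_level - 1] = line.strip().split('(', 1)[0]
        if indent_level == 3:
            result.append('.'.join(levels))
    return result

# io.StringIO stands in for open('clusters1.txt'); the filename is hypothetical.
cluster_file = io.StringIO(
    "Cluster 0:\n"
    " giant(2)\n"
    "  red(2)\n"
    "   brick(1)\n"
    "   apple(1)\n"
    "Cluster 1:\n"
    " tiny(3)\n"
    "  green(1)\n"
    "   dot(1)\n"
    "  blue(2)\n"
    "   flower(1)\n"
    "   candy(1)\n"
)

paths = parse_file(cluster_file)
print(paths)
# ['giant.red.brick', 'giant.red.apple', 'tiny.green.dot', 'tiny.blue.flower', 'tiny.blue.candy']
```

With real files the call would be `with open('clusters1.txt') as f: data1 = set(parse_file(f))`, where the filename is again only a placeholder. Because cluster IDs and counts are discarded, an element counts as shared whenever the same three-level path appears in both files.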
    

    To see the differences:

    for desc, items in differences:
        print(desc)
        print()
        for item in sorted(items):  # sets are unordered; sort for stable output
            print('\t' + item)
        print()
    

    prints:

    common elements
    
        giant.red.brick
        tiny.blue.candy
        tiny.blue.flower
    
    missing from file2
    
        giant.red.apple
        tiny.green.dot
    
    missing from file1
    
        giant.red.tomato
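The set differences above ignore which cluster an element came from. If you also want to pair clusters across the two programs (for example Cluster 0 with Cluster 18), one option, not part of the original answer, is to keep the leaf paths grouped by cluster and match each cluster in the first file to the cluster in the second file with the highest Jaccard index, |A ∩ B| / |A ∪ B|. This is a minimal sketch assuming the same three-level format and unique cluster names within a file; parse_clusters, jaccard and best_matches are hypothetical helper names.

```python
def parse_clusters(lines):
    """Map each 'Cluster N' header to the set of 'top.middle.leaf' paths under it."""
    clusters = {}
    levels = ['', '', '']
    current = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        indent_level = len(line) - len(line.lstrip())
        text = line.strip()
        if indent_level == 0:
            current = text.rstrip(':')  # e.g. "Cluster 0"
            clusters[current] = set()
            levels = ['', '', '']
            continue
        levels[indent_level - 1] = text.split('(', 1)[0]  # drop "(count)"
        if indent_level == 3:
            clusters[current].add('.'.join(levels))
    return clusters

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_matches(clusters1, clusters2):
    """For each cluster in clusters1, pick the clusters2 entry with the highest Jaccard index."""
    matches = {}
    for name1, paths1 in clusters1.items():
        matches[name1] = max(
            ((name2, jaccard(paths1, paths2)) for name2, paths2 in clusters2.items()),
            key=lambda pair: pair[1],
        )
    return matches

file1 = '''Cluster 0:
 giant(2)
  red(2)
   brick(1)
   apple(1)
Cluster 1:
 tiny(3)
  green(1)
   dot(1)
  blue(2)
   flower(1)
   candy(1)'''.split('\n')
file2 = '''Cluster 18:
 giant(2)
  red(2)
   brick(1)
   tomato(1)
Cluster 19:
 tiny(2)
  blue(2)
   flower(1)
   candy(1)'''.split('\n')

matches = best_matches(parse_clusters(file1), parse_clusters(file2))
for name1, (name2, score) in matches.items():
    print(f'{name1} -> {name2} (Jaccard {score:.2f})')
# Cluster 0 -> Cluster 18 (Jaccard 0.33)
# Cluster 1 -> Cluster 19 (Jaccard 0.67)
```

This greedy matching can assign two clusters from the first file to the same cluster in the second. If you need a strict one-to-one pairing, an assignment method such as the Hungarian algorithm (for example scipy.optimize.linear_sum_assignment on a matrix of negated Jaccard scores) is a common next step.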
    
