How to compare clusters?

前端未结

关注

 5  784

既然无缘 2021-01-18 13:22

Hopefully this can be done with python! I used two clustering programs on the same data and now have a cluster file from both. I reformatted the files so that they look like

5条回答

终归单人心 (楼主)

2021-01-18 13:36

Given:

file1 = '''Cluster 0:
 giant(2)
  red(2)
   brick(1)
   apple(1)
Cluster 1:
 tiny(3)
  green(1)
   dot(1)
  blue(2)
   flower(1)
   candy(1)'''.split('\n')
file2 = '''Cluster 18:
 giant(2)
  red(2)
   brick(1)
   tomato(1)
Cluster 19:
 tiny(2)
  blue(2)
   flower(1)
   candy(1)'''.split('\n')

Is this what you need?

def parse_file(open_file):
    result = []

    for line in open_file:
        indent_level = len(line) - len(line.lstrip())
        if indent_level == 0:
            levels = ['','','']
        item = line.lstrip().split('(', 1)[0]
        levels[indent_level - 1] = item
        if indent_level == 3:
            result.append('.'.join(levels))
    return result

data1 = set(parse_file(file1))
data2 = set(parse_file(file2))

differences = [
    ('common elements', data1 & data2),
    ('missing from file2', data1 - data2),
    ('missing from file1', data2 - data1) ]

To see the differences:

for desc, items in differences:
    print desc
    print 
    for item in items:
        print '\t' + item
    print

prints

common elements

    giant.red.brick
    tiny.blue.candy
    tiny.blue.flower

missing from file2

    tiny.green.dot
    giant.red.apple

missing from file1

    giant.red.tomato

0 讨论(0)

查看其它5个回答