Finding cycles of 3 nodes (or triangles) in a graph

南旧 2020-12-31 15:42

I am working with complex networks. I want to find groups of nodes that form a cycle of 3 nodes (or triangles) in a given graph. As my graph contains about a million edges, u

11 Answers
  • 2020-12-31 16:34

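    A simple and clear way to do this is to use NetworkX.

    For an undirected graph, nx.cycle_basis(G) returns a basis of cycles, and you can select the ones with 3 nodes. Note that a cycle basis is a minimal set from which every other cycle can be built, not a list of all cycles, so some triangles may be missing from the result.

    import networkx as nx  # G below is assumed to be an undirected nx.Graph
    cycls_3 = [c for c in nx.cycle_basis(G) if len(c) == 3]
    

    Alternatively, you can find the cliques with nx.find_cliques(G) and select the ones with 3 nodes. A clique is a part of the graph in which every node is connected to every other node, which is exactly what a cycle of 3 nodes is. Note that find_cliques returns only maximal cliques, so a triangle contained in a larger clique is not reported on its own; nx.enumerate_all_cliques(G) lists cliques of every size.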

  • 2020-12-31 16:35

    I am working on the same problem of counting the number of triangles in an undirected graph, and Wisty's solution works really well in my case. I have modified it a bit so that each undirected triangle is counted only once.

        # Generate triangles, reporting each group of three nodes once.
        def generate_triangles(nodes):
            visited_ids = set()  # nodes already fully processed as node_a
            for node_a_id in nodes:
                temp_visited = set()  # neighbors of node_a already used as node_b
                for node_b_id in nodes[node_a_id]:
                    if node_b_id == node_a_id:
                        # Reject self-loops; drop this check if your graph allows them.
                        raise ValueError
                    if node_b_id in visited_ids:
                        continue
                    for node_c_id in nodes[node_b_id]:
                        if node_c_id in visited_ids or node_c_id in temp_visited:
                            continue
                        if node_a_id in nodes[node_c_id]:
                            yield (node_a_id, node_b_id, node_c_id)
                    temp_visited.add(node_b_id)
                visited_ids.add(node_a_id)
    

    Of course, the graph has to be given as a dictionary that maps each node to a list of its neighbors, for example:

        #### Test cycles ####
    
        nodes = {}
    
        nodes[0] = [1, 2, 3]
        nodes[1] = [0, 2]
        nodes[2] = [0, 1, 3]
        nodes[3] = [1]
    
        cycles = list(generate_triangles(nodes))
        print(cycles)
    

    With Wisty's original code, the triangles found are [(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)], which counts (0, 1, 2) and (0, 2, 1) as two different triangles. With my modified code, they are counted as only one triangle.

    I have used this on a relatively small dictionary: under 100 keys, each with about 50 values on average.
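
    A complementary way to treat (0, 1, 2) and (0, 2, 1) as the same triangle, independent of traversal order, is to sort each triple and collect the results in a set. This is only a sketch, not the method used in the answer above; unique_triangles is a hypothetical helper name, and the approach ignores edge direction, so (0, 3, 1) is stored as (0, 1, 3).

```python
# Triples as reported by the original directed search quoted above.
found = [(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]


def unique_triangles(triples):
    """Collapse triples that contain the same three nodes (hypothetical helper)."""
    # Sorting each triple gives one canonical form per group of three nodes;
    # the set then removes the duplicates.
    return sorted({tuple(sorted(t)) for t in triples})


triangles = unique_triangles(found)
print(triangles)  # [(0, 1, 2), (0, 1, 3), (1, 2, 3)]
```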

  • 2020-12-31 16:37

    Even if it isn't efficient, start by implementing the straightforward solution with nested loops. Write a test around it so you get an idea of how long it takes.

    Then, as you try new approaches, you can do two things: 1) make certain that the answer remains the same, and 2) see how much the run time improves.

    A faster algorithm that misses something is probably worse than a slower one that is correct.

    Once you have the slow baseline, you can see whether the work can be done in parallel and measure the performance increase.

    Then, you can see if you can mark (and skip) all nodes that have too few neighbors; a node needs at least two neighbors to be part of a triangle.

    Ideally, shrink the graph down to just 100 or so nodes first, so you can draw it and see what is happening graphically.

    Sometimes your brain will see a pattern that isn't as obvious when looking at algorithms.
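
    The baseline-then-compare workflow described above could be sketched as follows. Everything here is an assumption made for illustration: the graph is undirected and stored as a dict of neighbor sets, and the function names (triangles_brute_force, triangles_neighbor_pairs, timed) are hypothetical. The slow version tests every group of three nodes; the candidate only tests pairs of neighbors, and the assertion checks that it did not change the answer.

```python
import time
from itertools import combinations


def triangles_brute_force(adj):
    """Slow reference: test every group of three nodes."""
    return {
        frozenset((a, b, c))
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    }


def triangles_neighbor_pairs(adj):
    """Candidate: for each node, only test pairs of its neighbors."""
    found = set()
    for a, neighbors in adj.items():
        for b, c in combinations(sorted(neighbors), 2):
            if c in adj[b]:
                found.add(frozenset((a, b, c)))
    return found


def timed(fn, adj):
    """Return (result, elapsed seconds) for one call of fn(adj)."""
    start = time.perf_counter()
    result = fn(adj)
    return result, time.perf_counter() - start


# Small hand-checkable graph: triangles 0-1-2 and 1-2-3, node 4 is isolated.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}, 4: set()}

reference, slow_seconds = timed(triangles_brute_force, adj)
candidate, fast_seconds = timed(triangles_neighbor_pairs, adj)

assert candidate == reference, "the faster method changed the answer"
print(len(reference), slow_seconds, fast_seconds)
```

    On a graph this small the timings mean little; the point is that the same harness can be reused unchanged as the graph and the candidate algorithm grow.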

  • 2020-12-31 16:45

    A million edges is quite small. Unless you are doing it thousands of times, just use a naive implementation.

    I'll assume that you have a dictionary of node_ids, which point to a sequence of their neighbors, and that the graph is directed.

    For example:

    nodes = {}
    nodes[0] = (1, 2)
    nodes[1] = tuple()  # empty tuple
    nodes[2] = (1,)  # one-element tuple, so it can be iterated over
    

    My solution:

    def generate_triangles(nodes):
        """Generate triangles. Weed out duplicates."""
        visited_ids = set() # remember the nodes that we have tested already
        for node_a_id in nodes:
            for node_b_id in nodes[node_a_id]:
                if node_b_id == node_a_id:
                    raise ValueError # nodes shouldn't point to themselves
                if node_b_id in visited_ids:
                    continue # we should have already found b->a->??->b
                for node_c_id in nodes[node_b_id]:
                    if node_c_id in visited_ids:
                        continue # we should have already found c->a->b->c
                    if node_a_id in nodes[node_c_id]:
                        yield(node_a_id, node_b_id, node_c_id)
            visited_ids.add(node_a_id) # don't search a - we already have all those cycles
    

    Checking performance:

    from random import randint
    n = 1000000
    node_list = range(n)
    nodes = {}
    for node_id in node_list:
        node = tuple()
        for i in range(randint(0, 10)):  # add up to 10 neighbors
            try:
                neighbor_id = node_list[node_id + randint(-5, 5)]  # pick a nearby node
            except IndexError:
                continue  # offset ran past the last node
            if neighbor_id == node_id:
                continue  # skip self-loops, which generate_triangles rejects
            if neighbor_id not in node:
                node = node + (neighbor_id,)
        nodes[node_id] = node
    
    cycles = list(generate_triangles(nodes))
    print(len(cycles))
    

    When I tried it, it took longer to build the random graph than to count the cycles.

    You might want to test it though ;) I won't guarantee that it's correct.

    You could also look into networkx, which is the big python graph library.

  • 2020-12-31 16:47

    Do you need to find 'all' of the 'triangles', or just 'some'/'any'? Or perhaps you just need to test whether a particular node is part of a triangle?

    The test is simple - given a node A, are there any two neighbors of A, say B & C, that are also directly connected to each other?
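
    As a minimal sketch of that single-node test, assuming an undirected graph without self-loops stored as a dict of neighbor sets (in_triangle is a hypothetical name):

```python
from itertools import combinations


def in_triangle(adj, a):
    """Return True if some two neighbors of `a` are directly connected."""
    # Every pair (b, c) of neighbors of `a` already has edges a-b and a-c,
    # so a single edge b-c closes the triangle.
    return any(c in adj[b] for b, c in combinations(adj[a], 2))


adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(in_triangle(adj, "A"))  # True: B and C are connected
print(in_triangle(adj, "D"))  # False: D has only one neighbor
```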

    If you need to find all of the triangles - specifically, all groups of 3 nodes in which each node is joined to the other two - then you need to check every possible group in a very long running 'for each' loop.

    The only optimisation is ensuring that you don't check the same 'group' twice, e.g. if you have already tested that B & C aren't in a group with A, then don't check whether A & C are in a group with B.
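
    One way to avoid checking the same group twice is to impose an order on the nodes and only report a group as (a, b, c) with a < b < c. The sketch below assumes an undirected graph stored as a dict of neighbor sets with comparable node ids; all_triangles is a hypothetical name, and this is one possible realisation of the idea rather than the only one.

```python
def all_triangles(adj):
    """Yield each triangle exactly once as (a, b, c) with a < b < c."""
    for a in adj:
        for b in adj[a]:
            if b <= a:
                continue  # the pair (b, a) is handled when b is the outer node
            # Common neighbors of a and b close a triangle; keeping only
            # c > b means every group is reported in sorted order only.
            for c in adj[a] & adj[b]:
                if c > b:
                    yield (a, b, c)


adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
triangles = sorted(all_triangles(adj))
print(triangles)  # [(0, 1, 2), (1, 2, 3)]
```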
