What is the correct graph data structure to differentiate between nodes with the same name?

Question

I'm learning about graphs (they seem super useful) and was wondering if I could get some advice on how to structure mine.

Simply put, let's say I get purchase order data every day; some days it's the same as the day before and on others it's different. For example, yesterday I had an order of pencils and erasers, so I created two nodes to represent them. Today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what, and if Bob ordered a pencil yesterday and then an eraser today, it creates a directed edge. My reasoning is that I can see who bought what on each day and track Bob's purchase behaviour (and maybe use it to infer patterns for him or for other users).

My problem is that I'm using networkx (Python), and when I create a node 'pencil' for yesterday and then another node 'pencil' for day 2, I can't differentiate them.

What I thought of (and have been doing) is naming the node day2-pencil and then scanning the entire graph and stripping out the 'day2-' prefix to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don't have to scan the entire graph.

As my test data gets larger, it's getting more and more confusing, so I'm wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).

Thanks in advance!

Update: Still no luck, but this may be helpful:

import networkx as nx
G=nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
G.add_node('pencil', day='1/2/12', colour='blue')

The result I get from typing G.node is:

{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}

It's obviously overwriting the pencil from 1/1/12 with the 1/2/12 one, and I'm not sure if I can make a distinct one.

Answer 1:

This mostly depends on your goal. What you want to analyze is the deciding factor in your graph design. Looking at your data, though, a general structure would be nodes for Customers and Products, connected by edges that represent Days (I don't know whether this helps, but this is in fact a bipartite graph).

So your structure would be something like this:

node(Person) --- edge(Day) ---> node(Product)

Let's say, Bob buys a pencil on 1/1/12:

node(Bob) --- 1/1/12 ---> node(Pencil)

Ok, now Bob goes and buys another pencil on 1/2/12:

          -- 1/1/12 --
         /            \
node(Bob)              > node(Pencil)
         \            /
          -- 1/2/12 --

and so on.

This is actually possible with networkx. Since you have multiple edges between the same pair of nodes, you have to choose between MultiGraph and MultiDiGraph, depending on whether your edges are directed.

In : import networkx

In : g = networkx.MultiDiGraph()

In : g.add_node("Bob")
In : g.add_node("Alice")

In : g.add_node("Pencil")

In : g.add_edge("Bob","Pencil",key="1/1/12")
In : g.add_edge("Bob","Pencil",key="1/2/12")

In : g.add_edge("Alice","Pencil",key="1/3/12")
In : g.add_edge("Alice","Pencil",key="1/2/12")

In : g.edges(keys=True)
Out:
[('Bob', 'Pencil', '1/2/12'),
 ('Bob', 'Pencil', '1/1/12'),
 ('Alice', 'Pencil', '1/3/12'),
 ('Alice', 'Pencil', '1/2/12')]

So far, not bad. You can now ask things like "Did Alice buy a Pencil on 1/1/12?".

In : g.has_edge("Alice","Pencil","1/1/12")
Out: False

In : g.has_edge("Alice","Pencil","1/2/12")
Out: True

Things might get bad if you want all orders on a specific day. By bad, I don't mean code-wise, but computation-wise. Code-wise it is rather simple:

In : [(from_node, to_node) for from_node, to_node, key in g.edges(keys=True) if key=="1/2/12"]
Out: [('Bob', 'Pencil'), ('Alice', 'Pencil')]

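But this scans every edge in the network and filters out the ones you want, so the cost grows with the total number of edges rather than with the number of orders on that day. I don't think networkx has a better built-in way.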
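One way to avoid the full scan, at the cost of some extra bookkeeping, is to keep a side index from day to edges and update it whenever an edge is added. The following is only a minimal sketch: `edges_by_day` and `add_purchase` are hypothetical names made up for illustration and are not part of networkx, while `has_edge`, `add_edge` and `edge_subgraph` are regular networkx methods.

```python
from collections import defaultdict

import networkx as nx

g = nx.MultiDiGraph()
# Hypothetical side index (not part of networkx): day -> list of (buyer, product, key)
edges_by_day = defaultdict(list)


def add_purchase(buyer, product, day):
    """Add a purchase edge keyed by day and record it in the side index."""
    # Guard against recording the same (buyer, product, day) twice in the index
    if not g.has_edge(buyer, product, key=day):
        g.add_edge(buyer, product, key=day)
        edges_by_day[day].append((buyer, product, day))


add_purchase("Bob", "Pencil", "1/1/12")
add_purchase("Bob", "Pencil", "1/2/12")
add_purchase("Alice", "Pencil", "1/3/12")
add_purchase("Alice", "Pencil", "1/2/12")

# All orders on one day, read from the index instead of scanning every edge
orders = [(buyer, product) for buyer, product, _ in edges_by_day["1/2/12"]]

# A read-only view of the graph restricted to that day's edges
day_graph = g.edge_subgraph(edges_by_day["1/2/12"])
```

The trade-off is that the index has to be kept in sync by hand: any edge removed from `g` must also be removed from `edges_by_day`.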

Answer 2:

Graphs are not the best approach for this. A relational database such as MySQL is the right tool for storing this data and for running queries such as who bought what and when.
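As a rough illustration of the relational approach, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name, column names, index name and sample rows are all assumptions made up for this example, and the dates are written in ISO format (assuming 1/2/12 means January 2, 2012) so that sorting them as text gives chronological order.

```python
import sqlite3

# In-memory database for illustration; table and column names are assumptions
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (buyer TEXT, product TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Bob", "pencil", "2012-01-01"),
        ("Bob", "eraser", "2012-01-02"),
        ("Alice", "pencil", "2012-01-02"),
    ],
)
# An index on day can let the database find one day's rows without a full table scan
conn.execute("CREATE INDEX idx_purchases_day ON purchases (day)")

# Who bought what on a given day?
rows = conn.execute(
    "SELECT buyer, product FROM purchases WHERE day = ? ORDER BY buyer",
    ("2012-01-02",),
).fetchall()

# Bob's purchase history in date order
history = conn.execute(
    "SELECT day, product FROM purchases WHERE buyer = ? ORDER BY day",
    ("Bob",),
).fetchall()
```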

Answer 3:

Try this:

Give each node a unique integer ID. Then create a dictionary, nodes, such that:

nodes['pencil'] = [1,4,...] <- where each of these IDs corresponds to a node with the pencil attribute. Replace 'pencil' with whatever other attributes you're interested in.

Just make sure that when you add a node with 'pencil', you update the dictionary:

nodes['pencil'].append(new_node_id). Do the same in reverse when you delete a node.
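A minimal sketch of this idea with networkx might look like the following. The names `nodes_by_name`, `add_item` and `remove_item` are hypothetical helpers invented for illustration, and storing the product name in a `name` node attribute is an assumption, not something networkx requires.

```python
import itertools
from collections import defaultdict

import networkx as nx

G = nx.Graph()
next_id = itertools.count(1)       # source of unique integer node IDs
nodes_by_name = defaultdict(list)  # e.g. nodes_by_name['pencil'] -> [1, 3]


def add_item(name, **attrs):
    """Add a node under a fresh integer ID and record it in the index."""
    node_id = next(next_id)
    G.add_node(node_id, name=name, **attrs)
    nodes_by_name[name].append(node_id)
    return node_id


def remove_item(node_id):
    """Remove a node from the graph and from the index."""
    name = G.nodes[node_id]["name"]
    G.remove_node(node_id)
    nodes_by_name[name].remove(node_id)


p1 = add_item("pencil", day="1/1/12", colour="blue")
e1 = add_item("eraser", day="1/1/12")
p2 = add_item("pencil", day="1/2/12", colour="blue")

# Both pencils now exist as distinct nodes; look them up through the index
pencil_days = [G.nodes[n]["day"] for n in nodes_by_name["pencil"]]
```

With this layout the second pencil no longer overwrites the first, and finding every pencil node is a dictionary lookup rather than a scan of the whole graph.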

Source: https://stackoverflow.com/questions/8828349/what-is-the-correct-graph-data-structure-to-differentiate-between-nodes-with-the

Tags

python

algorithm

graph-theory

networkx