Question
I'm learning about graphs (they seem super useful) and was wondering if I could get some advice on a possible way to structure my graphs.
Simply put, let's say I get purchase order data every day; some days it's the same as the day before and on others it's different. For example, yesterday I had an order of pencils and erasers, so I created two nodes to represent them; today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what, and if Bob ordered a pencil yesterday and then an eraser today, it creates a directed edge. My logic is that I can see who bought what on each day and track Bob's purchase behaviour (and maybe use it to infer patterns for him or for other users).
My problem is that I'm using networkx (Python) and creating a node 'pencil' for yesterday and then another node 'pencil' for day 2, and I can't differentiate them.
I thought of (and have been) naming it day2-pencil and then scanning the entire graph and stripping out the 'day2-' to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don't have to scan the entire graph.
As my test data gets larger, it's getting more and more confusing, so I am wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).
Thanks in advance!
Update: Still no luck, but this may be helpful:
import networkx as nx
G=nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
G.add_node('pencil', day='1/2/12', colour='blue')
The result I get from typing G.node is:
{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}
It's obviously overwriting the pencil from 1/1/12 with the 1/2/12 one; I'm not sure if I can make a distinct one.
Answer 1:
This mostly depends on your goal. What you want to analyze is the deciding factor in your graph design. But, looking at your data, a general structure would be nodes for Customers and Products that are connected by Days (I don't know if this helps, but this is in fact a bipartite graph).
So your structure would be something like this:
node(Person) --- edge(Day) ---> node(Product)
Let's say, Bob buys a pencil on 1/1/12:
node(Bob) --- 1/1/12 ---> node(Pencil)
Ok, now Bob goes and buys another pencil on 1/2/12:
           -- 1/1/12 --
          /            \
node(Bob)                > node(Pencil)
          \            /
           -- 1/2/12 --
and so on...
This is actually possible with networkx. Since you have multiple edges between nodes, you have to choose between MultiGraph and MultiDiGraph, depending on the directedness of your edges.
In : import networkx
In : g = networkx.MultiDiGraph()
In : g.add_node("Bob")
In : g.add_node("Alice")
In : g.add_node("Pencil")
In : g.add_edge("Bob","Pencil",key="1/1/12")
In : g.add_edge("Bob","Pencil",key="1/2/12")
In : g.add_edge("Alice","Pencil",key="1/3/12")
In : g.add_edge("Alice","Pencil",key="1/2/12")
In : g.edges(keys=True)
Out:
[('Bob', 'Pencil', '1/2/12'),
('Bob', 'Pencil', '1/1/12'),
('Alice', 'Pencil', '1/3/12'),
('Alice', 'Pencil', '1/2/12')]
So far, not bad. You can now query things like "Did Alice buy a Pencil on 1/1/12?".
In : g.has_edge("Alice","Pencil","1/1/12")
Out: False
In : g.has_edge("Alice","Pencil","1/2/12")
Out: True
Things might get bad if you want all orders on specific days. By bad, I don't mean code-wise, but computation-wise. Code-wise it is rather simple:
In : [(from_node, to_node) for from_node, to_node, key in g.edges(keys=True) if key=="1/2/12"]
Out: [('Bob', 'Pencil'), ('Alice', 'Pencil')]
But this scans all the edges in the network and filters the ones you want. I don't think networkx has any better way.
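One way to avoid the full scan, as a minimal sketch rather than anything networkx provides out of the box: keep a side dictionary that maps each day to the edges added on that day, and update it whenever an edge is added. The names `add_purchase` and `edges_by_day` are hypothetical helpers introduced here for illustration; `edge_subgraph` is an existing networkx method that returns a view restricted to the given edges.

```python
from collections import defaultdict

import networkx as nx

g = nx.MultiDiGraph()
# Hypothetical side index: day -> set of (customer, product, key) edges.
edges_by_day = defaultdict(set)


def add_purchase(graph, index, customer, product, day):
    """Add a customer -> product edge keyed by day and record it in the index."""
    graph.add_edge(customer, product, key=day)
    index[day].add((customer, product, day))


add_purchase(g, edges_by_day, "Bob", "Pencil", "1/1/12")
add_purchase(g, edges_by_day, "Bob", "Pencil", "1/2/12")
add_purchase(g, edges_by_day, "Alice", "Pencil", "1/3/12")
add_purchase(g, edges_by_day, "Alice", "Pencil", "1/2/12")

# Look up one day's orders through the index instead of scanning every edge.
orders_on_day = sorted((u, v) for u, v, _ in edges_by_day["1/2/12"])

# A view of the graph restricted to that day's edges.
day_graph = g.edge_subgraph(edges_by_day["1/2/12"])
```

The trade-off is that the index has to be kept in sync by hand: any code that adds or removes edges directly on the graph, bypassing the helper, will leave it stale.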
Answer 2:
Graphs are not the best approach for this. A relational database such as MySQL is the right tool for storing this data and performing such queries as who bought what when.
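As a rough sketch of what the relational approach could look like, the example below uses Python's built-in sqlite3 module with an in-memory database in place of MySQL, so it runs without a server. The table and column names (`orders`, `customer`, `product`, `day`) are assumptions made for illustration, and the dates are written in ISO format on the assumption that 1/2/12 means January 2, 2012.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, day TEXT)")
# An index on day lets the database find one day's orders without a full scan.
conn.execute("CREATE INDEX idx_orders_day ON orders (day)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Bob", "pencil", "2012-01-01"),
        ("Bob", "eraser", "2012-01-02"),
        ("Alice", "pencil", "2012-01-02"),
    ],
)

# Who bought what on a given day?
on_day = conn.execute(
    "SELECT customer, product FROM orders WHERE day = ? ORDER BY customer",
    ("2012-01-02",),
).fetchall()

# Bob's purchase history in date order.
bob_history = conn.execute(
    "SELECT day, product FROM orders WHERE customer = ? ORDER BY day",
    ("Bob",),
).fetchall()
```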
Answer 3:
Try this:
Give each node a unique integer ID. Then, create a dictionary, nodes, such that:
nodes['pencil'] = [1, 4, ...], where every ID in the list corresponds to a node with the pencil attribute. Replace 'pencil' with whatever other attributes you're interested in.
Just make sure that when you add a node with 'pencil', you update the dictionary:
nodes['pencil'].append(new_node_id). Do likewise on node deletion.
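A minimal sketch of this idea with networkx might look like the following. The helper names `add_product_node` and `remove_product_node`, the `product` node attribute, and the counter used to generate IDs are all assumptions introduced for illustration, not part of networkx.

```python
from collections import defaultdict
from itertools import count

import networkx as nx

G = nx.Graph()
nodes = defaultdict(list)  # product name -> list of integer node IDs
next_id = count(1)         # source of unique integer IDs


def add_product_node(graph, index, product, **attrs):
    """Create a node with a fresh integer ID and record it under its product."""
    node_id = next(next_id)
    graph.add_node(node_id, product=product, **attrs)
    index[product].append(node_id)
    return node_id


def remove_product_node(graph, index, node_id):
    """Remove a node from the graph and from the product index."""
    product = graph.nodes[node_id]["product"]
    graph.remove_node(node_id)
    index[product].remove(node_id)


first_pencil = add_product_node(G, nodes, "pencil", day="1/1/12", colour="blue")
add_product_node(G, nodes, "eraser", day="1/1/12")
add_product_node(G, nodes, "pencil", day="1/2/12", colour="blue")

# Both pencil nodes now coexist, and finding them needs no graph scan.
pencil_days = [G.nodes[n]["day"] for n in nodes["pencil"]]
```

Because the node identifier is now an integer rather than the product name, adding a second 'pencil' no longer overwrites the first.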
Source: https://stackoverflow.com/questions/8828349/what-is-the-correct-graph-data-structure-to-differentiate-between-nodes-with-the