I have the following GraphML file 'mygraph.gml' that I want to parse with a simple Python script.
It represents a simple graph with two nodes, "n0" and "n1", and an edge between them:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0">
      <data key="name">node1</data>
    </node>
    <node id="n1">
      <data key="name">node2</data>
    </node>
    <edge source="n1" target="n0">
      <data key="weight">1</data>
    </edge>
  </graph>
</graphml>
This represents a graph with two nodes, n0 and n1, with an edge of weight 1 between them. I want to parse this structure with Python.
I wrote a script using lxml (I need it because the real dataset is much bigger than this simple example, with more than 10^5 nodes, and Python's minidom is too slow):
import lxml.etree as et
tree = et.parse('mygraph.gml')
root = tree.getroot()
graphml = {
    "graph": "{http://graphml.graphdrawing.org/xmlns}graph",
    "node": "{http://graphml.graphdrawing.org/xmlns}node",
    "edge": "{http://graphml.graphdrawing.org/xmlns}edge",
    "data": "{http://graphml.graphdrawing.org/xmlns}data",
    "label": "{http://graphml.graphdrawing.org/xmlns}data[@key='label']",
    "x": "{http://graphml.graphdrawing.org/xmlns}data[@key='x']",
    "y": "{http://graphml.graphdrawing.org/xmlns}data[@key='y']",
    "size": "{http://graphml.graphdrawing.org/xmlns}data[@key='size']",
    "r": "{http://graphml.graphdrawing.org/xmlns}data[@key='r']",
    "g": "{http://graphml.graphdrawing.org/xmlns}data[@key='g']",
    "b": "{http://graphml.graphdrawing.org/xmlns}data[@key='b']",
    "weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
    "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"
}
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
This script correctly gets the nodes and edges, so I can simply iterate over them:
for n in nodes:
    print(n.attrib)
or similarly on edges:
for e in edges:
    print(e.attrib['source'], e.attrib['target'])
However, I can't work out how to get the "data" tags of the edges or the nodes in order to print the edge weight and the node "name".
This doesn't work for me:
weights = graph.findall(graphml.get("weight"))
The resulting list is always empty. Why? I'm missing something but can't figure out what.
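The empty result can be reproduced without the file. The following is a minimal sketch, assuming the XML above is inlined as a string. The names NS and GRAPHML are introduced here for illustration only, and the fallback to the standard library's xml.etree.ElementTree is an assumption for environments without lxml (both provide find and findall).

```python
try:
    import lxml.etree as et  # the library used in the script above
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in for this small test.
    import xml.etree.ElementTree as et

# Illustrative names (not from the original script).
NS = "{http://graphml.graphdrawing.org/xmlns}"
GRAPHML = """\
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>
"""

root = et.fromstring(GRAPHML)
graph = root.find(NS + "graph")

# The edge is a direct child of <graph>, so it is found.
edges = graph.findall(NS + "edge")
# Same call shape as in the script: a plain tag path with a key predicate.
weights = graph.findall(NS + "data[@key='weight']")

print(len(edges))    # 1
print(len(weights))  # 0
```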
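findall() with a plain tag path only matches direct children of graph, and the data elements are nested inside the node and edge elements, which is why the list is empty. Instead, for each node or edge found, you can build a dict from the key/value pairs of its data children: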
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
for node in nodes + edges:
    # Collect every <data> child of this node or edge as key -> text
    attribs = {}
    for data in node.findall(graphml.get('data')):
        attribs[data.get('key')] = data.text
    print('Node %s have %s' % (node, attribs))
It gives the result:
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}
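As an alternative to building a dict per element, a descendant search with the ".//" prefix may also be enough when only one key is needed. This is a sketch under assumptions rather than a definitive solution: the XML is inlined as a string, the names NS, GRAPHML, edge_list and names are hypothetical, and the standard library's xml.etree.ElementTree is used as a fallback when lxml is not installed.

```python
try:
    import lxml.etree as et
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in for this small test.
    import xml.etree.ElementTree as et

# Illustrative names (not from the original script).
NS = "{http://graphml.graphdrawing.org/xmlns}"
GRAPHML = """\
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>
"""

root = et.fromstring(GRAPHML)
graph = root.find(NS + "graph")

# ".//" searches all descendants of <graph>, not only its direct children.
weights = [float(d.text)
           for d in graph.findall(".//" + NS + "data[@key='weight']")]

# Looking up the weight per edge keeps it tied to its source and target.
edge_list = []
for e in graph.findall(NS + "edge"):
    w = e.find(NS + "data[@key='weight']")
    weight = float(w.text) if w is not None else None
    edge_list.append((e.get("source"), e.get("target"), weight))

# Map each node id to the text of its "name" data element.
names = {n.get("id"): n.findtext(NS + "data[@key='name']")
         for n in graph.findall(NS + "node")}

print(weights)
print(edge_list)
print(names)
```

The per-edge loop is usually the more useful form, since a flat list of weights loses the association with the edge each weight belongs to.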