How to parse a DOT file in Python

无人久伴 提交于 2019-11-30 06:49:13

问题


I have a transducer saved in the form of a DOT file. I can see a graphical representation of the graphs using gvedit, but what if I want to convert the DOT file to an executable transducer, so that I can test the transducer and see what strings it accepts and what it doesn't.

In most of the tools I have seen in Openfst, Graphviz, and their Python extensions, DOT files are only used to create a graphical representation, but what if I want to parse the file to get an interactive program where I can test the strings against the transducer?

Are there any libraries out there that would do the task or should I just write it from scratch?

As I said, the DOT file is related to a transducer I have designed that simulates morphology of English. It is a huge file, but just to give you an idea of how it is like, I provide a sample. Let's say I want to create a transducer that would model the behavior of English with regards to Nouns and in terms of plurality. My lexicon consists of only three words (book, boy, girl). My transducer in this case would look something like this:

which is directly constructed from this DOT file:

digraph A {
rankdir = LR;
node [shape=circle,style=filled] 0
node [shape=circle,style=filled] 1
node [shape=circle,style=filled] 2
node [shape=circle,style=filled] 3
node [shape=circle,style=filled] 4
node [shape=circle,style=filled] 5
node [shape=circle,style=filled] 6
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 8
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
0 -> 4 [label="g "];
0 -> 1 [label="b "];
1 -> 2 [label="o "];
2 -> 7 [label="y "];
2 -> 3 [label="o "];
3 -> 7 [label="k "];
4 -> 5 [label="i "];
5 -> 6 [label="r "];
6 -> 7 [label="l "];
7 -> 9 [label="<+N:s> "];
7 -> 8 [label="<+N:0> "];
8 -> 10 [label="<+Sg:0> "];
9 -> 10 [label="<+Pl:0> "];
}

Now testing this transducer against the words means that if you feed it with book+Pl it should spit back books and vice versa. I'd like to see how it is possible to turn the dot file into a format that would allow such analysis and testing.


回答1:


You could start by loading the file using https://code.google.com/p/pydot/ . From there it should be relatively simply to write the code to traverse the in-memory graph according to an input string.




回答2:


First of all, I have installed the graphviz library. Then I wrote the following code:

import os
from graphviz import Source
file = open('graph4.dot', 'r')#READING DOT FILE
text=file.read()
Source(text)



回答3:


Guillaume's answer is sufficient to render the graph in Spyder (3.3.2), which might solve some folks problems.

If you really need to manipulate the graph, as the OP needs to, it will be a bit complex. Part of the problem is that Graphviz is a graph rendering library, while you are trying to analyse the graph. What you are trying to do is similar to reverse engineering a Word or LateX document from a PDF file.

If you can assume the nice structure of the OP's example, then regular expressions work. An aphorism I like is that if you solve a problem with regular expressions, now you have two problems. Nonetheless, that might just be the most practical thing to do for these cases.

Here are expressions to capture:

  • your node information: r"node.*?=(\w+).*?\s(\d+)". The capture groups are the kind and the node label.
  • your edge information: r"(\d+).*?(\d+).*?\"(.+?)\s". The capture groups are source, sink, and the edge label.

To try them out easily see https://regex101.com/r/3UKKwV/1/ and https://regex101.com/r/Hgctkp/2/.




回答4:


Use this to load a .dot file in python:

graph = pydot.graph_from_dot_file(apath)

# SHOW as an image
import tempfile, Image
fout = tempfile.NamedTemporaryFile(suffix=".png")
graph.write(fout.name,format="png")
Image.open(fout.name).show()


来源:https://stackoverflow.com/questions/28313901/how-to-parse-a-dot-file-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!