How to serialize a graph structure?

一整个雨季 2021-02-02 08:11

Flat files and relational databases give us a mechanism for serializing structured data. XML is superb for serializing unstructured, tree-like data.

But many problems are b

6 answers
  • 2021-02-02 08:55

    How do you represent your graph in memory?
    Basically you have two (good) options:

    • an adjacency list representation
    • an adjacency matrix representation

    The adjacency list representation is best suited to sparse graphs, and the matrix representation to dense graphs.

    If you already use one of these representations, you can simply serialize that representation.
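A minimal sketch of serializing an adjacency list, assuming the graph is held as a Python dict that maps each vertex label to a list of neighbour labels (a hypothetical in-memory layout; `dump_adjacency` and `load_adjacency` are illustrative names). JSON is just one convenient text format here:

```python
import json


def dump_adjacency(adj):
    """Serialize a dict {vertex: [neighbours]} to a JSON string."""
    # sort_keys gives a stable, diff-friendly output.
    return json.dumps(adj, sort_keys=True)


def load_adjacency(text):
    """Rebuild the {vertex: [neighbours]} dict from its JSON form."""
    return {v: list(ns) for v, ns in json.loads(text).items()}


# Directed cycle 1 -> 2 -> 3 -> 1 with string vertex labels.
graph = {"1": ["2"], "2": ["3"], "3": ["1"]}
text = dump_adjacency(graph)
print(text)  # {"1": ["2"], "2": ["3"], "3": ["1"]}
```

String labels are used deliberately: JSON object keys are always strings, so integer vertex ids would come back as strings unless converted on load.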

    If it has to be human-readable, you can still write your own serialization format. For example, you could write the matrix out as you would any "normal" matrix: a header row of column labels, then one line per row with its label and entries, like so:

       1  2  3
    1 #t #f #f
    2 #f #f #t
    3 #f #t #f
    

    (This is a non-optimized, unweighted representation, but it can be used for directed graphs.)
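A small sketch of writing and reading that `#t`/`#f` layout, assuming string vertex labels of at most two characters (longer labels still parse, they just won't line up). `write_matrix` and `read_matrix` are hypothetical helper names:

```python
def write_matrix(labels, edges):
    """Render a directed, unweighted graph as a #t/#f adjacency matrix.

    labels: vertex labels (strings) in row/column order.
    edges:  set of (src, dst) label pairs; #t marks an edge src -> dst.
    """
    width = max(len(lab) for lab in labels)
    # Header row: column labels, right-aligned over the 2-character cells.
    lines = [" " * (width + 1) + " ".join(f"{lab:>2}" for lab in labels)]
    for src in labels:
        cells = ["#t" if (src, dst) in edges else "#f" for dst in labels]
        lines.append(f"{src:>{width}} " + " ".join(cells))
    return "\n".join(lines)


def read_matrix(text):
    """Parse the text produced by write_matrix back into (labels, edges)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    labels = rows[0]  # header row holds the column labels
    edges = set()
    for src, *cells in rows[1:]:
        for dst, cell in zip(labels, cells):
            if cell == "#t":
                edges.add((src, dst))
    return labels, edges


# The 3-vertex example: a self-loop on 1, plus 2 -> 3 and 3 -> 2.
labels = ["1", "2", "3"]
edges = {("1", "1"), ("2", "3"), ("3", "2")}
text = write_matrix(labels, edges)
print(text)
```

The reader splits on whitespace rather than relying on column positions, so hand-edited files with uneven spacing still load.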

  • 2021-02-02 08:55

    One example you might be familiar with is Java serialization. It effectively serializes an object graph: each object instance is a node and each reference is an edge. The algorithm is recursive but skips objects it has already written. A sketch of the idea:

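    def serialize(x, done=None, out=None):
        # Assumes each object exposes .name and .children (illustrative names).
        if done is None:
            done, out = set(), []     # one "done" set shared by the whole walk
        if id(x) in done:             # already serialized: skip the duplicate
            return out
        out.append(x.name)            # record properties of x
        done.add(id(x))               # record x as serialized in done
        for child in x.children:      # each neighbour/child of x
            serialize(child, done, out)
        return out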
    

    Another option, of course, is to write out a list of nodes and a list of edges, in XML or any other serialization format you prefer, or to write an adjacency matrix.
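One way to combine the de-duplicating walk with a list of nodes and edges is sketched below. It is an illustration under assumptions, not how Java serialization is implemented: it uses a hypothetical `Node` class with `name` and `children`, gives each object an integer id the first time it is reached, and records every reference as a `(from_id, to_id)` pair, so shared children and cycles survive the round trip.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hash, so nodes can be dict keys
class Node:
    # Hypothetical node type used only for this illustration.
    name: str
    children: list = field(default_factory=list)


def to_nodes_and_edges(root):
    """Walk the object graph once; return (names by id, list of (src, dst) ids)."""
    ids = {}  # object -> id; doubles as the "done" set
    nodes, edges = [], []

    def visit(x):
        if x in ids:  # already serialized: reuse its id
            return ids[x]
        ids[x] = len(nodes)  # mark before recursing so cycles terminate
        nodes.append(x.name)  # record properties of x
        for child in x.children:
            edges.append((ids[x], visit(child)))
        return ids[x]

    visit(root)
    return nodes, edges


def from_nodes_and_edges(nodes, edges):
    """Rebuild the objects; the root is the object with id 0."""
    objs = [Node(name) for name in nodes]
    for src, dst in edges:
        objs[src].children.append(objs[dst])
    return objs[0]
```

Because each node's own edges are appended in child order, rebuilding preserves the order of `children`. The recursion is bounded by Python's recursion limit, so very deep graphs would need an explicit stack instead.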

  • 2021-02-02 09:01

    XML is very verbose. Whenever I do this, I roll my own format. Here's an example of a three-node directed acyclic graph. It's pretty compact and does everything I need it to do:

    0: foo
    1: bar
    2: bat
    ----
    0 1
    0 2
    1 2
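A minimal writer and reader for that node-list/edge-list layout, assuming node ids are exactly 0, 1, 2, ... in order, labels contain no newlines, and a line of `----` separates the two sections. `dump_graph` and `load_graph` are hypothetical names:

```python
def dump_graph(names, edges):
    """names: node labels indexed by id; edges: list of (src, dst) ids."""
    lines = [f"{i}: {name}" for i, name in enumerate(names)]
    lines.append("----")  # separator between the node and edge sections
    lines += [f"{src} {dst}" for src, dst in edges]
    return "\n".join(lines)


def load_graph(text):
    """Inverse of dump_graph; returns (names, edges)."""
    lines = text.splitlines()
    sep = lines.index("----")
    # Assumes node lines appear in id order, so the id itself can be dropped.
    names = [line.partition(": ")[2] for line in lines[:sep]]
    edges = [tuple(map(int, line.split())) for line in lines[sep + 1:] if line.strip()]
    return names, edges


text = dump_graph(["foo", "bar", "bat"], [(0, 1), (0, 2), (1, 2)])
print(text)
```

Splitting on the separator line (rather than on a `"\n----\n"` substring) keeps a graph with no edges, whose text ends in `----`, readable.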
    
  • 2021-02-02 09:05

    On a less academic, more practical note: in CubicTest we use XStream (Java) to serialize tests to and from XML. XStream handles graph-structured object relations, so you might learn a thing or two from looking at its source and the resulting XML. You're right about the ugly part, though: the generated XML files don't look pretty.

  • 2021-02-02 09:11

    Typically relationships in XML are shown by the parent/child relationship. XML can handle graph data but not in this manner. To handle graphs in XML you should use the xs:ID and xs:IDREF schema types.

    As an example, assume that node/@id is of type xs:ID and link/@ref is of type xs:IDREF. The following XML shows the cycle of three nodes n1 -> n2 -> n3 -> n1. (The ids start with a letter because xs:ID values must be XML names, which cannot begin with a digit.)

    <data>
      <node id="n1">
        <link ref="n2"/>
      </node>
      <node id="n2">
        <link ref="n3"/>
      </node>
      <node id="n3">
        <link ref="n1"/>
      </node>
    </data>
    

    Many development tools support ID and IDREF too. I have used Java's JAXB (Java Architecture for XML Binding), which supports them through the @XmlID and @XmlIDREF annotations. You can build your graph from plain Java objects and then let JAXB handle the actual serialization to XML.
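Outside Java, the same id/ref layout can be produced and checked by hand. A sketch using Python's standard `xml.etree.ElementTree`, assuming the graph is a dict `{node_id: [target_ids]}` (a hypothetical layout). ElementTree does not validate against a schema, so the IDREF rule that every `ref` names an existing `id` is checked explicitly:

```python
import xml.etree.ElementTree as ET


def graph_to_xml(adj):
    """adj: {node_id: [target_ids]} -> <data><node id><link ref/></node>...</data>."""
    data = ET.Element("data")
    for node_id, targets in adj.items():
        node = ET.SubElement(data, "node", id=node_id)
        for t in targets:
            ET.SubElement(node, "link", ref=t)
    return ET.tostring(data, encoding="unicode")


def xml_to_graph(text):
    """Parse the id/ref XML back into {node_id: [target_ids]}."""
    root = ET.fromstring(text)
    adj = {node.get("id"): [link.get("ref") for link in node.findall("link")]
           for node in root.findall("node")}
    # Manual IDREF-style check: a schema validator would enforce this for xs:IDREF.
    for targets in adj.values():
        for t in targets:
            if t not in adj:
                raise ValueError(f"dangling ref {t!r}")
    return adj


cycle = {"n1": ["n2"], "n2": ["n3"], "n3": ["n1"]}
print(graph_to_xml(cycle))
```

Because links are stored as references rather than nested elements, the cycle serializes without any special handling.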

  • 2021-02-02 09:13

    Adjacency lists and adjacency matrices are the two common ways of representing graphs in memory. The first decision you need to make when choosing between them is what you want to optimize for. Adjacency lists are very fast if you need to, for example, get the list of a vertex's neighbors. On the other hand, if you do a lot of edge-existence tests, or your graph represents a Markov chain, you'd probably favor an adjacency matrix.
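To make the trade-off concrete, a toy sketch with plain Python lists (the graph and the helper names `neighbours`, `neighbours_from_matrix` and `has_edge` are illustrative). Listing neighbours from the adjacency list costs time proportional to the vertex's degree, while the matrix must scan a whole row; testing a single edge is one lookup in the matrix:

```python
# Same directed graph (0->1, 0->2, 1->2) stored both ways.
n = 3
edge_list = [(0, 1), (0, 2), (1, 2)]

adj_list = [[] for _ in range(n)]
adj_matrix = [[False] * n for _ in range(n)]
for u, v in edge_list:
    adj_list[u].append(v)
    adj_matrix[u][v] = True


def neighbours(u):
    # Adjacency list: already stored, cost proportional to deg(u).
    return adj_list[u]


def neighbours_from_matrix(u):
    # Adjacency matrix: must scan all n columns of row u.
    return [v for v in range(n) if adj_matrix[u][v]]


def has_edge(u, v):
    # Adjacency matrix: a single constant-time lookup.
    return adj_matrix[u][v]
```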

    The next question to consider is how much you need to fit into memory. In most cases, where the number of edges in the graph is much smaller than the total number of possible edges, an adjacency list will be more efficient, since you only store the edges that actually exist. A happy medium is to store the adjacency matrix in compressed sparse row (CSR) format: keep a vector of the non-zero entries in row-major order (left to right, top to bottom), a corresponding vector giving the column index of each entry, and a third vector indicating where each row starts in the first two.

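    [[0.0, 0.0, 0.3, 0.1],
     [0.1, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.5, 0.2, 0.0, 0.3]]

    can be represented as:

    vals: [0.3, 0.1, 0.1, 0.5, 0.2, 0.3]
    cols: [2, 3, 0, 0, 1, 3]
    rows: [0, 2, 3, 3, 6]

    Row i occupies positions rows[i] up to (but not including) rows[i+1], so the empty third row shows up as two equal consecutive values (3, 3), and the final entry, 6, is the total number of non-zero entries.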
    

    Compressed sparse row is effectively an adjacency list (the column indices function the same way), but the format lends itself a bit more cleanly to matrix operations.
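A minimal sketch of building those three arrays from a dense matrix and reading one row back, in plain Python (`to_csr` and `row_entries` are hypothetical names). Reading row i is exactly an adjacency-list lookup: the slice of `cols` gives the neighbours and the slice of `vals` gives the edge weights.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    vals, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                vals.append(x)
                cols.append(j)
        row_ptr.append(len(vals))  # row i spans vals[row_ptr[i]:row_ptr[i + 1]]
    return vals, cols, row_ptr


def row_entries(csr, i):
    """(column, value) pairs of row i, i.e. vertex i's weighted out-edges."""
    vals, cols, row_ptr = csr
    start, end = row_ptr[i], row_ptr[i + 1]
    return list(zip(cols[start:end], vals[start:end]))


m = [[0.0, 0.0, 0.3, 0.1],
     [0.1, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.5, 0.2, 0.0, 0.3]]
csr = to_csr(m)
print(csr)
# ([0.3, 0.1, 0.1, 0.5, 0.2, 0.3], [2, 3, 0, 0, 1, 3], [0, 2, 3, 3, 6])
```

In practice, SciPy's `scipy.sparse.csr_matrix` stores the same three arrays as its `data`, `indices` and `indptr` attributes, which can be written out directly.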
