PostgreSQL to XML with 3 Tables

前端 未结 1 1437
被撕碎了的回忆
被撕碎了的回忆 2021-01-22 05:04

I\'m a trainee in a small dev team and my project leader wants me to write a function that will export PostgreSQL data to a XML file. Unfortunately, I only know how to write an

1条回答
  •  猫巷女王i
    2021-01-22 05:53

    You have three levels of nested tables.

    Sample data:

    CREATE TABLE a(
      a_id integer primary key,
      name text
    );
    
    CREATE TABLE b(
      b_id integer primary key,
      a_id integer references a(a_id),
      val text
    );
    
    CREATE TABLE c(
      c_id serial primary key,
      b_id integer references b(b_id),
      blah text
    );
    
    INSERT INTO a(a_id, name) VALUES (1, 'fred'),(2, 'bert');
    
    INSERT INTO b(b_id, a_id, val) VALUES 
    (11, 1, 'x'), (12, 1, 'y'), (21, 2, 'a'), (22, 2, 'b');
    
    INSERT INTO c(b_id, blah) VALUES
    (11, 'whatever'), (11, 'gah'), (12, 'borkbork'), (22, 'fuzz');
    

    Method 1: Do a left join, handle XML in the client

    The simplest way to handle this is to do a left join over all three tables, ordered from outermost to innermost. Then you iterate down the result set, closing one element and opening another whenever the subject at that level changes.

    select *
    from a left join b on (a.a_id = b.a_id)
           left join c on (b.b_id = c.b_id)
    order by a.a_id, b.b_id, c.c_id;
    

    then loop over the rows returned, and for each row, pseudocode:

    cur_row = get_new_row()
    
    if (cur_row[b_id] != prev_row[b_id]) {
       emit_close_tableb();
    }
    if (cur_row[a_id] != prev_row[a_id]) {
       emit_close_tablea();
       emit_open_tablea(cur_row);
    }
    if (cur_row[b_id] != prev_row[b_id]) {
       emit_open_tableb(cur_row);
    }
    emit_tablec(cur_row);
    
    prev_row = cur_row;
    

    To write the XML you'd use something like XMLWriter. To read the query data you can use something like PDO or whatever driver you prefer. If the data set is large consider using a cursor to read the data.

    This works well, but it transfers a lot of excess data, since you transfer n copies of the outer table's data for each n rows of the inner table associated with it.


    To reduce the excess data exchanged you can select only the IDs for the outer tables

    select a.a_id, b.b_id, c.*
    from a left join b on (a.a_id = b.a_id)
           left join c on (b.b_id = c.b_id)
    order by a.a_id, b.b_id, c.c_id;
    

    ... then when you switch to a new tablea / tableb, SELECT the rest of its rows then. You'll probably use a second connection to do this so you don't distrupt the result set and cursor state on the main connection you're reading rows from.

    Method 2: Do it all in PostgreSQL

    For smaller data sets, or for the inner levels of bigger data sets, you can use PostgreSQL's XML support to construct the XML documents, e.g.:

    WITH xmlinput AS (
      SELECT a, b, c
      FROM a
      LEFT JOIN b ON (a.a_id = b.a_id)
      LEFT JOIN c on (b.b_id = c.b_id)
      ORDER BY a.a_id, b.b_id, c.c_id
    )
    SELECT
      XMLELEMENT(name items,
        xmlagg(
          XMLELEMENT(name a,
            XMLFOREST((a).a_id AS a_id, (a)."name" AS name),
            b_xml
          )
        ORDER BY (a).a_id)
      ) AS output
    FROM
    (
      SELECT
        a,
        xmlagg(
          XMLELEMENT(name b,
            XMLFOREST((b).b_id AS b_id, (b).val AS val),
            c_xml
          )
        ORDER BY (b).b_id)
        AS b_xml
      FROM
      (
        SELECT
          a, b,
          xmlagg(
            XMLELEMENT(name c,
              XMLFOREST((c).c_id AS c_id, (c).blah AS blah)
            )
          ORDER BY (c).c_id)
          AS c_xml
        FROM xmlinput
        GROUP BY a, b
      ) c_as_xml
      GROUP BY a
    ) b_as_xml;
    

    ... but really, you've got to be some kind of masochist to write code like that. Though it could prove to be fairly fast.

    To understand the query you'll need to read the PostgreSQL XML docs. The wackly syntax was dreamed up by the SQL/XML committee, don't blame us.

    Also note that row-variables are used heavily in the above code to keep it organized. a, b and c are passed as whole rows to outer layers of the query. This avoids the need to mess with aliases when names collide. The syntax (a).a_id, etc, means "the a_id field of the row-variable a". See the PostgreSQL manual for details.

    The above uses a better XML structure (see comments below). If you want to emit attributes not elements, you can change the XMLFOREST calls to XMLATTRIBUTES calls.

    Output:

    1fred11x1whatever2gah12y3borkbork2bert21a22b4fuzz
    

    or, pretty-printed:

    
    
        
            1
            fred
            
                11
                x
                
                    1
                    whatever
                
                
                    2
                    gah
                
            
            
                12
                y
                
                    3
                    borkbork
                
            
        
        
            2
            bert
            
                21
                a
                
            
            
                22
                b
                
                    4
                    fuzz
                
            
        
    
    

    Please emit better XML

    On a side note, using attributes like that in XML seems tempting, but it quickly gets difficult and ugly to work with. Please just use normal XML elements:

      1blah
    1
    42......... ...

    0 讨论(0)
提交回复
热议问题