I\'m a trainee in a small dev team and my project leader wants me to write a function that will export PostgreSQL data to a XML file. Unfortunately, I only know how to write an
You have three levels of nested tables.
CREATE TABLE a(
a_id integer primary key,
name text
);
CREATE TABLE b(
b_id integer primary key,
a_id integer references a(a_id),
val text
);
CREATE TABLE c(
c_id serial primary key,
b_id integer references b(b_id),
blah text
);
INSERT INTO a(a_id, name) VALUES (1, 'fred'),(2, 'bert');
INSERT INTO b(b_id, a_id, val) VALUES
(11, 1, 'x'), (12, 1, 'y'), (21, 2, 'a'), (22, 2, 'b');
INSERT INTO c(b_id, blah) VALUES
(11, 'whatever'), (11, 'gah'), (12, 'borkbork'), (22, 'fuzz');
The simplest way to handle this is to do a left join over all three tables, ordered from outermost to innermost. Then you iterate down the result set, closing one element and opening another whenever the subject at that level changes.
select *
from a left join b on (a.a_id = b.a_id)
left join c on (b.b_id = c.b_id)
order by a.a_id, b.b_id, c.c_id;
then loop over the rows returned, and for each row, pseudocode:
cur_row = get_new_row()
if (cur_row[b_id] != prev_row[b_id]) {
emit_close_tableb();
}
if (cur_row[a_id] != prev_row[a_id]) {
emit_close_tablea();
emit_open_tablea(cur_row);
}
if (cur_row[b_id] != prev_row[b_id]) {
emit_open_tableb(cur_row);
}
emit_tablec(cur_row);
prev_row = cur_row;
To write the XML you'd use something like XMLWriter. To read the query data you can use something like PDO or whatever driver you prefer. If the data set is large consider using a cursor to read the data.
This works well, but it transfers a lot of excess data, since you transfer n
copies of the outer table's data for each n
rows of the inner table associated with it.
To reduce the excess data exchanged you can select only the IDs for the outer tables
select a.a_id, b.b_id, c.*
from a left join b on (a.a_id = b.a_id)
left join c on (b.b_id = c.b_id)
order by a.a_id, b.b_id, c.c_id;
... then when you switch to a new tablea / tableb, SELECT
the rest of its rows then. You'll probably use a second connection to do this so you don't distrupt the result set and cursor state on the main connection you're reading rows from.
For smaller data sets, or for the inner levels of bigger data sets, you can use PostgreSQL's XML support to construct the XML documents, e.g.:
WITH xmlinput AS (
SELECT a, b, c
FROM a
LEFT JOIN b ON (a.a_id = b.a_id)
LEFT JOIN c on (b.b_id = c.b_id)
ORDER BY a.a_id, b.b_id, c.c_id
)
SELECT
XMLELEMENT(name items,
xmlagg(
XMLELEMENT(name a,
XMLFOREST((a).a_id AS a_id, (a)."name" AS name),
b_xml
)
ORDER BY (a).a_id)
) AS output
FROM
(
SELECT
a,
xmlagg(
XMLELEMENT(name b,
XMLFOREST((b).b_id AS b_id, (b).val AS val),
c_xml
)
ORDER BY (b).b_id)
AS b_xml
FROM
(
SELECT
a, b,
xmlagg(
XMLELEMENT(name c,
XMLFOREST((c).c_id AS c_id, (c).blah AS blah)
)
ORDER BY (c).c_id)
AS c_xml
FROM xmlinput
GROUP BY a, b
) c_as_xml
GROUP BY a
) b_as_xml;
... but really, you've got to be some kind of masochist to write code like that. Though it could prove to be fairly fast.
To understand the query you'll need to read the PostgreSQL XML docs. The wackly syntax was dreamed up by the SQL/XML committee, don't blame us.
Also note that row-variables are used heavily in the above code to keep it organized. a
, b
and c
are passed as whole rows to outer layers of the query. This avoids the need to mess with aliases when names collide. The syntax (a).a_id
, etc, means "the a_id
field of the row-variable a
". See the PostgreSQL manual for details.
The above uses a better XML structure (see comments below). If you want to emit attributes not elements, you can change the XMLFOREST
calls to XMLATTRIBUTES
calls.
Output:
1 fred 11 x 1 whatever 2 gah 12 y 3 borkbork 2 bert 21 a 22 b 4 fuzz
or, pretty-printed:
1
fred
11
x
1
whatever
2
gah
12
y
3
borkbork
2
bert
21
a
22
b
4
fuzz
On a side note, using attributes like that in XML seems tempting, but it quickly gets difficult and ugly to work with. Please just use normal XML elements:
1
blah
1
42
...
...
...
...