I have a very large XML file (millions of records). Due to speed and memory constraints, I plan to use XMLReader/XMLWriter.
How can I change the id attribute of a DOMNode and save the changes back to the original XML file, again using XMLWriter?
It does not work that way. If you use XMLReader and XMLWriter on the same file at the same time, the writer truncates the file and the reader then reports errors and stops working.
However, you can operate on different files.
So what you can do is use an XMLReader to read the document and, while you process it, use an XMLWriter to write a second document based on what you have read, modifying nodes where needed. When you are done, rename the newly written file to the original filename.
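The read-one-file, write-another, then rename approach could be sketched in plain PHP as follows. `rewriteXmlFile()` is a hypothetical helper, not part of PHP or of the XMLReaderIterator library; the sketch assumes the temporary file can be created in the same directory as the original, so that `rename()` does not have to move data across filesystems.

```php
<?php
// Hypothetical helper (not part of PHP or XMLReaderIterator): streams $path
// through $transform into a temporary file, then replaces $path with it.
function rewriteXmlFile(string $path, callable $transform): void
{
    // Same directory as the original, so rename() stays on one filesystem.
    $tmp = tempnam(dirname($path), 'xml');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create a temporary file next to $path");
    }

    $reader = new XMLReader();
    $writer = new XMLWriter();
    if (!$reader->open($path) || !$writer->openUri($tmp)) {
        unlink($tmp);
        throw new RuntimeException("Cannot open $path or $tmp");
    }

    try {
        // The caller reads from $reader and writes to $writer; the original
        // file is never opened for writing, so it is not truncated.
        $transform($reader, $writer);
        $writer->flush();
    } catch (Throwable $e) {
        $reader->close();
        unset($writer); // frees the writer, closing the temporary file
        unlink($tmp);
        throw $e;
    }
    $reader->close();
    unset($writer);

    // Only after reading has finished is the original replaced.
    if (!rename($tmp, $path)) {
        unlink($tmp);
        throw new RuntimeException("Cannot replace $path");
    }
}
```

Note that `tempnam()` creates the file with 0600 permissions, so the replaced file will not keep the original's permissions unless you `chmod()` the temporary file before the rename.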
Take an XML document like this one, loosely modeled after your question (shortened for the example; XMLReader and XMLWriter naturally make sense for really huge documents):
```xml
<DBOS>
  <ITEMS>
    <ITEM>item #1</ITEM>
    <ITEM>item #2</ITEM>
    <ITEM>item #3</ITEM>
  </ITEMS>
  <ITEMS>
    <ITEM>item #4</ITEM>
    <ITEM>item #5</ITEM>
  </ITEMS>
</DBOS>
```
A working code example is:
```php
<?php
/*
 * This file is part of the XMLReaderIterator package.
 *
 * Copyright (C) 2012, 2014 hakre <http://hakre.wordpress.com>
 *
 * Example: Write XML with XMLWriter while reading from XMLReader with XMLWriterIteration
 */

require('xmlreader-iterators.php'); // require XMLReaderIterator library

$xmlInputFile  = 'data/dobs-items.xml';
$xmlOutputFile = 'php://output';

$reader = new XMLReader();
$reader->open($xmlInputFile);

$writer = new XMLWriter();
$writer->openUri($xmlOutputFile);

$iterator = new XMLWritingIteration($writer, $reader);

$writer->startDocument();

$itemsCount = 0;
$itemCount  = 0;
foreach ($iterator as $node) {
    $isElement = $node->nodeType === XMLReader::ELEMENT;

    if ($isElement && $node->name === 'ITEMS') {
        // increase counter for <ITEMS> elements and reset <ITEM> counter
        $itemsCount++;
        $itemCount = 0;
    }

    if ($isElement && $node->name === 'ITEM') {
        // increase <ITEM> counter and insert "id" attribute
        // (attributes already present on <ITEM> are not copied by this branch)
        $itemCount++;
        $writer->startElement($node->name);
        $writer->writeAttribute('id', $itemsCount . "-" . $itemCount);
        if ($node->isEmptyElement) {
            // <ITEM/> has no separate end-element node, so close it right away
            $writer->endElement();
        }
    } else {
        // handle everything else
        $iterator->write();
    }
}

$writer->endDocument();
```
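`$iterator->write()` is what spares you from copying all the other nodes by hand. If you cannot use the library, the same numbering can be sketched with only the XMLReader and XMLWriter extensions. `addItemIds()` is a hypothetical function name, and this is a minimal sketch: it copies elements, attributes, text, whitespace, CDATA, comments and processing instructions, but does not handle namespace declarations, DOCTYPEs or entity references. Unlike the library-based example, it keeps attributes already present on `<ITEM>` and replaces an existing `id`.

```php
<?php
// Hypothetical helper: copies the document from $reader to $writer and gives
// every <ITEM> an id of the form "<ITEMS number>-<ITEM number>".
// Not handled in this sketch: namespace declarations, DOCTYPE, entity references.
function addItemIds(XMLReader $reader, XMLWriter $writer): void
{
    $itemsCount = 0;
    $itemCount = 0;
    $writer->startDocument();
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                $name = $reader->name;
                $isEmpty = $reader->isEmptyElement; // read before visiting attributes
                if ($name === 'ITEMS') {
                    $itemsCount++;
                    $itemCount = 0;
                }
                $writer->startElement($name);
                if ($name === 'ITEM') {
                    $itemCount++;
                    $writer->writeAttribute('id', $itemsCount . '-' . $itemCount);
                }
                // copy the existing attributes, dropping an old "id" on <ITEM>
                while ($reader->moveToNextAttribute()) {
                    if ($name !== 'ITEM' || $reader->name !== 'id') {
                        $writer->writeAttribute($reader->name, $reader->value);
                    }
                }
                $reader->moveToElement();
                if ($isEmpty) {
                    $writer->endElement(); // <x/> produces no END_ELEMENT node
                }
                break;
            case XMLReader::END_ELEMENT:
                $writer->fullEndElement(); // always writes an explicit </x>
                break;
            case XMLReader::TEXT:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                $writer->text($reader->value); // text() escapes special characters again
                break;
            case XMLReader::CDATA:
                $writer->writeCData($reader->value);
                break;
            case XMLReader::COMMENT:
                $writer->writeComment($reader->value);
                break;
            case XMLReader::PI:
                $writer->writePI($reader->name, $reader->value);
                break;
        }
    }
    $writer->endDocument();
}

// Usage with an in-memory document; for files, use $reader->open($inPath) and
// $writer->openUri($outPath) with two different paths.
$reader = new XMLReader();
$reader->XML('<DBOS><ITEMS><ITEM>item #1</ITEM><ITEM/></ITEMS></DBOS>');
$writer = new XMLWriter();
$writer->openMemory();
addItemIds($reader, $writer);
echo $writer->outputMemory();
```

The switch is explicit about which node types are copied, so anything it does not list (such as a DOCTYPE) is silently skipped; extend it if your input contains such nodes.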
The output is then as follows (written to standard output for this example; any file name or stream URI that PHP accepts can be used for `$xmlOutputFile` instead):
```xml
<?xml version="1.0"?>
<DBOS>
  <ITEMS>
    <ITEM id="1-1">item #1</ITEM>
    <ITEM id="1-2">item #2</ITEM>
    <ITEM id="1-3">item #3</ITEM>
  </ITEMS>
  <ITEMS>
    <ITEM id="2-1">item #4</ITEM>
    <ITEM id="2-2">item #5</ITEM>
  </ITEMS>
</DBOS>
```
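As this example shows, the id attributes are built from the two counter variables: `$itemsCount` numbers the `<ITEMS>` elements and `$itemCount` numbers the `<ITEM>` elements within each of them. Because XMLReader reports no separate end-element node for an empty element such as `<ITEM/>`, the code closes it immediately when `isEmptyElement` is true; for a non-empty `<ITEM>`, its end tag arrives later as an ordinary node and is written by `$iterator->write()`.
The XMLWritingIteration makes this easy, as `$iterator->write()` takes care of all other nodes and cases.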
The example code is part of the XMLReaderIterator package. There is also another example that creates a DOMDocument based on XMLReader; it is part of an answer to "How to distinguish between empty element and null-size string in DOMDocument?".