My plan is to read in an XML document using my C# program, search for particular entries which I\'d like to change, and then write out the modified document. However, I\'ve bec
One fairly easy approach would be to create a new XmlDocument
, then use the Load()
method to populate it. Once you've got the document, you can use CreateNavigator()
to get an XPathNavigator
object that you can use to find and alter elements in the document. Finally, you can use the Save()
method on the XmlDocument
to write the changed document back out.
For the task in hand - (read existing doc, write, and modify in a formalised way) I'd go with XPathDocument run through an XslCompiledTransform.
Where you can't formalise, don't have pre-existing docs or generally need more adaptive logic, I'd go with LINQ and XDocument like Skeet says.
Basically if the task is transformation then XSLT, if the task is manipulation then LINQ.
If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument
, XElement
etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.
You can use XPath where that's appropriate, or the built-in axis methods (Elements()
, Descendants()
, Attributes()
etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.
If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time - because XML APIs generalyl expect to work with valid XML documents. You could use HTMLTidy first of course, but that may have undesirable effects.
For your specific example:
XDocument doc = XDocument.Load("file.xml");
foreach (var img in doc.Descendants("img"))
{
// src will be null if the attribute is missing
string src = (string) img.Attribute("src");
img.SetAttributeValue("src", src + "with-changes");
}
If you have smaller documents which fit in computers memory you can use XmlDocument
.
Otherwise you can use XmlReader
to iterate through the document.
Using XmlReader
you can find out the elements type using:
while (xml.Read()) {
switch xml.NodeType {
case XmlNodeType.Element:
//Do something
case XmlNodeType.Text:
//Do something
case XmlNodeType.EndElement:
//Do something
}
}
Just start by reading the documentation of the Xml namespace on the MSDN. Then if you have more specific questions, post them here...
My favorite tool for this kind of thing is HtmlAgilityPack. I use it to parse complex HTML documents into LINQ-queryable collections. It is an extremely useful tool for querying and parsing HTML (which is often not valid XML).
For your problem, the code would look like:
var htmlDoc = HtmlAgilityPack.LoadDocument(stringOfHtml);
var images = htmlDoc.DocumentNode.SelectNodes("//img[id=lookforthis]");
if(images != null)
{
foreach (HtmlNode node in images)
{
node.Attributes.Append("alt", "added an alt to lookforthis images.");
}
}
htmlDoc.Save('output.html');