My source XML contains the copyright character as the character reference &#x00A9;. When writing the XML with this code:
var stringWriter = new StringWriter();
segment
I had the same problem when saving some Lithuanian characters this way. I found a way to cheat around it by replacing & with &amp; (writing &amp;#x00A9; in order to get &#x00A9;, and so on). It looks strange, but it worked for me :)
Maybe you can try a different document encoding; check out: http://www.sagehill.net/docbookxsl/CharEncoding.html
I strongly suspect you won't be able to do this. Fundamentally, the copyright sign © is &#x00A9; - they're different representations of the same thing, and I expect that the in-memory representation normalizes this.
What are you doing with the XML afterwards? Any sane application processing the resulting XML should be fine with it.
You may be able to persuade it to use the entity reference if you explicitly encode it with ASCII... but I'm not sure.
EDIT: You can definitely make it use a different encoding. You just need a StringWriter which reports that its "native" encoding is UTF-8. Here's a simple class you can use for that:
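using System.IO;
using System.Text;

// A StringWriter that reports UTF-8 (rather than the default UTF-16) as its encoding.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}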
You could try changing it to use Encoding.ASCII as well and see what that does to the copyright sign...
It appears that UTF8 won't solve the problem. The following has the same symptoms as your code:
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding());
segmentDoc.Save(writer);
ms.Seek(0L, SeekOrigin.Begin);
var reader = new StreamReader(ms);
var result = reader.ReadToEnd();
Console.WriteLine(result);
I tried the same approach with ASCII, but wound up with ? instead of ©.
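One variation that may be worth trying before falling back to string manipulation: create the writer through XmlWriter.Create with XmlWriterSettings.Encoding set to ASCII, instead of constructing an XmlTextWriter directly. As far as I know, a writer created this way escapes characters that the target encoding cannot represent as numeric character references rather than substituting ?, but I have not verified the exact output here, so treat this as a sketch. The class and method names (AsciiXmlSaver, SaveAsAscii) are made up for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AsciiXmlSaver
{
    // Hypothetical helper: saves the document through an XmlWriter that
    // targets ASCII, then returns the bytes decoded back into a string.
    public static string SaveAsAscii(XmlDocument doc)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };

        using (var ms = new MemoryStream())
        {
            // Dispose the writer before reading the stream so everything is flushed.
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                doc.Save(writer);
            }

            return Encoding.ASCII.GetString(ms.ToArray());
        }
    }
}
```

Calling AsciiXmlSaver.SaveAsAscii(segmentDoc) and printing the result should show quickly whether the copyright sign survives as a character reference on your framework version.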
I think using a string replace after converting the XML to a string is your best bet to get the effect you want. Of course, that could be cumbersome if you are interested in more than just the © symbol.
result = result.Replace("©", "\u0026#x00A9;");
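If more than one character needs this treatment, the single Replace call can be generalized. The following is a minimal sketch, not a tested solution: it walks the serialized string and rewrites every non-ASCII character as a hexadecimal character reference. XmlEntityEscaper and EscapeNonAscii are hypothetical names, and the sketch assumes the string contains no unpaired surrogates (char.ConvertToUtf32 throws on those).

```csharp
using System.Text;

public static class XmlEntityEscaper
{
    // Hypothetical helper: replaces each non-ASCII character in an already
    // serialized XML string with a numeric character reference (&#xHHHH;).
    public static string EscapeNonAscii(string xml)
    {
        var sb = new StringBuilder(xml.Length);

        for (int i = 0; i < xml.Length; i++)
        {
            char c = xml[i];
            if (c < 128)
            {
                // Plain ASCII is copied through unchanged.
                sb.Append(c);
                continue;
            }

            // Characters outside the BMP occupy two UTF-16 code units.
            bool isPair = char.IsSurrogatePair(xml, i);
            int codePoint = char.ConvertToUtf32(xml, i);
            if (isPair)
            {
                i++; // skip the low surrogate, it is part of this code point
            }

            sb.Append("&#x").Append(codePoint.ToString("X4")).Append(';');
        }

        return sb.ToString();
    }
}
```

Usage would be result = XmlEntityEscaper.EscapeNonAscii(result);. Note that this is a blind text transformation: it also rewrites non-ASCII characters inside comments and CDATA sections, where character references are not interpreted, so it is only safe if the document has none of those.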