Read German text from XML and write to a PDF

淺唱寂寞╮ submitted on 2019-12-12 04:33:19

Question


I have a UTF-8 encoded XML file. I need to read a value from it into a std::string using the pugixml library. In this example I print the value to the console, but in my actual project I have to write it to a PDF (using the LibHaru library). My MWE is as follows:

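#include <iostream>
#include <string>
#include "pugiconfig.hpp"
#include "pugixml.hpp"

using namespace pugi;

int main()
{
    // FILEPATH is a placeholder for the path to the UTF-8 encoded XML file.
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file(FILEPATH);
    if (!result)
    {
        // Stop early if the file could not be loaded or parsed.
        std::cerr << "Failed to parse XML: " << result.description() << std::endl;
        return 1;
    }

    xml_node root_node = doc.child("Report");
    xml_node SystemName_node = root_node.child("SystemName");

    // child_value() returns the node's text as stored in the document.
    std::string strSystemName = SystemName_node.child_value();

    std::cout << "The name of the system is: " << strSystemName << std::endl;

    return 0;
}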

The code reads the value of strSystemName from the XML file using pugixml and prints it to the screen (in my actual project, it is written to a PDF file). Problem: while debugging, I found that strange characters are read from the XML file (which is already UTF-8 encoded). They appear both when I print the variable to the screen and when I write it to the PDF.

IMPORTANT: Printing to the console is not that important. What matters is writing the value correctly to the PDF file, which also uses UTF-8 encoding. I suspect that storing the value in a std::string is somehow causing the problem, so the wrong value is passed to the PDF writer.

PS: I am using VS2010, so C++11 is not available.


Answer 1:


The problem here is that std::cout just passes the UTF-8 bytes of the string to the console unchanged. On Windows, the console normally does not run in UTF-8 but in, for example, code page 1252, so the two bytes of a UTF-8 'ä' (0xC3 0xA4) are displayed as two separate characters ('Ã¤' in CP-1252).

Your solution is either to switch the console to UTF-8 (see this answer) or to convert your UTF-8 string into a CP-1252 string. I think the conversion will require MultiByteToWideChar (specifying UTF-8) followed by WideCharToMultiByte (specifying CP-1252).
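To make the UTF-8 to CP-1252 conversion concrete, here is a minimal hand-rolled sketch. It is not the MultiByteToWideChar/WideCharToMultiByte route named above, but a portable stand-in that only covers ASCII plus the code points U+00A0 to U+00FF, where CP-1252 and ISO-8859-1 agree (this includes ä, ö, ü and ß). The function name utf8ToLatin1Subset is hypothetical, and anything outside that range is replaced with '?'. It sticks to C++03 so it should also build with VS2010.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of pugixml, LibHaru, or the Windows API).
// Converts UTF-8 to single-byte text for U+0000..U+007F and U+00A0..U+00FF,
// the range where CP-1252 and ISO-8859-1 use the same byte values.
// Every other character (or malformed sequence) becomes a single '?'.
std::string utf8ToLatin1Subset(const std::string& utf8)
{
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80)
        {
            // Plain ASCII is identical in both encodings.
            out += static_cast<char>(b0);
            ++i;
        }
        else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < utf8.size() &&
                 (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80)
        {
            // Two-byte sequence encoding a code point in U+0080..U+00FF.
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            const unsigned int cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            if (cp >= 0xA0)
                out += static_cast<char>(static_cast<unsigned char>(cp));
            else
                out += '?'; // U+0080..U+009F differ between the two encodings
            i += 2;
        }
        else
        {
            // Unsupported or malformed: emit one '?' and skip the lead byte
            // together with any continuation bytes that follow it.
            out += '?';
            ++i;
            while (i < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80)
                ++i;
        }
    }
    return out;
}
```

Passing the two bytes 0xC3 0xA4 (UTF-8 'ä') through this function yields the single byte 0xE4, which is 'ä' in CP-1252. On Windows, the API pair mentioned above would handle the full CP-1252 repertoire rather than this subset.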

To debug your actual problem (passing UTF-8 strings into pugixml), you need to look at the actual bytes in the strings, and check they are what you think they are.
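One way to look at the actual bytes is to dump the string as hex before handing it to the console or the PDF writer. The helper below is a minimal sketch. The name toHex is an assumption for illustration, and it only uses C++03 so it should build with VS2010.

```cpp
#include <cstddef>
#include <string>

// Hypothetical debugging helper: returns the bytes of `s` as upper-case
// hex pairs separated by single spaces, e.g. "C3 A4".
std::string toHex(const std::string& s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0)
            out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

For example, printing toHex(strSystemName) for a value containing 'ä' should show C3 A4 if the string really holds UTF-8. A lone E4 would suggest the data is already in CP-1252 or Latin-1, and C3 83 C2 A4 would suggest it was encoded to UTF-8 twice.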



Source: https://stackoverflow.com/questions/41138389/read-german-text-from-xml-and-write-to-a-pdf
