iTextSharp 5 Polish characters

无人共我 2020-12-02 00:43

I have a problem with Polish characters using iTextSharp. I want to create a PDF from HTML. Everything works fine, but the Polish characters are missing. I use the function below:

5 Answers
  • 2020-12-02 00:46

    Just to roll together what @Mark Storer said:

    private void createPDF(string html)
    {
        TextReader reader = new StringReader(html);
    
        // step 1: creation of a document-object
        Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    
        // step 2:
        // we create a writer that listens to the document
        // and directs a PDF stream to a file
        PdfWriter writer = PdfWriter.GetInstance(document, new FileStream("Test.pdf", FileMode.Create));
    
        // step 3: we create a worker to parse the document
        HTMLWorker worker = new HTMLWorker(document);
    
        // step 4: we open the document and start the worker on the document
        document.Open();
    
        // step 4.1: register a Unicode font and assign it an alias
        FontFactory.Register("C:\\Windows\\Fonts\\ARIALUNI.TTF", "arial unicode ms");
    
        // step 4.2: create a style sheet and set the encoding to Identity-H
        iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
        ST.LoadTagStyle("body", "encoding", "Identity-H");
    
        // step 4.3: assign the style sheet to the html parser
        worker.Style = ST;
    
        worker.StartDocument();
    
        // step 5: parse the html into the document
        worker.Parse(reader);
    
        // step 6: close the document and the worker
        worker.EndDocument();
        worker.Close();
        document.Close();
    }
    

    And when you call it, wrap your text in a font tag using the name you registered above:

    createPDF("<font face=\"arial unicode ms\">ĄąćęĘłŁŃńóÓŚśŹźŻż</font>");
    
  • 2020-12-02 00:54

    When creating your BaseFont you need to specify that you want to use Unicode characters. This answer shows how.
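
    The linked answer is not included in this copy of the thread. As a rough sketch only, this is what creating a Unicode-capable BaseFont might look like in iTextSharp 5. The class name, method name, and both path parameters are hypothetical, and the font file is assumed to be a TrueType font that actually contains the Polish glyphs.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class UnicodeFontSketch
{
    // fontPath is an assumption: any TrueType font that contains the Polish glyphs.
    public static void WritePolishSample(string fontPath, string outputPath)
    {
        // IDENTITY_H addresses glyphs by Unicode instead of a single-byte code page;
        // EMBEDDED stores the font data in the PDF so viewers do not need the font installed.
        BaseFont baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(baseFont, 12);

        using (var stream = new FileStream(outputPath, FileMode.Create))
        {
            var document = new Document(PageSize.A4);
            PdfWriter.GetInstance(document, stream);
            document.Open();
            document.Add(new Paragraph("Zażółć gęślą jaźń", font));
            document.Close();
        }
    }
}
```

    This covers text added directly to the document; content produced by HTMLWorker picks its fonts separately, which is what the other answers address.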

  • 2020-12-02 00:55

    1) iText 5.0.6 was released today with a major overhaul to the HTML->PDF conversion code. I suggest you try the new code instead.

    2) I'm almost positive that setting the directContent like that won't affect the pdf content generated by HTMLWorker. I'm 99% sure that it'll [re]set the font before it draws any text.

    3) Try wrapping your string in <font face="AFontThatActuallyContainsThoseCharacters"> tags. I seriously doubt the default font HTMLWorker picks will be up for the job.

    Nope. The default is Helvetica with WinAnsiEncoding. Definitely not suitable for anything outside typical English/German/French/Spanish.

    You should be able to use HTMLWorker.setStyleSheet to set some friendlier defaults. You'll want to set the "face" and "encoding" to something more Polish-friendly. I recommend "Identity-H" for the encoding, which gives access to all characters in the font you go with, regardless of language. For a font, Windows has long shipped a program called "charmap.exe" that will show you which characters a font has available in a given encoding (including Unicode). The "Arial" family looks good, as do several others.
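
    As a programmatic alternative to eyeballing a font in charmap.exe, the same question ("does this font have these characters?") can probably be asked of iTextSharp itself. The sketch below assumes iTextSharp 5's BaseFont.CharExists method; the class and method names are hypothetical.

```csharp
using System.Linq;
using iTextSharp.text.pdf;

public static class GlyphCheck
{
    // Returns the characters of `text` for which the font at fontPath reports no glyph.
    public static string MissingGlyphs(string fontPath, string text)
    {
        // Identity-H exposes every glyph in the font, not just one code page's worth.
        BaseFont baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        return new string(text.Where(c => !baseFont.CharExists(c)).ToArray());
    }
}
```

    An empty result for a call such as GlyphCheck.MissingGlyphs(fontPath, "ĄąĆćĘęŁłŃńÓóŚśŹźŻż") would suggest the font is a reasonable candidate for Polish text.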


    "the new code" probably won't change any behavior you're seeing. It's a refactoring to make future (next release as I understand it) changes easier.

    My suggestion is to go with setStyleSheet():

       // step 3: we create a worker to parse the document
       HTMLWorker worker = new HTMLWorker(document);
    
       StyleSheet sheet = new StyleSheet();
    
       HashMap<String, String> styleMap = new HashMap<String, String>();
       styleMap.put("face", "Arial"); // default font
       styleMap.put("encoding", "Identity-H"); // default encoding
    
       String tags[] = {"p", "div", ...};
       for (String tag : tags) {
         // loadTagStyle registers defaults for a tag; applyStyle only reads them back
         sheet.loadTagStyle( tag, styleMap );
       }
    
       worker.setStyleSheet(sheet);
    

    You might be able to just loadTagStyle("body", styleMap) and have it cascade down into everything it contains, but I'm not sure. I'm also not sure that this would address your one-line test, as there are no tags involved. IIRC, we build a body tag if there isn't one, but I'm not at all sure.

  • 2020-12-02 01:02

    I GOT THE ANSWER! =) (specifically targeted for Polish) I feel obligated to put it here in this old thread, since I'm sure I won't be the last to find it.

    I'm severely disappointed that there aren't any good answers to this... most of them suggest using the ARIALUNI.TTF in your Windows FONTS folder which results in your PDF file being MANY times bigger. The solution doesn't need to be so drastic...

    Many others suggest examples showing encoding with cp1252 which fails on Arial and doesn't work with Helvetica for Polish text.

    I'm using iTextSharp 4.1.6... the trick is... cp1257! And you can use it with BaseFont.COURIER, BaseFont.HELVETICA, BaseFont.TIMES_ROMAN

    This works... and my PDF files are tiny (3kb!)

    document.Open();
    var bigFont = FontFactory.GetFont(BaseFont.COURIER, BaseFont.CP1257, 18, Font.BOLD);
    var para = new Paragraph("Oryginał", bigFont);
    document.Add(para);
    document.Close();
    

    I will test later and make sure I can open and read these in Windows XP and Mac OSX in addition to Windows 7.
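
    The difference between cp1252 and cp1257 for Polish text can be checked without iTextSharp. The sketch below uses only System.Text and reports which characters a code page cannot represent. The class and method names are hypothetical. It assumes .NET Core 3.0 or later (or .NET 5+), where legacy code pages must be registered first; on .NET Framework the RegisterProvider line is not needed and should be removed.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CodePageCheck
{
    // Returns the characters of `text` that the given Windows code page cannot represent.
    public static string Unencodable(string text, int codePage)
    {
        // Makes legacy code pages such as 1250, 1252 and 1257 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Exception fallbacks turn off best-fit substitution (for example "ą" silently becoming "a").
        Encoding encoding = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        return new string(text.Where(c => !CanEncode(encoding, c)).ToArray());
    }

    private static bool CanEncode(Encoding encoding, char c)
    {
        try
        {
            encoding.GetBytes(new[] { c });
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

    For the sample "ĄąĆćĘęŁłŃńÓóŚśŹźŻż", Unencodable(sample, 1252) returns every letter except Ó and ó, while Unencodable(sample, 1257) returns an empty string. This only shows that the encoding can carry the characters; whether a given font has the matching glyphs is a separate question.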

  • 2020-12-02 01:09

    Browsing various forums and Stack Overflow questions, I found no answer with a comprehensive solution to the special-characters problem. I have tried to provide one here, at the cost of quite a long reply. Hopefully this will help someone out...

    I used the XMLWorker from SourceForge, as HTMLWorker has been deprecated. The problem with special characters remained, though. I found two solutions that actually work and can be used either separately or combined.

    HTML & CSS solution

    Each tag involved needs to have a font-family style specified in order to be interpreted correctly by the ParseXHtml method (I am not sure why style inheritance for nested tags does not work here, but it seems it doesn't, or not fully).

    This solution lets you change the resulting PDF through the HTML and CSS alone, so some changes are possible without recompiling the code.

    Simplified code (for an MVC app) would look like this:

    Controller:

    public FileStreamResult GetPdf()
    {
        const string CONTENT_TYPE = "application/pdf";
        var fileName = "mySimple.pdf";
        var html = GetViewPageHtmlCode();
        //ways to capture the view HTML are described in other threads
        var css = Server.MapPath("~/Content/Pdf.css");
        //USED_ENCODING is not defined in this snippet; it is assumed to be a System.Text.Encoding such as Encoding.UTF8
        using (var capturedActionStream = new MemoryStream(USED_ENCODING.GetBytes(html)))
        {
            using (var cssFile = new FileStream(css, FileMode.Open))
            {
                var memoryStream = new MemoryStream();
                //to create landscape, use PageSize.A4.Rotate() for pageSize
                var document = new Document(PageSize.A4, 30, 30, 10, 10);
                var writer = PdfWriter.GetInstance(document, memoryStream);
                var worker = XMLWorkerHelper.GetInstance();
    
                document.Open();
                worker.ParseXHtml(writer, document, capturedActionStream, cssFile);
                writer.CloseStream = false;
                document.Close();
                memoryStream.Position = 0;
    
                //to enforce file download
                HttpContext.Response.AddHeader(
                    "Content-Disposition",
                    String.Format("attachment; filename={0}", fileName));
                var wrappedPdf = new FileStreamResult(memoryStream, CONTENT_TYPE);
                return wrappedPdf;
            }
        }
    }
    

    CSS:

    body {
        background-color: white;
        font-size: .85em;
        font-family: Arial;
        margin: 0;
        padding: 0;
        color: black;
    }
    
    p, ul {
        margin-bottom: 20px;
        line-height: 1.6em;
    }
    
    div, span {
        font-family: Arial;
    }
    
    h1, h2, h3, h4, h5, h6 {
        font-size: 1.5em;
        color: #000;
        font-family: Arial;
    }
    

    View layout

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
            <title>@ViewBag.Title</title>
            <link href="@Url.Content("~/Content/Pdf.css")" rel="stylesheet" type="text/css" />
        </head>
        <body>
            <div class="page">
                <div id="main">
                    @RenderBody()
                </div>
            </div>
        </body>
        </html>
    

    View page

    @{
        ViewBag.Title = "PDF page title";
    }
    
    <h1>@ViewBag.Title</h1>
    
    <p>
        ěščřžýáíéů ĚŠČŘŽÝÁÍÉŮ
    </p>
    

    Inside-code font-replacing solution

    In this solution, the font returned by an IFontProvider is replaced with one that contains a (correct) representation of the special characters, and the BaseFont.IDENTITY_H encoding is used. The advantage of this approach is that exactly one font is used. That is also its disadvantage.

    Also, this solution expects the font to be part of the project (*.ttf file(s) placed in the Content/Fonts folder).

    Alternatively, the fonts can be retrieved from the Windows fonts location, Environment.GetFolderPath(Environment.SpecialFolder.Fonts). This requires knowledge (or a strong belief) of which fonts are installed on the server, or control over the server.
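
    The two font locations above can be combined into a simple lookup: try the project folder first and fall back to the system fonts folder. This is a minimal sketch using only System.IO; the class name, method name, and parameters are hypothetical and are not part of the original answer.

```csharp
using System;
using System.IO;

public static class FontLocator
{
    // Returns the first existing path for fontFileName, checking the project font
    // directory first and the system fonts folder second; null if neither has the file.
    public static string Resolve(string projectFontDirectory, string fontFileName)
    {
        string projectPath = Path.Combine(projectFontDirectory, fontFileName);
        if (File.Exists(projectPath))
        {
            return projectPath;
        }

        // GetFolderPath can return an empty string when the folder is not available.
        string systemFonts = Environment.GetFolderPath(Environment.SpecialFolder.Fonts);
        if (!string.IsNullOrEmpty(systemFonts))
        {
            string systemPath = Path.Combine(systemFonts, fontFileName);
            if (File.Exists(systemPath))
            {
                return systemPath;
            }
        }

        return null;
    }
}
```

    In the controller shown later, the result of something like FontLocator.Resolve(Server.MapPath("~/Content/Fonts"), "arialn.ttf") could be passed to the font factory, with a null result treated as a configuration error.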

    FontProvider (over FontFactory)

    I took the liberty of extending Gregor S's solution a bit to provide a more flexible FontFactory that can be used for a variety of HTML "templates" pushed through XMLWorker.

    public class CustomFontFactory : FontFactoryImp
    {
        public const Single DEFAULT_FONT_SIZE = 12;
        public const Int32 DEFAULT_FONT_STYLE = 0;
        public static readonly BaseColor DEFAULT_FONT_COLOR = BaseColor.BLACK;
    
        public String DefaultFontPath { get; private set; }
        public String DefaultFontEncoding { get; private set; }
        public Boolean DefaultFontEmbedding { get; private set; }
        public Single DefaultFontSize { get; private set; }
        public Int32 DefaultFontStyle { get; private set; }
        public BaseColor DefaultFontColor { get; private set; }
    
        public Boolean ReplaceEncodingWithDefault { get; set; }
        public Boolean ReplaceEmbeddingWithDefault { get; set; }
        public Boolean ReplaceFontWithDefault { get; set; }
        public Boolean ReplaceSizeWithDefault { get; set; }
        public Boolean ReplaceStyleWithDefault { get; set; }
        public Boolean ReplaceColorWithDefault { get; set; }
    
        public BaseFont DefaultBaseFont { get; protected set; }
    
        public CustomFontFactory(
            String defaultFontFilePath,
            String defaultFontEncoding = BaseFont.IDENTITY_H,
            Boolean defaultFontEmbedding = BaseFont.EMBEDDED,
            Single? defaultFontSize = null,
            Int32? defaultFontStyle = null,
            BaseColor defaultFontColor = null,
            Boolean automaticalySetReplacementForNullables = true)
        {
            //set default font properties
            DefaultFontPath =  defaultFontFilePath;
            DefaultFontEncoding = defaultFontEncoding;
            DefaultFontEmbedding = defaultFontEmbedding;
            DefaultFontColor = defaultFontColor == null
                ? DEFAULT_FONT_COLOR
                : defaultFontColor;
            DefaultFontSize = defaultFontSize.HasValue
                ? defaultFontSize.Value
                : DEFAULT_FONT_SIZE;
            DefaultFontStyle = defaultFontStyle.HasValue
                ? defaultFontStyle.Value
                : DEFAULT_FONT_STYLE;
    
            //set default replacement options
            ReplaceFontWithDefault = false;
            ReplaceEncodingWithDefault = true;
            ReplaceEmbeddingWithDefault = false;
    
            if (automaticalySetReplacementForNullables)
            {
                ReplaceSizeWithDefault = defaultFontSize.HasValue;
                ReplaceStyleWithDefault = defaultFontStyle.HasValue;
                ReplaceColorWithDefault = defaultFontColor != null;
            }
    
            //define default font
            DefaultBaseFont = BaseFont.CreateFont(DefaultFontPath, DefaultFontEncoding, DefaultFontEmbedding);
    
            //register system fonts
            FontFactory.RegisterDirectories();
        }
    
        protected Font GetBaseFont(Single size, Int32 style, BaseColor color)
        {
            var baseFont = new Font(DefaultBaseFont, size, style, color);
    
            return baseFont;
        }
    
        public override Font GetFont(String fontname, String encoding, Boolean embedded, Single size, Int32 style, BaseColor color, Boolean cached)
        {
            //optionally replace the requested font properties with the defaults
            size = ReplaceSizeWithDefault
                ? DefaultFontSize
                : size;
            style = ReplaceStyleWithDefault
                ? DefaultFontStyle
                : style;
            encoding = ReplaceEncodingWithDefault
                ? DefaultFontEncoding
                : encoding;
            embedded = ReplaceEmbeddingWithDefault
                ? DefaultFontEmbedding
                : embedded;
    
            //get font
            Font font = null;
            if (ReplaceFontWithDefault)
            {
                font = GetBaseFont(
                    size,
                    style,
                    color);
            }
            else
            {
                font = FontFactory.GetFont(
                    fontname,
                    encoding,
                    embedded,
                    size,
                    style,
                    color,
                    cached);
    
                if (font.BaseFont == null)
                    font = GetBaseFont(
                        size,
                        style,
                        color);
            }
    
            return font;
        }
    }
    

    Controller

    private const String DEFAULT_FONT_LOCATION = "~/Content/Fonts";
    private const String DEFAULT_FONT_NAME = "arialn.ttf";
    
    public FileStreamResult GetPdf()
    {
        const string CONTENT_TYPE = "application/pdf";
        var fileName = "mySimple.pdf";
        var html = GetViewPageHtmlCode();
        //ways to capture the view HTML are described in other threads
        var css = Server.MapPath("~/Content/Pdf.css");
        //USED_ENCODING is not defined in this snippet; it is assumed to be a System.Text.Encoding such as Encoding.UTF8
        using (var capturedActionStream = new MemoryStream(USED_ENCODING.GetBytes(html)))
        {
            using (var cssFile = new FileStream(css, FileMode.Open))
            {
                var memoryStream = new MemoryStream();
                var document = new Document(PageSize.A4, 30, 30, 10, 10);
                //to create landscape, use PageSize.A4.Rotate() for pageSize
                var writer = PdfWriter.GetInstance(document, memoryStream);
                var worker = XMLWorkerHelper.GetInstance();
                var defaultFontPath = Server
                    .MapPath(Path
                        .Combine(
                            DEFAULT_FONT_LOCATION,
                            DEFAULT_FONT_NAME));
                var fontProvider = new CustomFontFactory(defaultFontPath);
    
                document.Open();
                worker.ParseXHtml(writer, document, capturedActionStream, cssFile, fontProvider);
                writer.CloseStream = false;
                document.Close();
                memoryStream.Position = 0;
    
                //to enforce file download
                HttpContext.Response.AddHeader(
                    "Content-Disposition",
                    String.Format("attachment; filename={0}", fileName));
                var wrappedPdf = new FileStreamResult(memoryStream, CONTENT_TYPE);
                return wrappedPdf;
            }
        }
    }
    

    CSS:

    body {
        background-color: white;
        font-size: .85em;
        font-family: "Trebuchet MS", Verdana, Helvetica, Sans-Serif;
        margin: 0;
        padding: 0;
        color: black;
    }
    
    p, ul {
        margin-bottom: 20px;
        line-height: 1.6em;
    }
    
    h1, h2, h3, h4, h5, h6 {
        font-size: 1.5em;
        color: #000;
    }
    

    View layout

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
            <title>@ViewBag.Title</title>
            <link href="@Url.Content("~/Content/Pdf.css")" rel="stylesheet" type="text/css" />
        </head>
        <body>
            <div class="page">
                <div id="main">
                    @RenderBody()
                </div>
            </div>
        </body>
        </html>
    

    View page

    @{
        ViewBag.Title = "PDF page title";
    }
    
    <h1>@ViewBag.Title</h1>
    
    <p>
        ěščřžýáíéů ĚŠČŘŽÝÁÍÉŮ
    </p>
    
