Using iTextSharp XML Worker to convert HTML to PDF and write text vertically

盖世英雄少女心 2021-02-08 11:42

Is it possible to write text bottom-up in XML Worker? I would like to use it in a table. My code is:

     
2 Answers
  • 2021-02-08 11:49

    This was a pretty interesting problem, so +1 to the question.

    The first step was to look up whether iTextSharp XML Worker supports the HTML td tag. The mappings can be found in the source in iTextSharp.tool.xml.html.Tags. There you find that td is mapped to iTextSharp.tool.xml.html.table.TableData, which makes implementing a custom tag processor a little easier: all we need to do is inherit from the class and override End():

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.tool.xml;
    using iTextSharp.tool.xml.html.table;

    public class TableDataProcessor : TableData
    {
        /*
         * a **very** simple implementation of the CSS writing-mode property:
         * https://developer.mozilla.org/en-US/docs/Web/CSS/writing-mode
         */
        private static bool HasWritingMode(IDictionary<string, string> attributeMap)
        {
            string style;
            // trim each declaration so "a:b; writing-mode:x" is matched too
            return attributeMap.TryGetValue("style", out style)
                && style.Split(';')
                    .Any(x => x.Trim().StartsWith(
                        "writing-mode:", StringComparison.OrdinalIgnoreCase));
        }

        public override IList<IElement> End(
            IWorkerContext ctx,
            Tag tag,
            IList<IElement> currentContent)
        {
            var cells = base.End(ctx, tag, currentContent);
            // the cell is expected first in the list; skip rotation otherwise
            var pdfPCell = cells.Count > 0 ? cells[0] as PdfPCell : null;
            if (pdfPCell != null && HasWritingMode(tag.Attributes))
            {
                // **always** 'sideways-lr', whatever value was specified
                pdfPCell.Rotation = 90;
            }
            return cells;
        }
    }
    

    As noted in the inline comments, this is a very simple implementation for your specific needs. You'll need to add extra logic to support any other writing-mode CSS property value, and include any sanity checks.
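
    As a rough sketch of that extra logic, the helper below pulls the writing-mode value out of an inline style string and maps it to a rotation angle that could be assigned to PdfPCell.Rotation. The class name WritingModeParser, its method names, and the angle chosen for every value other than sideways-lr are assumptions made for illustration; none of them come from iTextSharp or from the code above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of iTextSharp): inline style -> rotation.
public static class WritingModeParser
{
    // Assumed mapping; only 'sideways-lr' => 90 is taken from the answer.
    private static readonly Dictionary<string, int> Rotations =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "horizontal-tb", 0 },
            { "sideways-lr", 90 },
            { "sideways-rl", 270 },
            { "vertical-rl", 270 },
        };

    // Returns the writing-mode value, or null when the style has none.
    public static string GetWritingMode(string style)
    {
        if (string.IsNullOrEmpty(style)) return null;
        foreach (var declaration in style.Split(';'))
        {
            // split "name: value" once, on the first colon only
            var parts = declaration.Split(new[] { ':' }, 2);
            if (parts.Length == 2 && parts[0].Trim().Equals(
                    "writing-mode", StringComparison.OrdinalIgnoreCase))
            {
                return parts[1].Trim();
            }
        }
        return null;
    }

    // Returns the rotation in degrees; 0 for missing or unknown values.
    public static int GetRotation(string style)
    {
        var mode = GetWritingMode(style);
        int rotation;
        return mode != null && Rotations.TryGetValue(mode, out rotation)
            ? rotation
            : 0;
    }
}
```

    With something like this in place, the hard-coded 90 in End() could be replaced by the value returned from WritingModeParser.GetRotation() for the td's style attribute. Which angle best matches each CSS value in the rendered PDF would still need to be checked by eye.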

    UPDATE

    Based on the comment left by @Daniel, it's not clear how to add custom CSS when converting the HTML to PDF. First the updated HTML:

    string XHTML = @"
    <h1>Table with Vertical Text</h1>
    <table><tr>
    <td style='writing-mode:sideways-lr;text-align:center;width:40px;'>First</td>
    <td style='writing-mode:sideways-lr;text-align:center;width:40px;'>Second</td></tr>
    <tr><td style='text-align:center'>1</td>
    <td style='text-align:center'>2</td></tr></table>
    
    <h1>Table <u>without</u> Vertical Text</h1>
    <table width='50%'>
    <tr><td class='light-yellow'>0</td></tr>
    <tr><td>1</td></tr>
    <tr><td class='light-yellow'>2</td></tr>
    <tr><td>3</td></tr>
    </table>";
    

    Then a small snippet of custom CSS:

    string CSS = @"
        body {font-size: 12px;}
        table {border-collapse:collapse; margin:8px;}
        .light-yellow {background-color:#ffff99;}
        td {border:1px solid #ccc;padding:4px;}
    ";
    

    The slightly difficult part is the extra setup: you can't use the simple, out-of-the-box XMLWorkerHelper.GetInstance().ParseXHtml() call commonly seen here at SO. Here's a simple helper method that should get you started:

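    // Namespaces used: System.IO, iTextSharp.text, iTextSharp.text.pdf,
    // iTextSharp.tool.xml, iTextSharp.tool.xml.html, iTextSharp.tool.xml.parser,
    // iTextSharp.tool.xml.pipeline.css, iTextSharp.tool.xml.pipeline.end,
    // iTextSharp.tool.xml.pipeline.html
    // OUTPUT_FILE is the path of the PDF to create, defined elsewhere.
    public void ConvertHtmlToPdf(string xHtml, string css)
    {
        using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))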
        {
            using (var document = new Document())
            {
                var writer = PdfWriter.GetInstance(document, stream);
                document.Open();
    
                // instantiate custom tag processor and add to `HtmlPipelineContext`.
                var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
                tagProcessorFactory.AddProcessor(
                    new TableDataProcessor(), 
                    new string[] { HTML.Tag.TD }
                );
                var htmlPipelineContext = new HtmlPipelineContext(null);
                htmlPipelineContext.SetTagFactory(tagProcessorFactory);
    
                var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
                var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
    
                // get an ICssResolver and add the custom CSS
                var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
                cssResolver.AddCss(css, "utf-8", true);
                var cssResolverPipeline = new CssResolverPipeline(
                    cssResolver, htmlPipeline
                );
    
                var worker = new XMLWorker(cssResolverPipeline, true);
                var parser = new XMLParser(worker);
                using (var stringReader = new StringReader(xHtml))
                {
                    parser.Parse(stringReader);
                }
            }
        }
    }
    

    Instead of rehashing an explanation of the example code above, see the documentation (iText removed the documentation; linked to the Wayback Machine) to get a better idea of why you need to set up the parser that way.

    Also note:

    1. XML Worker does not support all CSS2/CSS3 properties, so you may need to experiment with what works or doesn't work with regards to how close you want the PDF to look to the HTML displayed in the browser.
    2. The HTML snippet removed the p tag, since the style can be applied directly to the td tag.
    3. The inline width property matters: if it is omitted, the columns take variable widths, matching what the text would need if it were rendered horizontally.

    Tested with iTextSharp and XML Worker version 5.5.9. Here's the updated result:

  • 2021-02-08 11:53
    public void addHtmlToPdf(Document document, PdfWriter writer, String html) {
        PdfPTable table = new PdfPTable(1);
        PdfPCell cell = new PdfPCell();
        ElementList list = XMLWorkerHelper.ParseToElementList(html, null);
        foreach(IElement element in list) {
            cell.AddElement(element);
        }
        table.AddCell(cell);
        document.Add(table);
    }
    

    Alternative with UTF-8:

    public void addHtmlToPdf_Utf8(Document document, PdfWriter writer, String html) 
    {
        XMLWorkerHelper xml = XMLWorkerHelper.GetInstance();
        xml.ParseXHtml(writer, document, stringToStream(html), System.Text.Encoding.UTF8);
    }
    public Stream stringToStream(string txt) {
        // encode as UTF-8 to match the encoding passed to ParseXHtml()
        return new MemoryStream(System.Text.Encoding.UTF8.GetBytes(txt));
    }
    