Using iTextSharp XML Worker to convert HTML to PDF and write text vertically

盖世英雄少女心 2021-02-08 11:42

Is it possible to write text bottom-up in XML Worker? I would like to use it in a table. My code is

     
        
2 answers

渐次进展 (original poster) 2021-02-08 11:49

This was a pretty interesting problem, so +1 to the question. 

The first step was to look up whether or not iTextSharp XML Worker supports the HTML td tag. The mappings can be found in the source in iTextSharp.tool.xml.html.Tags. There you find that td is mapped to iTextSharp.tool.xml.html.table.TableData, which makes the job of implementing a custom tag processor a little easier. That is, all we need to do is inherit from the class and override End():

using System.Collections.Generic;
using System.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.html.table;

public class TableDataProcessor : TableData
{
    /*
     * a **very** simple implementation of the CSS writing-mode property:
     * https://developer.mozilla.org/en-US/docs/Web/CSS/writing-mode
     */
    bool HasWritingMode(IDictionary<string, string> attributeMap)
    {
        // true when the inline style contains a writing-mode declaration
        return attributeMap.ContainsKey("style")
            && attributeMap["style"].Split(';')
                .Any(x => x.Trim().StartsWith("writing-mode:"));
    }

    public override IList<IElement> End(
        IWorkerContext ctx,
        Tag tag,
        IList<IElement> currentContent)
    {
        // let the default td processor build the cell, then rotate it
        var cells = base.End(ctx, tag, currentContent);
        var attributeMap = tag.Attributes;
        if (HasWritingMode(attributeMap))
        {
            var pdfPCell = (PdfPCell) cells[0];
            // **always** 'sideways-lr'
            pdfPCell.Rotation = 90;
        }
        return cells;
    }
}


As noted in the inline comments, this is a very simple implementation for your specific needs. You'll need to add extra logic to support any other writing-mode CSS property value, and include any sanity checks.
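
To illustrate the "extra logic" mentioned above, here is a minimal sketch of one way to map other writing-mode values to a cell rotation. It is not part of the original answer: the class name WritingModeSketch, the method GetRotation, and every mapping except 'sideways-lr' to 90 are assumptions for illustration, and it assumes PdfPCell.Rotation turns counter-clockwise (which matches 90 producing bottom-up text). It deliberately avoids any iTextSharp types so it can be tried on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp or the original answer.
public static class WritingModeSketch
{
    // Assumed mapping from CSS writing-mode values to a rotation angle.
    // Only 'sideways-lr' -> 90 comes from the answer; the rest are guesses
    // that treat every top-to-bottom mode as a 270 degree rotation.
    static readonly Dictionary<string, int> Rotations =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "horizontal-tb", 0 },
            { "sideways-lr", 90 },
            { "sideways-rl", 270 },
            { "vertical-rl", 270 },
            { "vertical-lr", 270 },
        };

    // Returns the rotation for the writing-mode declared in an inline style
    // string, or null when none is declared or the value is not recognised.
    public static int? GetRotation(string style)
    {
        if (string.IsNullOrWhiteSpace(style)) return null;

        foreach (var declaration in style.Split(';'))
        {
            // split "name: value" once, tolerating surrounding whitespace
            var parts = declaration.Split(new[] { ':' }, 2);
            if (parts.Length != 2) continue;

            var name = parts[0].Trim();
            var value = parts[1].Trim();
            if (!name.Equals("writing-mode", StringComparison.OrdinalIgnoreCase))
                continue;

            int rotation;
            return Rotations.TryGetValue(value, out rotation)
                ? rotation
                : (int?)null;
        }
        return null;
    }
}
```

Inside End(), the result could replace the hard-coded 90: read the style attribute, call GetRotation, and set pdfPCell.Rotation only when a value comes back. Returning null for unknown values keeps the default horizontal layout rather than guessing.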

UPDATE

Based on the comment left by @Daniel, it's not clear how to add custom CSS when converting the HTML to PDF. First the updated HTML:

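// NOTE: the tags below are reconstructed from the surrounding description.
// The heading level, the 40px width, the row layout of the second table, and
// which cells carry 'light-yellow' are assumptions.
string XHTML = @"
<h1>Table with Vertical Text</h1>
<table>
  <tr>
    <td style='writing-mode:sideways-lr;width:40px;'>First</td>
    <td style='writing-mode:sideways-lr;width:40px;'>Second</td>
  </tr>
  <tr>
    <td>1</td>
    <td>2</td>
  </tr>
</table>
<h1>Table without Vertical Text</h1>
<table>
  <tr><td class='light-yellow'>0</td></tr>
  <tr><td>1</td></tr>
  <tr><td class='light-yellow'>2</td></tr>
  <tr><td>3</td></tr>
</table>
";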



Then a small snippet of custom CSS:

string CSS = @"
    body {font-size: 12px;}
    table {border-collapse:collapse; margin:8px;}
    .light-yellow {background-color:#ffff99;}
    td {border:1px solid #ccc;padding:4px;}
";


The slightly difficult part is the extra setup - you can't use the simple out-of-the-box XMLWorkerHelper.GetInstance().ParseXHtml() call commonly seen here at SO, because it offers no way to register a custom tag processor. Here's a simple helper method that should get you started:

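// Requires: System.IO, iTextSharp.text, iTextSharp.text.pdf, iTextSharp.tool.xml,
// iTextSharp.tool.xml.html, iTextSharp.tool.xml.parser,
// iTextSharp.tool.xml.pipeline.css, iTextSharp.tool.xml.pipeline.end,
// iTextSharp.tool.xml.pipeline.html
public void ConvertHtmlToPdf(string xHtml, string css)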
{
    using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
    {
        using (var document = new Document())
        {
            var writer = PdfWriter.GetInstance(document, stream);
            document.Open();

            // instantiate custom tag processor and add to `HtmlPipelineContext`.
            var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
            tagProcessorFactory.AddProcessor(
                new TableDataProcessor(), 
                new string[] { HTML.Tag.TD }
            );
            var htmlPipelineContext = new HtmlPipelineContext(null);
            htmlPipelineContext.SetTagFactory(tagProcessorFactory);

            var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
            var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);

            // get an ICssResolver and add the custom CSS
            var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
            cssResolver.AddCss(css, "utf-8", true);
            var cssResolverPipeline = new CssResolverPipeline(
                cssResolver, htmlPipeline
            );

            var worker = new XMLWorker(cssResolverPipeline, true);
            var parser = new XMLParser(worker);
            using (var stringReader = new StringReader(xHtml))
            {
                parser.Parse(stringReader);
            }
        }
    }
}


Instead of rehashing an explanation of the example code above, see the documentation (iText removed it, so the link points to the Wayback Machine) to get a better idea of why you need to set up the parser that way.

Also note:


1. XML Worker does not support all CSS2/CSS3 properties, so you may need to experiment to see what works and what doesn't, depending on how closely you want the PDF to match the HTML displayed in the browser.
2. The HTML snippet removed the p tag, since the style can be applied directly to the td tag.
3. The inline width property: if omitted, the columns get variable widths matching what they would have been had the text been rendered horizontally.


Tested with iTextSharp and XML Worker versions 5.5.9.