iTextSharp international text

一个人的身影 2020-12-03 15:11

I have a table in an ASP.NET page and am trying to export it as a PDF file. A couple of international characters are not shown in the generated PDF. Any suggestions?

4 answers
  • 2020-12-03 15:16

    This is caused by iTextSharp's default font, Helvetica, which supports only a basic character set and lacks many international characters.

    There are actually 2 options:

    1. One is to rewrite the table content by hand in code. This approach might look faster, but every change to the original table then has to be repeated in the code (breaking the DRY principle). In this case, you can easily set up the font as you wish.
    2. The other is to generate the PDF from HTML extracted from HtmlEngine. This might sound more complicated (and it is), but a working solution is much more flexible and universal. I struggled with special characters myself a while ago and posted a fairly complete solution under a similar question on Stack Overflow: https://stackoverflow.com/a/24587745/1138663
  • 2020-12-03 15:20

    You can try setting the encoding for the font you are using. In Java, it would be something like this:

    BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.EMBEDDED);
    

    where BaseFont.CP1252 is the encoding. Look up the exact encoding you need for the characters you want to display.
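
    To see which characters a single-byte encoding such as CP1252 can represent at all, a quick check with the standard Java charset API may help. This is only a sketch: the class and method names are made up for illustration, it assumes the JDK exposes CP1252 under the name "windows-1252", and it says nothing about which glyphs a particular font contains.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class Cp1252Coverage {
    // Returns the characters of text that windows-1252 (CP1252) cannot encode.
    static String unencodable(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        StringBuilder missing = new StringBuilder();
        text.codePoints()
            .filter(cp -> !encoder.canEncode(new String(Character.toChars(cp))))
            .forEach(missing::appendCodePoint);
        return missing.toString();
    }

    public static void main(String[] args) {
        // c-cedilla (U+00E7) and o-slash (U+00F8) are in CP1252;
        // d-stroke (U+0111) and e-caron (U+011B) are not.
        String sample = "\u00E7\u00F8\u0111\u011B";
        System.out.println(unencodable(sample).length() + " of "
            + sample.length() + " characters are outside CP1252");
    }
}
```

    Any character reported here cannot be written with a CP1252-encoded font, whatever font is chosen, so it needs a different encoding.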

  • 2020-12-03 15:24

    The key for proper display of alternate characters sets (Russian, Chinese, Japanese, etc.) is to use IDENTITY_H encoding when creating the BaseFont.

    Dim bfR As iTextSharp.text.pdf.BaseFont
    bfR = iTextSharp.text.pdf.BaseFont.CreateFont("MyFavoriteFont.ttf", iTextSharp.text.pdf.BaseFont.IDENTITY_H, iTextSharp.text.pdf.BaseFont.EMBEDDED)
    

    IDENTITY_H provides Unicode support for your chosen font, so you should be able to display pretty much any character the font contains. I've used it for Russian, Greek, and all the different European language letters.

    EDIT - 2013-May-28

    This also works for v5.0.2 of iTextSharp.

    EDIT - 2015-June-23

    Given below is a complete code sample (in C#):

    private void CreatePdf()
    {
      string testText = "đĔĐěÇøç";
      string tmpFile = @"C:\test.pdf";
      string myFont = @"C:\<<valid path to the font you want>>\verdana.ttf";
      iTextSharp.text.Rectangle pgeSize = new iTextSharp.text.Rectangle(595, 792);
      iTextSharp.text.Document doc = new iTextSharp.text.Document(pgeSize, 10, 10, 10, 10);
      iTextSharp.text.pdf.PdfWriter wrtr;
      wrtr = iTextSharp.text.pdf.PdfWriter.GetInstance(doc,
          new System.IO.FileStream(tmpFile, System.IO.FileMode.Create));
      doc.Open();
      doc.NewPage();
      // IDENTITY_H selects the Unicode-capable encoding; EMBEDDED stores the font in the PDF.
      iTextSharp.text.pdf.BaseFont bfR;
      bfR = iTextSharp.text.pdf.BaseFont.CreateFont(myFont,
        iTextSharp.text.pdf.BaseFont.IDENTITY_H,
        iTextSharp.text.pdf.BaseFont.EMBEDDED);
    
      iTextSharp.text.BaseColor clrBlack = 
          new iTextSharp.text.BaseColor(0, 0, 0);
      iTextSharp.text.Font fntHead =
          new iTextSharp.text.Font(bfR, 12, iTextSharp.text.Font.NORMAL, clrBlack);
    
      iTextSharp.text.Paragraph pgr = 
          new iTextSharp.text.Paragraph(testText, fntHead);
      doc.Add(pgr);
      doc.Close();
    }
    

    [Screenshot of the generated PDF: sample pdf]

    An important point to remember is that if the font you have chosen does not support the characters you are trying to send to the PDF file, nothing you do in iTextSharp is going to change that. Verdana nicely displays the characters from all the European languages I know of. Other fonts may not be able to display as many characters.

  • 2020-12-03 15:39

    There are two potential reasons characters aren't rendered:

    1. The encoding. As Stewbob pointed out, Identity-H is a great way to avoid the issue entirely, though it does require you to embed a subset of the font. This has two consequences.
      1. It increases the file size a bit over unembedded fonts.
      2. The font has to be licensed for embedded subsets. Most are, some are not.
    2. The font has to contain that character. If you ask for some Arabic ligatures out of a Cyrillic (Russian) font, chances aren't good that they'll be there. There are very few fonts that cover a variety of languages, and they tend to be HUGE. The biggest and most comprehensive font I've run into was "Arial Unicode MS", at over 23 megabytes.

    That's another good reason to require embedding SUBSETS. Tacking on a few megabytes because you wanted to add a couple of Chinese glyphs is a bit steep.

    If you're feeling paranoid, you can check your strings against a given BaseFont instance (which I believe takes the encoding into account as well) with myBaseFont.charExists(someChar). If you have a font you're confident in, I wouldn't bother.
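
    A check like that can be wrapped in a small helper. The sketch below is hypothetical: GlyphCheck and missingGlyphs are illustrative names, and the font is abstracted as an IntPredicate so the code runs without iText. With iText for Java, the predicate could presumably be cp -> myBaseFont.charExists(cp), assuming charExists accepts an int and returns a boolean.

```java
import java.util.function.IntPredicate;

class GlyphCheck {
    // Collects the distinct code points of text for which glyphExists returns false.
    static String missingGlyphs(String text, IntPredicate glyphExists) {
        StringBuilder missing = new StringBuilder();
        text.codePoints()
            .filter(glyphExists.negate())
            .distinct()
            .forEach(missing::appendCodePoint);
        return missing.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a real font: pretend only U+0000..U+00FF have glyphs.
        IntPredicate latin1Only = cp -> cp <= 0xFF;
        String missing = missingGlyphs("\u00C7\u00F8\u0111\u011B\u0111", latin1Only);
        System.out.println(missing.length() + " distinct characters have no glyph");
    }
}
```

    Running this over the strings before building the document gives a list of characters to report or replace, instead of silent gaps in the PDF.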

    PS: There's another good reason that Identity-H requires an embedded subset. Identity-H reads the bytes from the content stream as glyph indexes. The order of glyphs can vary wildly from one font to the next, or even between versions of the same font. Relying on a viewer's system to have the EXACT same font is a bad idea, so it's illegal... particularly when Acrobat/Reader starts substituting fonts because it couldn't find the exact font you asked for and you didn't embed it.
