Read Japanese characters in a PDF file


I have the following command:

[<0e0f0a52030d030e0ce5030f0744030f>10<030d>10<0cd4>]TJ

I know that it hides Japanese in the Hex secti

2 Answers

深忆病人

2021-01-07 11:31
While most of the answers here are fundamentally correct, they are neither complete nor exact, so:
- The /ToUnicode map MAY be present in the PDF file, but it is not required!
- External, predefined CMaps exist for multiple languages.
It was pretty frustrating to dig for so long in the wrong place. I tore the PDF into tiny pieces and went through every stream in the file looking for this map, without luck, because it WAS NOT IN THE FILE!

I hope this saves someone else the hassle...
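
To make the lookup order concrete, here is a minimal sketch of how a text extractor might decide where the Unicode mapping comes from. It assumes the font has already been parsed into a plain dict (`font` and its flattened layout are hypothetical stand-ins, not the API of any PDF library). The fallback name follows the "Registry-Ordering-UCS2" pattern that the PDF specification describes for the Adobe CJK character collections; whether your particular file falls into that case is something you would have to check.

```python
# Character collections for which the PDF specification describes a
# fallback to a predefined "<Registry>-<Ordering>-UCS2" CMap.
PREDEFINED_COLLECTIONS = {
    ("Adobe", "GB1"),
    ("Adobe", "CNS1"),
    ("Adobe", "Japan1"),
    ("Adobe", "Korea1"),
}


def unicode_source(font):
    """Return where the code-to-Unicode mapping should come from.

    `font` is a hypothetical, flattened dict view of the font; in a real
    PDF, CIDSystemInfo sits in the descendant CIDFont dictionary.
    """
    # 1. An embedded /ToUnicode stream wins when it exists.
    if "ToUnicode" in font:
        return "ToUnicode"
    # 2. Otherwise fall back to an external, predefined CMap if the
    #    character collection is one of the known Adobe ones.
    info = font.get("CIDSystemInfo", {})
    key = (info.get("Registry"), info.get("Ordering"))
    if key in PREDEFINED_COLLECTIONS:
        return "{}-{}-UCS2".format(*key)
    # 3. Nothing usable in the PDF itself.
    return None


japanese_font = {"CIDSystemInfo": {"Registry": "Adobe", "Ordering": "Japan1"}}
print(unicode_source(japanese_font))
```

When the function returns a CMap name, that file is not inside the PDF at all; it ships separately with the viewer or with the CMap resource collection, which is why searching the PDF streams finds nothing.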
生来不讨喜

2021-01-07 11:41

Here is your problem:

I figured out that there is an extra "encryption", Identity-H, and I've read here that you need a /ToUnicode map which I cannot seem to find in the file.

That indicates the two-byte hex codes in your text strings are direct glyph indexes into the original font file. Search the font file for a Unicode character map (one of its cmap entries); this will provide the link from glyph index to Unicode.
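
As a rough sketch of that idea: split the TJ operand into two-byte codes, then invert a Unicode-to-glyph table to go from glyph index back to a character. The helper names (`tj_codes`, `invert_cmap`) and the two-entry `toy_cmap` are made up for illustration; the real table has to be read from the font's cmap with a font tool, and the toy values say nothing about what the codes in the question actually mean.

```python
import re


def tj_codes(operand):
    """Extract two-byte codes from the hex strings of a TJ operand.

    Numbers between the strings (such as the 10s) are kerning
    adjustments and are ignored here.
    """
    codes = []
    for hex_str in re.findall(r"<([0-9A-Fa-f\s]*)>", operand):
        digits = re.sub(r"\s+", "", hex_str)
        if len(digits) % 4:
            raise ValueError("expected whole two-byte codes: " + digits)
        for i in range(0, len(digits), 4):
            codes.append(int(digits[i:i + 4], 16))
    return codes


def invert_cmap(unicode_to_gid):
    """Turn a Unicode -> glyph index table into glyph index -> Unicode.

    If several code points share a glyph, the lowest code point is kept.
    """
    gid_to_unicode = {}
    for codepoint, gid in sorted(unicode_to_gid.items()):
        gid_to_unicode.setdefault(gid, codepoint)
    return gid_to_unicode


# The operand from the question: ten two-byte codes in three strings.
question = "[<0e0f0a52030d030e0ce5030f0744030f>10<030d>10<0cd4>]TJ"
print([format(code, "04x") for code in tj_codes(question)])

# Hypothetical toy cmap (NOT taken from the question's font).
toy_cmap = {0x3042: 1, 0x3044: 2}
gid_to_unicode = invert_cmap(toy_cmap)
print("".join(chr(gid_to_unicode[c]) for c in tj_codes("[<00010002>]TJ")))
```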

Note that a glyph index may not translate directly to a Unicode codepoint. An OpenType GSUB (glyph substitution) table may have taken one or more Unicode characters as input and substituted them with another glyph in the output string. It's also possible (but less likely) that the original creator inserted raw glyphs manually.
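
One way to cope with that, sketched under assumptions: keep a second, hand-built table for glyphs that only exist as substitution results, and emit U+FFFD for anything still unknown instead of failing. The function name, the `substituted` table, and the glyph numbers below are all hypothetical; the `fi` ligature is used only because it is a familiar example of a substituted glyph.

```python
def glyphs_to_text(gids, gid_to_unicode, substituted=None):
    """Map glyph indexes to text, tolerating glyphs without a cmap entry.

    gid_to_unicode: glyph index -> single code point (from the font cmap)
    substituted:    glyph index -> source string, for glyphs that only
                    appear as the output of a substitution
    """
    substituted = substituted or {}
    out = []
    for gid in gids:
        if gid in gid_to_unicode:
            out.append(chr(gid_to_unicode[gid]))
        elif gid in substituted:
            out.append(substituted[gid])
        else:
            # No known mapping: mark the spot rather than guess.
            out.append("\ufffd")
    return "".join(out)


# Hypothetical tables, for illustration only.
direct = {1: 0x3042}
ligatures = {500: "fi"}
print(glyphs_to_text([1, 500, 999], direct, ligatures))
```

Counting the replacement characters in the result gives a quick estimate of how much of the text still needs manual mapping.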
