PDF Parsing with Swift

无人及你 2021-02-07 13:32

I want to parse a PDF that contains no images, only text, and find pieces of text in it. For example, I want to search for the string "Name:" and read the characters that follow the ":".

2 Answers

旧巷少年郎 (original poster)

2021-02-07 14:05

This is a pretty intensive task. There are libraries like PDFKitten, but they are no longer maintained. Here is a port of PDFKitten to Swift that I wrote, with some modifications to the way string searching and content indexing are done, as well as support for TrueType fonts.

https://github.com/SimpleApp/PDFParser

[Disclaimer: I am the library's author.]

[Second disclaimer: this library is 100% MIT-licensed open source. It has nothing to do with the company; it's not an ad or even a product. I'm posting this to help people and maybe grow a community around it, because this is a very common requirement and nothing free works well enough.]

EDIT: The reason it's a pretty intensive task (not to mention all the character encoding issues) is that the PDF format has no notion of a "line of text" or even a "word". All it has is character-printing instructions. This means that if you want to find a "word", you have to recompute the frame of every block of characters using font information, and then find the ones that can be coalesced into a single word.
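To give a rough idea of what that coalescing step involves, here is a minimal sketch. It is not the code from the library above: the `Glyph` type, the `coalesceWords` function, and the `maxGap` / `lineTolerance` thresholds are all hypothetical names I made up for illustration, and it assumes the frame of each character has already been computed from the font metrics (which is the hard part).

```swift
import Foundation

// Hypothetical model of one printed character after its frame has been
// computed from font information. Not a type from any real library.
struct Glyph {
    let character: Character
    let x: Double      // left edge of the glyph frame
    let y: Double      // baseline (PDF y grows upward)
    let width: Double  // horizontal advance
}

/// Groups glyphs into words. Glyphs whose baselines are within
/// `lineTolerance` of each other are treated as one line; within a line,
/// a horizontal gap wider than `maxGap` (or a whitespace glyph) ends a word.
func coalesceWords(_ glyphs: [Glyph],
                   maxGap: Double = 1.0,
                   lineTolerance: Double = 2.0) -> [String] {
    // 1. Bucket glyphs into lines, top of the page first.
    var lines: [[Glyph]] = []
    for glyph in glyphs.sorted(by: { $0.y > $1.y }) {
        if let last = lines.last?.last, abs(last.y - glyph.y) <= lineTolerance {
            lines[lines.count - 1].append(glyph)
        } else {
            lines.append([glyph])
        }
    }

    // 2. Read each line left to right and cut a word at every large gap.
    var words: [String] = []
    for line in lines {
        var current = ""
        var previousRightEdge: Double? = nil
        for glyph in line.sorted(by: { $0.x < $1.x }) {
            let gapTooWide = previousRightEdge.map { glyph.x - $0 > maxGap } ?? false
            if (gapTooWide || glyph.character.isWhitespace) && !current.isEmpty {
                words.append(current)
                current = ""
            }
            if !glyph.character.isWhitespace {
                current.append(glyph.character)
            }
            previousRightEdge = glyph.x + glyph.width
        }
        if !current.isEmpty { words.append(current) }
    }
    return words
}
```

In a real document the thresholds would have to scale with the font size, and rotated or right-to-left text would need extra handling; fixed values are only good enough to show the idea.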

That's why you won't find many libraries offering this kind of feature, and why even some big projects sometimes fail to provide correct copy/paste or text search.
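For the specific case in the question, once you have the page text as a plain `String` in reading order (from whatever extraction library you end up using), the lookup itself is the easy part. A minimal sketch, where `value(after:in:)` is a hypothetical helper name and the assumption is that the value sits on the same line as its label:

```swift
import Foundation

/// Returns the text that follows `label` up to the end of that line,
/// trimmed of surrounding spaces, or nil if the label does not occur.
func value(after label: String, in text: String) -> String? {
    guard let labelRange = text.range(of: label) else { return nil }
    let rest = text[labelRange.upperBound...]
    let line = rest.prefix(while: { !$0.isNewline })
    return line.trimmingCharacters(in: .whitespaces)
}

let extracted = "Id: 7\nName: Ada Lovelace\nCity: London"
let name = value(after: "Name:", in: extracted)
```

This only works if the extraction step kept the label and its value next to each other, which is exactly what the frame recomputation described above is needed for.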
