How to correctly extract text from a pdf using pdf.js

生来不讨喜 2020-12-15 10:30

I'm new to ES6 and Promises. I'm trying to use pdf.js to extract the text from all pages of a PDF file into a string array. And when extraction is done, I want to parse the array som

4 answers

有刺的猬 (original poster)

2020-12-15 10:49

If you use the PDFViewer component, here is my solution, which doesn't involve any promises or asynchronous code:

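// Concatenates the text of every page held by a pdf.js PDFViewer.
// Note: this reads each page's text layer, so it presumably only works
// for pages whose text layer has already been rendered by the viewer.
function getDocumentText(viewer) {
    let text = '';
    for (let i = 0; i < viewer.pagesCount; i++) {
        // textContentItemsStr holds the page's text items as plain strings
        const { textContentItemsStr } = viewer.getPageView(i).textLayer;
        for (const item of textContentItemsStr) {
            text += item;
        }
    }
    return text;
}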
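
If you are not using the viewer, or you want the string array from the question, a Promise-based version might look roughly like the sketch below. It is only an illustration, not the approach from the answer above: `extractPageTexts` is a name I made up, and it assumes `pdf` is the document object you get from `pdfjsLib.getDocument(...).promise`, with the usual `numPages`, `getPage()` and `getTextContent()` members. Joining the items of a page with a single space is also my own choice.

```javascript
// Sketch: resolve to an array with one string per page.
// `pdf` is assumed to be a pdf.js document proxy, e.g.
//   const pdf = await pdfjsLib.getDocument(url).promise;
async function extractPageTexts(pdf) {
    const pagePromises = [];
    // pdf.js page numbers start at 1, not 0
    for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
        pagePromises.push(
            pdf.getPage(pageNumber)
                .then((page) => page.getTextContent())
                .then((content) => content.items
                    // keep only items that actually carry text
                    .filter((item) => typeof item.str === 'string')
                    .map((item) => item.str)
                    .join(' '))
        );
    }
    // Promise.all keeps the results in page order and resolves
    // only once every page has been processed
    return Promise.all(pagePromises);
}

// Possible usage (assumes pdfjsLib is loaded and `url` points to a PDF):
//   const pdf = await pdfjsLib.getDocument(url).promise;
//   const texts = await extractPageTexts(pdf);
//   // texts[0] is the text of page 1; parse the array here
```

Because the function returns a single promise, whatever parsing you want to do "when extraction is done" can go in one `.then()` or after one `await`.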
