JavaScript library to read doc and docx on client

失恋的感觉 2021-01-20 22:05

I am searching for a JavaScript library that can read .doc and .docx files. The focus is only on the text content. I am not interested in pic…

2 Answers
  • 2021-01-20 22:49

    You can use docxtemplater for this. Although it is normally used for templating, it can also just extract the text of the document:

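    // jszip 2.x is needed: the `new JSZip(data)` constructor was removed in jszip 3.0
    var JSZip = require('jszip');
    var Docxtemplater = require('docxtemplater');
    // `content`: the .docx file as a binary string or ArrayBuffer
    var zip = new JSZip(content);
    var doc = new Docxtemplater().loadZip(zip);
    var text = doc.getFullText(); // all of the document's text as one string
    console.log(text);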
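For context on what extracting the text of a .docx involves: a .docx file is a ZIP archive, and the main body text lives in word/document.xml, where each paragraph is a <w:p> element and the visible text sits in <w:t> elements. The following is a minimal, regex-based sketch of that idea, not docxtemplater's implementation. The function names docxXmlToText and decodeXmlEntities are hypothetical, and the sketch assumes the XML has already been unzipped into a string. It ignores headers, footers, footnotes, and table cell boundaries.

```javascript
// Hypothetical helper, NOT docxtemplater's implementation: converts the XML
// of word/document.xml (already unzipped into a string) into plain text,
// one line per paragraph. Regex-based for brevity; a production version
// should use a real XML parser.

// Decode numeric character references and the five predefined XML entities.
function decodeXmlEntities(s) {
  return s
    .replace(/&#x([0-9a-fA-F]+);/g, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" decodes to "&lt;", not "<"
}

function docxXmlToText(xml) {
  // Tokens in document order: self-closing (empty) paragraph, paragraph end,
  // text element <w:t> (group 1 = its content), tab, line break.
  const tokenRe =
    /<w:p(?:\s[^>]*)?\/>|<\/w:p>|<w:t(?:\s[^>]*)?>([\s\S]*?)<\/w:t>|<w:tab\/>|<w:br\/>/g;
  const lines = [];
  let current = '';
  for (const m of xml.matchAll(tokenRe)) {
    if (m[1] !== undefined) {
      current += decodeXmlEntities(m[1]); // text run
    } else if (m[0] === '<w:tab/>') {
      current += '\t';
    } else if (m[0] === '<w:br/>') {
      current += '\n';
    } else {
      lines.push(current); // </w:p> or <w:p/>: the paragraph is finished
      current = '';
    }
  }
  if (current) lines.push(current); // text outside any closed paragraph
  return lines.join('\n');
}
```

With the jszip 2.x object from the answer, the XML string could be read with zip.file('word/document.xml').asText() and passed to docxXmlToText.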
    

    See the docs for installation instructions (I'm the maintainer of this project).

    However, it only handles .docx, not .doc.
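The .docx/.doc split exists because the two formats are different containers: a .docx is a ZIP archive (Office Open XML) whose first bytes are "PK\x03\x04", while a Word 97-2003 .doc is an OLE2 Compound File Binary file whose first bytes are D0 CF 11 E0 A1 B1 1A E1. A minimal sketch, using a hypothetical sniffWordFormat helper, of checking the signature before passing a file to a docx-only library:

```javascript
// Hypothetical helper: guess whether bytes are a ZIP-based .docx or a
// legacy OLE2/CFB .doc by looking at the file signature ("magic bytes").
// Note: any ZIP (e.g. .xlsx, .pptx) also starts with "PK\x03\x04", so
// 'zip' only means "could be a .docx".
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];
const CFB_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

function startsWithBytes(bytes, magic) {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

/**
 * @param {ArrayBuffer|Uint8Array} data  file contents
 * @returns {'zip'|'cfb'|'unknown'}
 */
function sniffWordFormat(data) {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  if (startsWithBytes(bytes, ZIP_MAGIC)) return 'zip'; // .docx (or another ZIP file)
  if (startsWithBytes(bytes, CFB_MAGIC)) return 'cfb'; // .doc (Word 97-2003)
  return 'unknown';
}
```

In the browser, the bytes of a File chosen through an <input type="file"> element can be obtained with await file.arrayBuffer() and passed straight to sniffWordFormat.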

  • 2021-01-20 22:53

    You can now extract the text content from .doc/.docx files without installing external dependencies.

    For this, you can use the Node.js library any-text.

    Currently, it supports a number of other file formats as well, such as PDF, XLSX, XLS, and CSV.

    Usage is very simple:

    • Install the library as a dependency (or, with the -D flag as shown, as a dev-dependency):
    npm i -D any-text
    
    • Use the getText method to read the text content:
    var reader = require('any-text');
    
    reader.getText(`path-to-file`).then(function (data) {
      console.log(data);
    });
    
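    • You can also use async/await:
    var reader = require('any-text');
    
    // Top-level await is not allowed in CommonJS modules, so wrap it in an async IIFE
    (async () => {
      const text = await reader.getText(`path-to-file`);
      console.log(text);
    })().catch(console.error);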
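Because getText returns a Promise, several files can be read concurrently. A minimal sketch, assuming a hypothetical readAll helper that takes the reader as a parameter (require('any-text') in practice, or any object whose getText(path) returns a Promise of a string) and uses Promise.allSettled (Node.js 12.9+) so that one unreadable file does not abort the whole batch:

```javascript
// Hypothetical helper: read many files concurrently.
// `reader` is assumed to be require('any-text') or any object exposing
// getText(path) -> Promise<string>.
async function readAll(reader, paths) {
  // The async arrow turns a synchronous throw from getText into a rejection too.
  const results = await Promise.allSettled(paths.map(async (p) => reader.getText(p)));
  // One entry per input path, in the same order: { path, text } or { path, error }.
  return results.map((r, i) =>
    r.status === 'fulfilled'
      ? { path: paths[i], text: r.value }
      : { path: paths[i], error: r.reason }
  );
}

// Example usage (requires the any-text package):
// const reader = require('any-text');
// readAll(reader, ['a.docx', 'b.doc']).then((res) => console.log(res));
```

Returning an array rather than an object keyed by path keeps duplicate paths and the original order intact.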
    

    Sample test:

    var reader = require('any-text');
    
    const chai = require('chai');
    const expect = chai.expect;
    
    describe('file reader checks', () => {
      it('check doc file content', async () => {
        expect(
          await reader.getText(`${process.cwd()}/test/files/dummy.doc`)
        ).to.contains('Lorem ipsum');
      });
    });
    

    I hope this helps!
