JavaScript library to read doc and docx on client


I am searching for a JavaScript library that can read .doc and .docx files. I only need the text content; I am not interested in pic


2 Answers

我在风中等你

2021-01-20 22:49
You can use docxtemplater for this. Although it is normally used for templating, it can also simply extract the text of a document:
```
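// `content` holds the binary contents of the .docx file
// (for example an ArrayBuffer read with FileReader in the browser).
var zip = new JSZip(content);
var doc = new Docxtemplater().loadZip(zip);
// getFullText() returns the document text without any formatting.
var text = doc.getFullText();
console.log(text);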
```
See the documentation for installation instructions (I'm the maintainer of this project).

However, it only handles .docx files, not .doc.
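
Since only .docx is supported, it can help to check which format a file really is before parsing it, because the extension may be wrong. The following is a minimal sketch, not part of docxtemplater: `detectWordFormat` and `startsWith` are hypothetical helper names. It relies on the well-known file signatures: a .docx file is a ZIP archive (first bytes `PK\x03\x04`) and a legacy .doc file is an OLE2 compound file. Note that other ZIP-based formats (for example .xlsx) share the ZIP signature, so this only rules .doc in or out.

```javascript
// Well-known magic-byte signatures.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04": ZIP container, used by .docx
const OLE_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // OLE2 compound file, used by legacy .doc

// Returns true when `bytes` begins with every byte of `signature`.
function startsWith(bytes, signature) {
  if (bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

// Hypothetical helper: classify a file from its first bytes.
// `bytes` is a Uint8Array, e.g. new Uint8Array(arrayBuffer) from a FileReader.
function detectWordFormat(bytes) {
  if (startsWith(bytes, ZIP_SIGNATURE)) return 'zip'; // possibly .docx
  if (startsWith(bytes, OLE_SIGNATURE)) return 'ole'; // possibly .doc
  return 'unknown';
}
```

A file that yields `'zip'` can be handed to the docxtemplater code above; one that yields `'ole'` needs a different tool.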

醉话见心

2021-01-20 22:53

You can now extract the text content from .doc and .docx files without installing external dependencies.

Use the Node.js library any-text.

It currently supports a number of other file types as well, such as PDF, XLSX, XLS and CSV.

Usage is very simple:

Install the library as a dependency (or dev dependency):

```
npm i -D any-text
```

Use the getText method to read the text content:

```
var reader = require('any-text');

// getText returns a promise that resolves to the file's text content.
reader.getText(`path-to-file`).then(function (data) {
  console.log(data);
});
```

You can also use async/await. In a CommonJS module, `await` must be inside an async function:

```
var reader = require('any-text');

(async function () {
  const text = await reader.getText(`path-to-file`);
  console.log(text);
})();
```
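
In practice a read can fail, for example when the file is missing or has an unexpected type. Below is a minimal sketch of a defensive wrapper, under stated assumptions: `safeGetText`, `extensionOf` and `MENTIONED_EXTENSIONS` are hypothetical names, the extension list contains only the types named in this answer (it is not the library's authoritative list), and the only thing assumed about `reader` is that it has a promise-returning `getText(path)` method as shown above.

```javascript
// Extensions named in this answer; NOT an authoritative list for any-text.
const MENTIONED_EXTENSIONS = new Set(['doc', 'docx', 'pdf', 'xlsx', 'xls', 'csv']);

// Returns the lower-cased extension without the dot, or '' if there is none.
function extensionOf(filePath) {
  const name = filePath.split(/[\\/]/).pop();
  const dot = name.lastIndexOf('.');
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : '';
}

// Hypothetical wrapper: `reader` is any object with a promise-returning
// getText(path) method, such as the module returned by require('any-text').
// Resolves to { ok: true, text } or { ok: false, error } instead of rejecting.
async function safeGetText(reader, filePath) {
  if (!MENTIONED_EXTENSIONS.has(extensionOf(filePath))) {
    return { ok: false, error: new Error('Unsupported extension: ' + filePath) };
  }
  try {
    const text = await reader.getText(filePath);
    return { ok: true, text: text };
  } catch (error) {
    return { ok: false, error: error };
  }
}
```

It would be called as `safeGetText(require('any-text'), 'path-to-file')`, and the caller checks the `ok` field of the result instead of attaching a separate rejection handler.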

Sample test:

```
var reader = require('any-text');

const chai = require('chai');
const expect = chai.expect;

describe('file reader checks', () => {
  // The fixture is a .doc file, so the description names that format.
  it('check doc file content', async () => {
    expect(
      await reader.getText(`${process.cwd()}/test/files/dummy.doc`)
    ).to.contain('Lorem ipsum');
  });
});
```

I hope it will help!
