Convert Word doc or docx files into text files?

难免孤独 2020-12-05 01:28

I need a way to convert .doc or .docx extensions to .txt without installing anything. I also don't want to have to manually open Wor

11 Answers
  • 2020-12-05 01:53

    You can't do it in VBA if you don't want to start Word (or another Office application). Even if you meant VB, you'd still have to start a (hidden) instance of Word to do the processing.

  • 2020-12-05 01:54

    If you have some flavour of unix installed, you can use the 'strings' utility to find and extract all readable strings from the document. There will be some mess before and after the text you are looking for, but the results will be readable.
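
For readers without a Unix toolchain, the idea behind 'strings' can be sketched in a few lines of Python. This is only an approximation of the utility, not a reimplementation: the function name `extract_strings` and the default minimum length of 4 are assumptions made for illustration, and it looks for runs of printable ASCII only, so text that a .doc file stores in a 16-bit encoding will be missed.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    # Usage: python extract_strings.py input.doc > output.txt
    with open(sys.argv[1], "rb") as handle:
        for line in extract_strings(handle.read()):
            print(line)
```

As with the real utility, expect unrelated fragments (font names, metadata) before and after the document text.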

  • 2020-12-05 01:54

    I need a way to convert .doc or .docx extensions to .txt without installing anything

    for I in *.doc *.docx; do mv "$I" "$(echo "$I" | sed -E 's/\.docx?$/.txt/')"; done
    

    Just joking.

    You could use antiword for the older versions of Word documents, and try to parse the xml of the new ones.
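
Parsing the XML of a .docx file can be sketched with the Python standard library alone, since a .docx file is a ZIP archive whose main body lives in word/document.xml. The function name `docx_to_text` is an assumption made for illustration. The sketch joins the text runs of each paragraph and ignores headers, footers, footnotes, tabs, and explicit line breaks, so treat it as a starting point rather than a complete converter.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path) -> str:
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    import sys

    # Usage: python docx_to_text.py input.docx > output.txt
    print(docx_to_text(sys.argv[1]))
```

The older binary .doc format is not a ZIP archive, so this approach applies to .docx only.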

  • 2020-12-05 01:59

    Note that an excellent source of information for Microsoft Office applications is the Object Browser. You can access it via Tools → Macro → Visual Basic Editor. Once you are in the editor, hit F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.

    Here is an example using Win32::OLE:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use File::Spec::Functions qw( catfile );
    
    use Win32::OLE;
    use Win32::OLE::Const 'Microsoft Word';
    $Win32::OLE::Warn = 3;
    
    my $word = get_word();
    $word->{Visible} = 0;
    
    my $doc = $word->{Documents}->Open(catfile $ENV{TEMP}, 'test.docx');
    
    $doc->SaveAs(
        catfile($ENV{TEMP}, 'test.txt'),
        wdFormatTextLineBreaks
    );
    
    $doc->Close(0);
    
    sub get_word {
        my $word;
        eval {
            $word = Win32::OLE->GetActiveObject('Word.Application');
        };
    
        die "$@\n" if $@;
    
        unless(defined $word) {
            $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
                or die "Oops, cannot start Word: ",
                       Win32::OLE->LastError, "\n";
        }
        return $word;
    }
    __END__
    
  • 2020-12-05 02:02

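    With docxtemplater, you can easily get the full text of a Word document (works with docx only).

    Here's the code (Node.js):

    // API as given in this answer; check it against the docxtemplater version you have installed.
    const DocxTemplater = require('docxtemplater');
    // Load the .docx file from disk.
    const doc = new DocxTemplater().loadFromFile("input.docx");
    // Extract the document's text as a single string.
    const result = doc.getFullText();
    

    This is just three lines of code and doesn't depend on any Word instance (all plain JS).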
