Reading/Writing a MS Word file in PHP

借酒劲吻你 2020-11-22 14:35

Is it possible to read and write Word (2003 and 2007) files in PHP without using a COM object? I know that I can:

$file = fopen('c:\\file.doc', 'w+');
fw         


        
16 Answers
  • 2020-11-22 14:38

    2007 might be a bit complicated as well.

    The .docx format is a zip file that contains a few folders with other files in them for formatting and other stuff.

    Rename a .docx file to .zip and you'll see what I mean.

    So if you can work within zip files in PHP, you should be on the right path.
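
    As a rough illustration of the point above, the sketch below lists the entries of a .docx package with PHP's ZipArchive class. The function name `listDocxEntries` and the stand-in archive are hypothetical, used only so the example runs without a real Word file; a genuine .docx contains more parts than the two shown here.

```php
<?php
// List the entry names inside a .docx package (an ordinary zip archive).
// Returns an empty array if the file cannot be opened as a zip.
function listDocxEntries(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return [];
    }
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return $names;
}

// Build a tiny stand-in archive (hypothetical content, not a valid Word file).
$path = tempnam(sys_get_temp_dir(), 'docx');
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString('[Content_Types].xml', '<Types/>');
$zip->addFromString('word/document.xml', '<w:document/>');
$zip->close();

print_r(listDocxEntries($path));
unlink($path);
```

    In a real document the body text is normally stored in the word/document.xml entry, which is why the readers further down the thread look for that name.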

  • 2020-11-22 14:39

    Source gotten from

    Use the following class directly to read a Word document:

    class DocxConversion{
        private $filename;
    
        public function __construct($filePath) {
            $this->filename = $filePath;
        }
    
        private function read_doc() {
            $fileHandle = fopen($this->filename, "r");
            $line = @fread($fileHandle, filesize($this->filename));   
            $lines = explode(chr(0x0D),$line);
            $outtext = "";
            foreach($lines as $thisline)
              {
                $pos = strpos($thisline, chr(0x00));
                if (($pos !== FALSE)||(strlen($thisline)==0))
                  {
                  } else {
                    $outtext .= $thisline." ";
                  }
              }
             $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/","",$outtext);
            return $outtext;
        }
    
        private function read_docx(){
    
            $striped_content = '';
            $content = '';
    
            // ZipArchive replaces zip_open()/zip_read(), which are deprecated as of PHP 8.0.
            $zip = new ZipArchive;
    
            if ($zip->open($this->filename) !== true) return false;
    
            // The main body text of a .docx lives in word/document.xml.
            $content = $zip->getFromName("word/document.xml");
    
            $zip->close();
    
            if ($content === false) return false;
    
            $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
            $content = str_replace('</w:r></w:p>', "\r\n", $content);
            $striped_content = strip_tags($content);
    
            return $striped_content;
        }
    
     /************************excel sheet************************************/
    
    function xlsx_to_text($input_file){
        $xml_filename = "xl/sharedStrings.xml"; //content file name
        $zip_handle = new ZipArchive;
        $output_text = "";
        if(true === $zip_handle->open($input_file)){
            if(($xml_index = $zip_handle->locateName($xml_filename)) !== false){
                $xml_datas = $zip_handle->getFromIndex($xml_index);
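                // loadXML() is an instance method; calling it statically fails on PHP 8.
                // LIBXML_NOENT is dropped because entity substitution is unsafe on untrusted files.
                $xml_handle = new DOMDocument();
                $xml_handle->loadXML($xml_datas, LIBXML_NOERROR | LIBXML_NOWARNING);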
                $output_text = strip_tags($xml_handle->saveXML());
            }else{
                $output_text .="";
            }
            $zip_handle->close();
        }else{
        $output_text .="";
        }
        return $output_text;
    }
    
    /*************************power point files*****************************/
    function pptx_to_text($input_file){
        $zip_handle = new ZipArchive;
        $output_text = "";
        if(true === $zip_handle->open($input_file)){
            $slide_number = 1; //loop through slide files
            while(($xml_index = $zip_handle->locateName("ppt/slides/slide".$slide_number.".xml")) !== false){
                $xml_datas = $zip_handle->getFromIndex($xml_index);
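                // loadXML() is an instance method; calling it statically fails on PHP 8.
                // LIBXML_NOENT is dropped because entity substitution is unsafe on untrusted files.
                $xml_handle = new DOMDocument();
                $xml_handle->loadXML($xml_datas, LIBXML_NOERROR | LIBXML_NOWARNING);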
                $output_text .= strip_tags($xml_handle->saveXML());
                $slide_number++;
            }
            if($slide_number == 1){
                $output_text .="";
            }
            $zip_handle->close();
        }else{
        $output_text .="";
        }
        return $output_text;
    }
    
    
        public function convertToText() {
    
            if(isset($this->filename) && !file_exists($this->filename)) {
                return "File Not exists";
            }
    
            $fileArray = pathinfo($this->filename);
            // Fall back to an empty string when the file has no extension, and ignore case.
            $file_ext  = strtolower($fileArray['extension'] ?? '');
            if($file_ext == "doc" || $file_ext == "docx" || $file_ext == "xlsx" || $file_ext == "pptx")
            {
                if($file_ext == "doc") {
                    return $this->read_doc();
                } elseif($file_ext == "docx") {
                    return $this->read_docx();
                } elseif($file_ext == "xlsx") {
                    return $this->xlsx_to_text($this->filename);
                }elseif($file_ext == "pptx") {
                    return $this->pptx_to_text($this->filename);
                }
            } else {
                return "Invalid File Type";
            }
        }
    
    }
    
    $docObj = new DocxConversion("test.docx"); //replace your document name with correct extension doc or docx 
    echo $docText= $docObj->convertToText();
    
  • 2020-11-22 14:40

    I have the same problem. I think I am going to use a cheap 50 MB Windows-based hosting plan with a free domain to convert my files, alongside my PHP server. Linking the two is easy: all you need is an ASP.NET page that receives the .doc file via POST and returns the result over HTTP, so a simple cURL call would do it.

  • 2020-11-22 14:43

    You can use Antiword, a free MS Word reader for Linux and most other popular operating systems.

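    $document_file = 'c:\file.doc';
    // escapeshellarg() keeps spaces or shell metacharacters in the path from breaking the command.
    $text_from_doc = shell_exec('/usr/local/bin/antiword ' . escapeshellarg($document_file));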
    
  • 2020-11-22 14:43

    I don't know about reading native Word documents in PHP, but if you want to write a Word document in PHP, WordprocessingML (aka WordML) might be a good solution. All you have to do is create an XML document in the correct format. I believe Word 2003 and 2007 both support WordML.
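
    A minimal sketch of that idea, assuming the Word 2003 XML flavour of WordprocessingML: the function name `buildWordMl` is hypothetical, and the namespace URI and `mso-application` processing instruction are my recollection of what Word 2003 expects, so check them against the schema documentation before relying on this.

```php
<?php
// Build a bare-bones WordprocessingML document: one <w:p> per paragraph.
// Assumption: Word 2003 XML namespace and processing instruction, as noted above.
function buildWordMl(array $paragraphs): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' . "\n";
    $xml .= '<?mso-application progid="Word.Document"?>' . "\n";
    $xml .= '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">';
    $xml .= '<w:body>';
    foreach ($paragraphs as $text) {
        // Escape &, <, > and quotes so arbitrary text cannot break the XML.
        $safe = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= '<w:p><w:r><w:t>' . $safe . '</w:t></w:r></w:p>';
    }
    $xml .= '</w:body></w:wordDocument>';
    return $xml;
}

// Write the result to a temporary file (hypothetical file name).
$target = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'hello-wordml.xml';
file_put_contents($target, buildWordMl(['Hello from PHP', 'Fish & chips']));
```

    This produces plain text paragraphs only; formatting, tables and images need further WordML elements that are not shown here.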

  • 2020-11-22 14:45

    One way to manipulate Word files with PHP that you may find interesting is PHPDocX. You can see how it works by looking at its online tutorial. You can insert or extract content, or even merge multiple Word files into a single one.
