Reading/Writing a MS Word file in PHP

借酒劲吻你 2020-11-22 14:35

Is it possible to read and write Word (2003 and 2007) files in PHP without using a COM object? I know that I can:

$file = fopen('c:\\file.doc', 'w+');
fw         


        
16 Answers
  • 2020-11-22 14:58

    This works with Word versions before Office 2007 and is pure PHP, with no COM needed. I'm still trying to figure out 2007.

    <?php
    
    
    
    /*****************************************************************
    This approach uses detection of NUL (chr(00)) and end line (chr(13))
    to decide where the text is:
    - divide the file contents up by chr(13)
    - reject any slices containing a NUL
    - stitch the rest together again
    - clean up with a regular expression
    *****************************************************************/
    
    function parseWord($userDoc)
    {
        // Read the whole file as a binary string; give up if it cannot be read.
        $contents = @file_get_contents($userDoc);
        if ($contents === false) {
            return "";
        }
        $lines = explode(chr(0x0D), $contents);
        $outtext = "";
        foreach ($lines as $thisline) {
            // Keep only non-empty slices that contain no NUL byte.
            if ($thisline !== "" && strpos($thisline, chr(0x00)) === false) {
                $outtext .= $thisline . " ";
            }
        }
        // Strip everything outside a small whitelist of characters.
        $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $outtext);
        return $outtext;
    }
    
    $userDoc = "cv.doc";
    
    $text = parseWord($userDoc);
    echo $text;
    
    
    ?>
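
    To see what the heuristic keeps and what it drops, here is a minimal sketch that applies the same steps to a made-up byte string instead of a real file. The helper name extractTextSlices() and the sample bytes are assumptions for illustration only; real .doc files are far messier.

    ```php
    <?php
    // Hypothetical helper: the same steps as parseWord(), applied to an in-memory string.
    function extractTextSlices(string $bytes): string
    {
        $out = "";
        foreach (explode(chr(0x0D), $bytes) as $slice) {
            // Skip empty slices and slices that contain a NUL byte.
            if ($slice === "" || strpos($slice, chr(0x00)) !== false) {
                continue;
            }
            $out .= $slice . " ";
        }
        // Same character whitelist as the regular expression in parseWord().
        return preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $out);
    }

    // Made-up sample: a binary-looking slice, two text slices, another binary slice.
    $sample = "\xD0\xCF\x00\x00" . "\r" . "Curriculum Vitae" . "\r"
            . "Email: jo@example.com" . "\r" . "\x00\x01\x02";
    echo extractTextSlices($sample);
    ```

    With this sample, the two slices containing NUL bytes are rejected, and the colon after "Email" is also removed because ":" is not in the whitelist. Any punctuation or non-ASCII text outside the whitelist is lost in the same way.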
    
  • 2020-11-22 15:00

    I'm working on the same kind of project (an online word processor), although I chose C#/.NET and ASP.NET. From the survey I did, I learned that by using the Open XML SDK and VSTO (Visual Studio Tools for Office) we can easily work with Word files, manipulate them, and even convert them internally into several formats such as .odt, .pdf, .docx, etc.

    So go to msdn.microsoft.com and read the Office development tab thoroughly. It's the easiest way to do this, as all the functions we need to implement are already available in .NET.

    But since you want to do your project in PHP, you can still do it in Visual Studio and .NET, as PHP can also be compiled for .NET (for example with Phalanger).

  • 2020-11-22 15:02

    Here is an updated version of the code:

    <?php
    
    /*****************************************************************
    This approach scans the raw bytes of the file to find the text:
    - skip the first 1536 bytes and copy bytes until a long run of NULs
    - divide the copied bytes up by end line (chr(13))
    - drop control characters and bytes above chr(240)
    - turn HYPERLINK and INCLUDEPICTURE field codes into HTML tags
    *****************************************************************/
    
    function parseWord($userDoc) 
    {
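        // Read the whole file as a binary string; give up if it cannot be read.
        $word_text = @file_get_contents($userDoc);
        if( $word_text === false )
        {
            return "";
        }
        $line = "";
        $tam = strlen($word_text);
        $nulos = 0;       // length of the current run of NUL bytes
        $caracteres = 0;  // number of non-NUL bytes seen so far
        // Start at byte 1536, where this approach assumes the text begins.
        for($i=1536; $i<$tam; $i++)
        {
            $line .= $word_text[$i];
    
            // Compare with the NUL byte itself; a loose "== 0" does not reliably detect it.
            if( $word_text[$i] === chr(0x00) )
            {
                $nulos++;
            }
            else
            {
                $nulos=0;
                $caracteres++;
            }
    
            // Stop after a long run of NULs, taken here as the end of the text.
            if( $nulos>1996 )
            {
                break;
            }
        }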
    
        //echo $caracteres;
    
        $lines = explode(chr(0x0D),$line);
        //$outtext = "<pre>";
    
        $outtext = "";
        foreach($lines as $thisline)
        {
            $tam = strlen($thisline);
            if( !$tam )
            {
                continue;
            }
    
            $new_line = ""; 
            for($i=0; $i<$tam; $i++)
            {
                $onechar = $thisline[$i];
                if( $onechar > chr(240) )
                {
                    continue;
                }
    
                if( $onechar >= chr(0x20) )
                {
                    $caracteres++;
                    $new_line .= $onechar;
                }
    
                if( $onechar == chr(0x14) )
                {
                    $new_line .= "</a>";
                }
    
                if( $onechar == chr(0x07) )
                {
                    $new_line .= "\t";
                    if( isset($thisline[$i+1]) )
                    {
                        if( $thisline[$i+1] == chr(0x07) )
                        {
                            $new_line .= "\n";
                        }
                    }
                }
            }
            // replace the field code with a hyperlink tag
            $new_line = str_replace("HYPERLINK" ,"<a href=",$new_line); 
            $new_line = str_replace("\o" ,">",$new_line); 
            $new_line .= "\n";
    
            // image links
            $new_line = str_replace("INCLUDEPICTURE" ,"<br><img src=",$new_line); 
            $new_line = str_replace("\*" ,"><br>",$new_line); 
            $new_line = str_replace("MERGEFORMATINET" ,"",$new_line); 
    
    
            $outtext .= nl2br($new_line);
        }
    
     return $outtext;
    } 
    
    $userDoc = "Cultura.doc";
    $text = parseWord($userDoc);
    
    echo $text;
    
    
    ?>
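
    The byte scanning above only makes sense for the older binary .doc format, so it may help to check the file signature before parsing. A minimal sketch, assuming a hypothetical helper detectWordFormat() that is given the first bytes of the file; the return values are made-up labels, not part of any library:

    ```php
    <?php
    // Hypothetical helper: guess the container format from the first bytes of a file.
    function detectWordFormat(string $head): string
    {
        // OLE2 compound file signature, the container used by binary .doc files.
        if (strncmp($head, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
            return "doc";
        }
        // ZIP local file header signature; .docx files are ZIP archives.
        if (strncmp($head, "PK\x03\x04", 4) === 0) {
            return "docx-or-other-zip";
        }
        return "unknown";
    }
    ```

    It could be called as detectWordFormat((string) file_get_contents($userDoc, false, null, 0, 8)). A ZIP signature alone does not prove that the file is a .docx, hence the cautious label.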
    
  • 2020-11-22 15:03

    Office 2007 .docx should be possible since it's an XML-based standard (a ZIP archive of XML files). Word 2003 .doc most likely requires COM to read, even though Microsoft has now published the format specifications, since those specifications are huge. I haven't seen many libraries that implement them yet.
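
    As a rough illustration of the .docx side, here is a minimal sketch that pulls plain text out of word/document.xml using PHP's ZipArchive class (this needs the zip extension). The function names docxXmlToText() and readDocxText() are made up for this example, and the tag stripping is a shortcut, not a full Open XML parser: formatting, tables, headers, and footnotes are ignored.

    ```php
    <?php
    // Hypothetical helper: turn the XML of word/document.xml into plain text.
    function docxXmlToText(string $xml): string
    {
        // Remove the XML declaration so that only element markup remains.
        $xml = preg_replace('~<\?xml.*?\?>~s', '', $xml);
        // Mark paragraph ends, tab characters and line breaks before removing tags.
        $xml = preg_replace('~</w:p>~', "\n", $xml);
        $xml = preg_replace('~<w:tab\s*/>~', "\t", $xml);
        $xml = preg_replace('~<w:br\b[^>]*/>~', "\n", $xml);
        // Drop the remaining markup and decode XML entities such as &amp;.
        $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
        return trim($text);
    }

    // Hypothetical helper: open a .docx as a ZIP archive and extract its main text.
    function readDocxText(string $path): string
    {
        $zip = new ZipArchive();
        if ($zip->open($path) !== true) {
            throw new RuntimeException("Cannot open $path as a ZIP archive");
        }
        $xml = $zip->getFromName('word/document.xml');
        $zip->close();
        if ($xml === false) {
            throw new RuntimeException("word/document.xml not found in $path");
        }
        return docxXmlToText($xml);
    }
    ```

    A call such as echo readDocxText('cv.docx'); would then print the document text, one paragraph per line. Writing a .docx works the other way round (building the XML parts and adding them to a ZIP archive), but that takes more than one file and is not shown here.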
