I\'m using the following powershell script to open a few thousand HTML files and \"save as...\" Word documents.
param([string]$htmpath,[string]$docpath = $d
I know this is an older post but I am posting this code here so that I can find it in the future
**
**
Here is a LINK to the diffierent formats you can save to.
$docpath = "C:\Temp"
$WdTypes = Add-Type -AssemblyName 'Microsoft.Office.Interop.Word, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' -Passthru
$srcfiles = get-childitem $docpath -filter "*.doc" -rec | where {!$_.PSIsContainer} | select-object FullName
$saveFormat = $WdTypes | Where {$_.Name -eq 'WdSaveFormat'}
$word = new-object -comobject word.application
$word.Visible = $False
function saveas-filteredhtml
{
$opendoc = $word.documents.open($doc.FullName);
$Name=($doc.Fullname).replace(".docx",".txt").replace(".doc",".txt")
$opendoc.saveas([ref]$Name, [ref]$saveFormat::wdFormatDOSText); ##wdFormatDocument
$opendoc.close();
}
ForEach ($doc in $srcfiles)
{
Write-Host "Processing :" $doc.FullName
saveas-filteredhtml
$doc = $null
}
$word.quit();
$docpath = "\\sf-xyz-serverabc01\ChangeTheseDocuments"
$WdTypes = Add-Type -AssemblyName 'Microsoft.Office.Interop.Word, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' -Passthru
$srcfiles = get-childitem $docpath -filter "*.doc" -rec | where {!$_.PSIsContainer} | select-object FullName
$saveFormat = $WdTypes | Where {$_.Name -eq 'WdSaveFormat'}
$word = new-object -comobject word.application
$word.Visible = $False
function saveas-filteredhtml
{
$opendoc = $word.documents.open($doc.FullName);
$Name=($doc.Fullname).replace("doc","htm")
$opendoc.saveas([ref]$Name, [ref]$saveFormat::wdFormatFilteredHTML);
$opendoc.close();
}
ForEach ($doc in $srcfiles)
{
Write-Host "Processing :" $doc.FullName
saveas-filteredhtml
$doc = $null
}
$word.quit();
Based on our comments above, it seems that moving the files is all you want to accomplish. The following works for me. In the current directory, it replaces .txt extensions with .py extensions. I found the command here.
PS C:\testing dir *.txt | Move-Item -Destination {[IO.Path]::ChangeExtension( $_.Name, "py")}
You can also change *.txt
to C:\path\to\file\*.txt
so you don't need to execute this line from the location of the files. You should be able to define a destination in a similar manner, so I'll report back if I find a simple way to do that.
Also, I found Microsoft's TechNet Library while I was searching. It has many tutorials on scripting using PowerShell. Files and Folders, Part 3: Windows PowerShell should help you to find additional info on copying and moving files.
I was having problems just converting the filename from .html
to .docx
. I took your code above and changed it to this:
function Convert-HTMLtoDocx {
param([string]$htmpath)
$srcfiles = Get-ChildItem $htmPath -filter "*.htm*"
$saveFormat = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
$word = new-object -comobject word.application
$word.Visible = $False
ForEach ($doc in $srcfiles) {
Write-Host "Processing :" $doc.fullname
$name = Join-Path -Path $doc.DirectoryName -ChildPath $($doc.BaseName + ".docx")
$opendoc = $word.documents.open($doc.FullName)
$opendoc.saveas([ref]$name.Value,[ref]$saveFormat)
$opendoc.close()
$doc = $null
} #End ForEach
$word.quit()
} #End Function
The problem was the save format. For whatever reason, so save a document as a .docx
you need to specify the format at wdFormatXMLDocument
not wdFormatDocument
.