I have written a PowerShell script to find and replace text within a whole Word document, including in the Headers and Footers as well as inside TextBoxes within the Headers.
It took a lot of trial and error to get this to work, and it is a bit cumbersome and probably not very efficient.
Any suggestions on how to make it better and faster would be very much appreciated.
In particular, I'm sure there is a better way to get to the TextBoxes in the Headers, but I haven't been able to figure it out so far.
In case it wasn't obvious, I'm not a professional coder, so please excuse the style, or lack thereof... :-)
Thanks!
$folderPath = "C:\Users\User\Folder\*" # multi-folders: "C:\fso1*", "C:\fso2*"
$fileType = "*.doc" # *.doc will take all .doc* files
$word = New-Object -ComObject Word.Application
$word.Visible = $false
Function findAndReplace($Text, $Find, $ReplaceWith) {
$matchCase = $true
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$findWrap = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$Text.Execute($Find, $matchCase, $matchWholeWord, $matchWildCards, `
$matchSoundsLike, $matchAllWordForms, $forward, $findWrap, `
$format, $ReplaceWith, $replace) > $null
}
Function findAndReplaceWholeDoc($Document, $Find, $ReplaceWith) {
$findReplace = $Document.ActiveWindow.Selection.Find
findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
ForEach ($section in $Document.Sections) {
ForEach ($header in $section.Headers) {
$findReplace = $header.Range.Find
findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
$header.Shapes | ForEach-Object {
if ($_.Type -eq [Microsoft.Office.Core.msoShapeType]::msoTextBox) {
$findReplace = $_.TextFrame.TextRange.Find
findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
}
}
}
ForEach ($footer in $section.Footers) {
$findReplace = $footer.Range.Find
findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
}
}
}
Function processDoc {
$doc = $word.Documents.Open($_.FullName)
findAndReplaceWholeDoc -Document $doc -Find "THIS" -ReplaceWith "THAT"
$doc.Close([ref]$true)
}
$sw = [Diagnostics.Stopwatch]::StartNew()
$count = 0
Get-ChildItem -Path $folderPath -Recurse -Filter $fileType | ForEach-Object {
Write-Host "Processing `"$($_.Name)`"..."
processDoc
$count++
}
$sw.Stop()
$elapsed = $sw.Elapsed.toString()
Write-Host "`nDone. $count files processed in $elapsed"
$word.Quit()
$word = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
5 Answers
COM Automation (which is what you are using) is always going to be slow. There's not much you can do about that except to try to find ways to do what you want with as few operations as possible.
An alternative you could investigate is the Open XML SDK. I've never tried it myself, but it is supposed to be a lot faster than COM Automation.
The Open XML SDK is a .NET library, so there should be no problem calling it from PowerShell, but you will have to translate the example code from C# or VB.NET into PowerShell.
Here's an example for Excel which you could adapt. Or maybe you could find an actual Word example. I didn't search very hard.
You should also check out Open-XML-PowerTools. This is a PowerShell wrapper for the Open XML SDK. Maybe it will do what you want already.
Here's a screencast that shows searching and replacing using Open-XML-PowerTools.
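If installing anything is a concern, the same idea can be tried with nothing but the .NET classes that ship with PowerShell, because a .docx file is a ZIP archive of XML parts. The sketch below is neither the Open XML SDK nor Open-XML-PowerTools. `Replace-TextInDocx` is a hypothetical helper name, and it makes several simplifying assumptions: it only handles .docx (not legacy .doc), it only looks at the main body, header and footer parts, it does a plain case-sensitive string replace, and it will miss any text that Word has split across several runs.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper (not part of any library): plain-text replace inside the
# XML parts of a .docx. Returns the number of parts that were changed.
function Replace-TextInDocx {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$Find,
        [string]$ReplaceWith = ''
    )

    # Text in the XML is stored with &, < and > escaped, so escape both strings the same way.
    $escape = { param($s) $s.Replace('&', '&amp;').Replace('<', '&lt;').Replace('>', '&gt;') }
    $needle = & $escape $Find
    $newText = & $escape $ReplaceWith

    $changed = 0
    $zip = [System.IO.Compression.ZipFile]::Open($Path, [System.IO.Compression.ZipArchiveMode]::Update)
    try {
        # Main body plus every header/footer part; text boxes live inside these parts too.
        $parts = @($zip.Entries | Where-Object { $_.FullName -match '^word/(document|header\d+|footer\d+)\.xml$' })
        foreach ($part in $parts) {
            $stream = $part.Open()
            try {
                $reader = New-Object System.IO.StreamReader($stream)
                $xml = $reader.ReadToEnd()
                if ($xml.Contains($needle)) {
                    $stream.SetLength(0)   # truncate, then write the modified XML back
                    $writer = New-Object System.IO.StreamWriter($stream)
                    $writer.Write($xml.Replace($needle, $newText))
                    $writer.Flush()
                    $changed++
                }
            }
            finally { $stream.Dispose() }
        }
    }
    finally { $zip.Dispose() }
    return $changed
}
```

A call would look like `Replace-TextInDocx -Path 'C:\Users\User\Folder\example.docx' -Find 'THIS' -ReplaceWith 'THAT'` (the file name is made up). Since this edits the file in place, it is worth trying on copies first.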
Thanks for the suggestion. However, I'd like to stick with the current approach for now, as there is no need to install external tools and I am sure my script will be easily portable to other computers / users. – YeO, Sep 1, 2017 at 9:56
Shouldn't:
#region Find/Replace parameters
...
$findWrap = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
#endregion
be:
#region Find/Replace parameters
...
$replace = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$format = $false
$findWrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
#endregion
That is, swap the variable names $replace and $findWrap?
@YeO, thank you for your contribution... I'm doing something similar and integrated some of your code (referencing your answer, of course).
Indeed, it totally should!! Good catch! – YeO, Feb 23, 2018 at 18:27
Actually, if counting the replacements is important, it's best to use wdReplaceOne for $replace; otherwise the count will be inaccurate. – YeO, Feb 23, 2018 at 18:42
Ok, here's a much better one yet. I have elected to apply multiple find-and-replace operations as I loop through the StoryRanges of the document, instead of calling my former function several times (and then looping through the StoryRanges over and over).
I'm also now looking for the Shapes inside Headers and Footers directly from the Shapes collection and not from the StoryRanges; this works much better. We can access this collection from any Section's Header (or Footer), so we simply look into the first Header of the first Section, hence the Sections.Item(1).Headers.Item(1).
Finally, rather than muting the output of findAndReplace, I'm counting how many times we do an actual replacement.
Hopefully someone finds this helpful; it was a great way to start using PowerShell for me anyway.
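The counting relies on two PowerShell behaviours: a function returns whatever its statements emit (which is why the earlier version needed `> $null`), and a Boolean added to an integer counts as 0 or 1. Here is a minimal sketch that runs without Word. `Test-FakeExecute` is a made-up stand-in for `Find.Execute`, which returns True or False depending on whether a match was found.

```powershell
# Stand-in for Find.Execute (hypothetical): emits $true while there is still
# something left to replace, $false otherwise.
$script:remaining = 3
function Test-FakeExecute {
    if ($script:remaining -gt 0) {
        $script:remaining--
        $true      # emitted to the pipeline, so it becomes the function's return value
    }
    else {
        $false
    }
}

$count = 0
do {
    $found = Test-FakeExecute
    $count += $found   # [int] + [bool]: $true adds 1, $false adds 0
} while ($found)
$count   # 3
```

With wdReplaceAll a single call can replace many occurrences but still only returns one True, which is why the script below switches to wdReplaceOne and loops.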
$folderPath = "C:\Users\user\folder\*" # multi-folders: "C:\fso1*", "C:\fso2*"
$fileType = "*.doc" # *.doc will take all .doc* files
$textToReplace = @{
# "TextToFind" = "TextToReplaceWith"
"This1" = "That1"
"This2" = "That2"
"This3" = "That3"
}
$word = New-Object -ComObject Word.Application
$word.Visible = $false
#region Find/Replace parameters
$matchCase = $true
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$findWrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceOne
#endregion
$countf = 0 #count files
$countr = 0 #count replacements per file
$counta = 0 #count all replacements
Function findAndReplace($objFind, $FindText, $ReplaceWith) {
#simple Find and Replace to execute on a Find object
#we let the function return (True/False) to count the replacements
$objFind.Execute($FindText, $matchCase, $matchWholeWord, $matchWildCards, $matchSoundsLike, $matchAllWordForms, `
$forward, $findWrap, $format, $ReplaceWith, $replace) #> $null
}
Function findAndReplaceAll($objFind, $FindText, $ReplaceWith) {
#make sure we replace all occurrences (while we find a match)
$count = 0
$count += findAndReplace $objFind $FindText $ReplaceWith
While ($objFind.Found) {
$count += findAndReplace $objFind $FindText $ReplaceWith
}
return $count
}
Function findAndReplaceMultiple($objFind, $lookupTable) {
#apply multiple Find and Replace on the same Find object
$count = 0
$lookupTable.GetEnumerator() | ForEach-Object {
$count += findAndReplaceAll $objFind $_.Key $_.Value
}
return $count
}
Function findAndReplaceWholeDoc($Document, $lookupTable) {
$count = 0
# Loop through each StoryRange
ForEach ($storyRge in $Document.StoryRanges) {
Do {
$count += findAndReplaceMultiple $storyRge.Find $lookupTable
#check for linked Ranges
$storyRge = $storyRge.NextStoryRange
} Until (!$storyRge) #null is False
}
#region Loop through Shapes within Headers and Footers
# https://msdn.microsoft.com/en-us/vba/word-vba/articles/shapes-object-word
# "The Count property for this collection in a document returns the number of items in the main story only.
# To count the shapes in all the headers and footers, use the Shapes collection with any HeaderFooter object."
# Hence the .Sections.Item(1).Headers.Item(1) which should be able to collect all Shapes, without the need
# for looping through each Section.
#endregion
$shapes = $Document.Sections.Item(1).Headers.Item(1).Shapes
If ($shapes.Count) {
#ForEach ($shape in $shapes | Where {$_.TextFrame.HasText -eq -1}) {
ForEach ($shape in $shapes | Where {[bool]$_.TextFrame.HasText}) {
#Write-Host $($shape.TextFrame.HasText)
$count += findAndReplaceMultiple $shape.TextFrame.TextRange.Find $lookupTable
}
}
return $count
}
Function processDoc {
$doc = $word.Documents.Open($_.FullName)
$count = findAndReplaceWholeDoc $doc $textToReplace
$doc.Close([ref]$true)
return $count
}
$sw = [Diagnostics.Stopwatch]::StartNew()
Get-ChildItem -Path $folderPath -Recurse -Filter $fileType | ForEach-Object {
Write-Host "Processing `"$($_.Name)`"..."
$countr = processDoc
Write-Host "$countr replacements made."
$counta += $countr
$countf++
}
$sw.Stop()
$elapsed = $sw.Elapsed.toString()
Write-Host "`nDone. $countf files processed in $elapsed"
Write-Host "$counta replacements made in total."
$word.Quit()
$word = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
Nice one. In response to your question about 'a better approach to get to the Headers TextBoxes', this is what we are using, with fewer nested loops:
# In Word Doc - Loop through each StoryRange and perform a find and replace
# This includes text and shapes in the body but only text in the headers & footers
ForEach ($storyRng in $newWordDoc.StoryRanges) {
$success = $storyRng.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace)
}
# In Word Doc - Loop through all Shapes within all Headers and Footers and perform a find and replace
$shapes = $newWordDoc.Sections.Item(1).Headers.Item(1).Shapes
If ($shapes.Count) {
ForEach ($shape in $shapes | Where {[bool]$_.TextFrame.HasText}) {
$success = $shape.TextFrame.TextRange.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace)
}
}
Description of the full code below:
- Copies a Word template (well, a *.docx with mail-merge-looking text, e.g. '[full_name]', not a *.dotx)
- You provide it with a hashtable of search and replace string pairs (see the main body at the bottom)
- For each search and replace string pair, it replaces ALL occurrences of the search string (e.g. '[full_name]') in the doc with the replace string (e.g. 'Jane Doe') in ALL sections of the doc, incl. body, header, footer, shapes
- Saves the Word doc copy
- Saves as PDF
- Deletes the Word doc copy
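One thing to keep in mind with a hashtable of pairs: a plain `@{}` does not guarantee enumeration order, which only matters if one search string is contained in another. The sketch below uses plain strings rather than Word to show the effect, together with `[ordered]`, which keeps insertion order. The sample keys and text are made up. Note that `[ordered]@{}` creates an OrderedDictionary, so I would expect the `[hashtable]` parameter type in the function below to need loosening (for example to `[System.Collections.IDictionary]`) before it accepts one. I have not tried that against this function.

```powershell
# Made-up pairs: the longer key is listed first so that it is handled first.
$pairs = [ordered]@{
    '[dob_year]' = '2000'
    'dob'        = 'date of birth'
}

$text = 'dob: [dob_year]'
foreach ($pair in $pairs.GetEnumerator()) {
    # Plain, case-sensitive string replace; stands in for the Word find and replace.
    $text = $text.Replace($pair.Key, $pair.Value)
}
$text   # date of birth: 2000
```

Had 'dob' been processed first, the placeholder '[dob_year]' would have been damaged before its own replacement ran.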
# -------------------------------------------------------------
# FUNCTION: Create-PdfFromWordTemplate (*.docx -> *.pdf)
# -------------------------------------------------------------
# Create a PDF file based on a copy of a Word Template
# after the equivalent of a mail merge (search and replace).
function Create-PdfFromWordTemplate {
[CmdletBinding()]
Param(
[Parameter(Mandatory=$True)]
[ValidateScript({Test-Path $_})]
[string]$FilePath,
[Parameter(Mandatory=$True)]
[string]$FileNameExclExt,
[Parameter(Mandatory=$True)]
[ValidateScript({Test-Path $_})]
[string]$TemplateFilePath,
[Parameter(Mandatory=$True)]
[string]$TemplateFileName,
[Parameter(Mandatory=$True)]
[hashtable]$SearchAndReplacePairs
)
Begin {
# Create a Microsoft Word Application object
try {
$wordApp = New-Object -ComObject Word.Application
$wordApp.Visible = $false
}
catch {
Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate create com object: $($_)"
return
}
# Variables - Word Application Enumerations
# https://docs.microsoft.com/en-us/office/vba/api/word.find.execute
$wdFindContinue = 1 # https://docs.microsoft.com/en-us/office/vba/api/word.wdfindwrap
$wdReplaceAll = 2 # https://docs.microsoft.com/en-us/office/vba/api/word.wdreplace
$wdFormatPDF = 17 # https://docs.microsoft.com/en-us/office/vba/api/word.wdsaveformat
$matchCase = $false
$matchWholeWord = $false
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = $wdFindContinue
$format = $false
$replace = $wdReplaceAll
}
Process {
# Filenames
$fullTemplFileName = Join-Path -Path $TemplateFilePath -ChildPath "$($TemplateFileName)"
$fullWordFileName = Join-Path -Path $TemplateFilePath -ChildPath "$($FileNameExclExt).docx"
$fullPdfFileName = Join-Path -Path $FilePath -ChildPath "$($FileNameExclExt).pdf"
# Copy Word Template to a New Temporary Word doc
try {
Copy-Item -Path "$($fullTemplFileName)" "$($fullWordFileName)"
}
catch {
Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate Copy-Item: $($_)"
return
}
# Open Word Doc
try {
$newWordDoc = $wordApp.Documents.Open( $fullWordFileName )
# Loop through each Search and Replace Pair
# Search and replace in the temp word doc
ForEach ($key in $SearchAndReplacePairs.keys) {
Write-Verbose "$(Get-Date): Replacing $($key) with $($SearchAndReplacePairs[$key]) in $($fullWordFileName)"
# Set Search & Replace values
$findText = $key
$replaceWith = $SearchAndReplacePairs[$key]
# In Word Doc - Loop through each StoryRange and perform a find and replace
# This includes text and shapes in the body but only text in the headers & footers
ForEach ($storyRng in $newWordDoc.StoryRanges) {
$success = $storyRng.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace)
}
# In Word Doc - Loop through all Shapes within all Headers and Footers and perform a find and replace
$shapes = $newWordDoc.Sections.Item(1).Headers.Item(1).Shapes
If ($shapes.Count) {
ForEach ($shape in $shapes | Where {[bool]$_.TextFrame.HasText}) {
$success = $shape.TextFrame.TextRange.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace)
}
}
}
# Save the word document
$newWordDoc.Save()
# Save the word document as a PDF document
Write-Verbose "$(Get-Date): Converting: $($fullWordFileName) to $($fullPdfFileName)"
$newWordDoc.SaveAs([ref] "$fullPdfFileName", [ref] $wdFormatPDF)
# Close the word document
$newWordDoc.Close()
# Remove the temporary word document
Remove-Item -Path $fullWordFileName
}
catch {
Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate: $($_)"
}
}
End {
# Exit the Word Application & Cleanup
try {
$wordApp.Quit()
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($wordApp)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
catch {
Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate end: $($_)"
}
}
}
# -------------------------------------------------------------
# MAIN BODY
# -------------------------------------------------------------
# Prepare the list of search and replace string/value pairs
$searchReplacePairs = @{
"[full_name]" = "Jane Doe"
"[dob]" = "01/01/2000"
}
Create-PdfFromWordTemplate -FilePath "C:\letters" -FileNameExclExt "Letter" -TemplateFilePath "C:\templates" -TemplateFileName "SampleTemplate.docx" -SearchAndReplacePairs $searchReplacePairs
I also came across this but haven't used it yet. It may help: http://blog.beyondimpactllc.com/blog/garbage-collection-with-powershell
# Adding the garbage collection (GC) line at the start of each for/foreach loop helps improve memory intensive scripts
foreach ($server in $servers) {
[system.gc]::Collect()
$events = get-winevent | where filterItSomehowToReduceSizeOfStoredObject
$events | export-csv $outputFilePath
}
Welcome to Code Review! While this appears to have helped the OP, it is an alternative solution but not really a review of the OP's code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process. Please read Why are alternative solutions not welcome? – Stephen Rauch, Oct 22, 2021 at 15:21
Apologies, but I thought I was helping answer how to more efficiently loop through body, header & footer, which is exactly what the OP was asking, or maybe I'm mistaken? – Shell D, Oct 23, 2021 at 9:59
Perhaps not yet optimized, but here's where I am so far. I believe this approach is better, although maybe not much faster. At least, it should not leave out any Shape that contains text, whether in Headers or Footers, and it only has 2 levels of nested ForEach.
Inspiration came from (and credits should go to) this page.
$storyTypes = [Microsoft.Office.Interop.Word.WdStoryType]
Function findAndReplaceWholeDoc($Document, $FindText, $ReplaceWith) {
ForEach ($storyRge in $Document.StoryRanges) {
Do {
findAndReplace -objFind $storyRge.Find -FindText $FindText -ReplaceWith $ReplaceWith
If (($storyRge.StoryType -ge $storyTypes::wdEvenPagesHeaderStory) -and `
($storyRge.StoryType -le $storyTypes::wdFirstPageFooterStory)) {
If ($storyRge.ShapeRange.Count -gt 0) {
ForEach ($shp in $storyRge.ShapeRange) {
If ($shp.TextFrame.HasText -eq -1) {
$obj = $shp.TextFrame.TextRange.Find
findAndReplace -objFind $obj -FindText $FindText -ReplaceWith $ReplaceWith
}
}
}
}
$storyRge = $storyRge.NextStoryRange
} Until ($storyRge -eq $null)
}
}
This approach isn't as efficient as the 'Sections.Item(1).Headers.Item(1)' one... – YeO, Feb 23, 2018 at 18:46
$Document.ActiveWindow.Selection.Find
for the sake of reducing horizontal scrolling in the question? Continuation in a new line in PowerShell usually requires a ` backtick...
*.doc will take all .doc* files puzzles me.
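On that last point: as far as I know, `-Filter *.doc` also picks up .docx files in Windows PowerShell only because the file system matches the pattern against the short 8.3 file name as well, and that may not hold on newer PowerShell versions. A sketch that spells out the wanted extensions with `-Include` instead (the temporary folder and file names are made up for the demonstration):

```powershell
# Build a throwaway folder with three made-up files.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("docdemo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $demo | Out-Null
'a.doc', 'b.docx', 'c.txt' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $demo $_) | Out-Null
}

# -Include takes a list of PowerShell wildcard patterns; with -Recurse it is
# applied to every file found under the folder.
$files = Get-ChildItem -Path $demo -Recurse -File -Include '*.doc', '*.docx'
$files.Name | Sort-Object   # a.doc and b.docx

Remove-Item -Path $demo -Recurse -Force
```

`-Include` is generally slower than `-Filter` because the matching happens in PowerShell rather than in the file system, but next to the time spent in Word that difference is unlikely to matter.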