9
\$\begingroup\$

I have written a PowerShell script to find and replace text throughout a Word document, including the Headers and Footers as well as the TextBoxes inside the Headers.
It took a lot of trial and error to get this to work; it is a bit cumbersome and probably not very efficient.
Any suggestions on how to make it better and faster would be very much appreciated.
In particular, I'm sure there is a better way to get to the Headers' TextBoxes, but I haven't been able to figure it out so far.
In case it wasn't obvious, I'm not a professional coder, so please excuse the style, or lack thereof... :-)
Thanks!

$folderPath = "C:\Users\User\Folder\*" # multi-folders: "C:\fso1*", "C:\fso2*"
$fileType = "*.doc" # *.doc will take all .doc* files
$word = New-Object -ComObject Word.Application
$word.Visible = $false
Function findAndReplace($Text, $Find, $ReplaceWith) {
 $matchCase = $true
 $matchWholeWord = $true
 $matchWildcards = $false
 $matchSoundsLike = $false
 $matchAllWordForms = $false
 $forward = $true
 $findWrap = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
 $format = $false
 $replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
 $Text.Execute($Find, $matchCase, $matchWholeWord, $matchWildCards,
 $matchSoundsLike, $matchAllWordForms, $forward, $findWrap,
 $format, $ReplaceWith, $replace) > $null
}
Function findAndReplaceWholeDoc($Document, $Find, $ReplaceWith) {
 $findReplace = $Document.ActiveWindow.Selection.Find
 findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
 ForEach ($section in $Document.Sections) {
 ForEach ($header in $section.Headers) {
 $findReplace = $header.Range.Find
 findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
 $header.Shapes | ForEach-Object {
 if ($_.Type -eq [Microsoft.Office.Core.msoShapeType]::msoTextBox) {
 $findReplace = $_.TextFrame.TextRange.Find
 findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
 }
 }
 }
 ForEach ($footer in $section.Footers) {
 $findReplace = $footer.Range.Find
 findAndReplace -Text $findReplace -Find $Find -ReplaceWith $ReplaceWith
 }
 }
}
Function processDoc {
 $doc = $word.Documents.Open($_.FullName)
 findAndReplaceWholeDoc -Document $doc -Find "THIS" -ReplaceWith "THAT"
 $doc.Close([ref]$true)
}
$sw = [Diagnostics.Stopwatch]::StartNew()
$count = 0
Get-ChildItem -Path $folderPath -Recurse -Filter $fileType | ForEach-Object { 
 Write-Host "Processing `"$($_.Name)`"..."
 processDoc
 $count++
}
$sw.Stop()
$elapsed = $sw.Elapsed.toString()
Write-Host "`nDone. $count files processed in $elapsed" 
$word.Quit()
$word = $null
[gc]::collect() 
[gc]::WaitForPendingFinalizers()
asked Aug 31, 2017 at 9:04
\$\endgroup\$
8
  • \$\begingroup\$ I've never written a line of Powershell script in my life, and I can understand this code. That's impressive code clarity for someone who claims to not be a professional coder! \$\endgroup\$ Commented Aug 31, 2017 at 9:32
  • \$\begingroup\$ "I'm not a professional coder": being a professional coder does not mean anything :-) I've worked with many people who called themselves professionals and whose code was just a single god method with several hundred lines. What you wrote is pretty good. \$\endgroup\$ Commented Aug 31, 2017 at 10:39
  • \$\begingroup\$ Did you add manual line breaks after such method calls as $Document.ActiveWindow.Selection.Find for the sake of reducing horizontal scrolling in the question? Continuation in a new line in powershell usually requires a ` backtick... \$\endgroup\$ Commented Aug 31, 2017 at 10:46
  • \$\begingroup\$ I did in my code and I just did it again here (didn't work quite well the first time). I also escaped the backticks that escape the double quotes as syntax highlighting didn't work correctly. \$\endgroup\$ Commented Aug 31, 2017 at 10:58
  • \$\begingroup\$ *.doc will take all .doc* files puzzles me. \$\endgroup\$ Commented Feb 23, 2018 at 18:46

5 Answers

2
\$\begingroup\$

COM Automation (which is what you are using) is always going to be slow. There's not much you can do about that except try to find ways to do what you want with as few operations as possible.

An alternative you could investigate is the Open XML SDK. I've never tried it myself, but it is supposed to be a lot faster than COM Automation.

The Open XML SDK is a .NET library, so there should be no problem calling it from PowerShell, but you will have to translate the example code from C# or VB.NET into PowerShell.

Here's an example for Excel which you could adapt. Or maybe you could find an actual Word example. I didn't search very hard.

You should also check out Open-XML-PowerTools. This is a PowerShell wrapper for the Open XML SDK. Maybe it will do what you want already.

Here's a screencast that shows searching and replacing using Open-XML-PowerTools.
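If installing anything is not an option, the same idea can be roughed out with only the .NET classes that ship with Windows. To be clear, this is not the Open XML SDK API: it is a minimal sketch that treats a .docx as the zip archive it is and does a plain string replacement in the body, header and footer parts. `Update-DocxText` and `ConvertTo-XmlText` are made-up names. It assumes .docx files (not the binary .doc format), an absolute path, and search text that Word has stored in a single run and that is distinctive enough not to collide with the XML markup itself.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: escape the characters that are escaped inside XML text nodes
function ConvertTo-XmlText([string]$Text) {
    $Text.Replace('&', '&amp;').Replace('<', '&lt;').Replace('>', '&gt;')
}

# Hypothetical function: case-sensitive replace in the body, header and footer parts.
# Returns the number of parts that were changed.
function Update-DocxText {
    param(
        [Parameter(Mandatory = $true)] [string]$Path,        # absolute path to a .docx
        [Parameter(Mandatory = $true)] [string]$Find,
        [Parameter(Mandatory = $true)] [string]$ReplaceWith
    )
    $findXml    = ConvertTo-XmlText $Find
    $replaceXml = ConvertTo-XmlText $ReplaceWith
    $changed = 0
    $zip = [System.IO.Compression.ZipFile]::Open($Path, [System.IO.Compression.ZipArchiveMode]::Update)
    try {
        # Text boxes anchored in a header are stored inside that header's XML part
        $parts = @($zip.Entries | Where-Object { $_.FullName -match '^word/(document|header\d*|footer\d*)\.xml$' })
        foreach ($entry in $parts) {
            $reader = New-Object System.IO.StreamReader($entry.Open())
            $xml = $reader.ReadToEnd()
            $reader.Dispose()

            $newXml = $xml.Replace($findXml, $replaceXml)
            if ($newXml -cne $xml) {
                $stream = $entry.Open()
                $stream.SetLength(0)          # truncate the part before rewriting it
                $writer = New-Object System.IO.StreamWriter($stream)
                $writer.Write($newXml)
                $writer.Dispose()
                $changed++
            }
        }
    }
    finally {
        $zip.Dispose()                        # writes the modified archive back to disk
    }
    return $changed
}
```

A call would look like `Update-DocxText -Path 'C:\Users\User\Folder\Letter.docx' -Find 'THIS' -ReplaceWith 'THAT'` (the file name is only an illustration). Unlike Word's own Find, this has no notion of whole-word matching, so it is best suited to unambiguous placeholders.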

answered Sep 1, 2017 at 2:29
\$\endgroup\$
2
  • \$\begingroup\$ Thanks for the suggestion, I'd like however to stick with the current approach for now as there is no need for installing external tools and I am sure my script will be easily portable to other computers / users. \$\endgroup\$ Commented Sep 1, 2017 at 9:56
  • \$\begingroup\$ I have tried the StoryRange approach seen here but not successfully so far \$\endgroup\$ Commented Sep 1, 2017 at 11:18
2
\$\begingroup\$

Shouldn't:

#region Find/Replace parameters
...
$findWrap = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
#endregion

be:

#region Find/Replace parameters
...
$replace = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$format = $false
$findWrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
#endregion

That is, shouldn't the variable names $replace and $findWrap be swapped?
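To see why the names matter, here is a small sketch that needs neither Word nor the interop assembly. It hard-codes the numeric values of the two enumerations as listed in the Word VBA reference and looks up what Find.Execute would actually receive from the question's assignments. The hashtables and `Get-EnumName` are illustrative helpers, not part of any Word API.

```powershell
# Numeric values of the two Word enumerations (per the Word VBA reference)
$WdFindWrap = @{ wdFindStop = 0; wdFindContinue = 1; wdFindAsk = 2 }
$WdReplace  = @{ wdReplaceNone = 0; wdReplaceOne = 1; wdReplaceAll = 2 }

# The question's assignments, with the variable names swapped
$findWrap = $WdReplace.wdReplaceAll       # 2
$replace  = $WdFindWrap.wdFindContinue    # 1

# Illustrative helper: find the member name that a numeric value maps to
function Get-EnumName($Table, $Value) {
    ($Table.GetEnumerator() | Where-Object { $_.Value -eq $Value }).Key
}

# Find.Execute reads its Wrap argument as a WdFindWrap value
# and its Replace argument as a WdReplace value
$wrapSeenByWord    = Get-EnumName $WdFindWrap $findWrap
$replaceSeenByWord = Get-EnumName $WdReplace  $replace

"Wrap    -> $wrapSeenByWord"       # Wrap    -> wdFindAsk
"Replace -> $replaceSeenByWord"    # Replace -> wdReplaceOne
```

So with the names swapped, each call asks for a single replacement (wdReplaceOne) rather than wdReplaceAll, which may be why only some occurrences get replaced.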

@YeO, thank you for your contribution... I'm doing something similar and integrated some of your code (referencing your answer, of course).

Sᴀᴍ Onᴇᴌᴀ
answered Feb 23, 2018 at 18:13
\$\endgroup\$
2
  • \$\begingroup\$ Indeed, it totally should!! Good catch! \$\endgroup\$ Commented Feb 23, 2018 at 18:27
  • \$\begingroup\$ Actually, if counting the replacements is important, it's best to use wdReplaceOne for $replace otherwise the count will be inaccurate. \$\endgroup\$ Commented Feb 23, 2018 at 18:42
1
\$\begingroup\$

OK, here's a much better one. I now apply all the find-and-replace pairs while looping through the document's StoryRanges once, instead of calling my former function several times (and looping through the StoryRanges over and over).
I'm also now getting the Shapes inside Headers and Footers directly from the Shapes collection rather than from the StoryRanges, which works much better. This collection can be accessed from any Section's Header (or Footer), so we simply look at the first Header of the first Section, hence the Sections.Item(1).Headers.Item(1).
Finally, rather than muting the output of findAndReplace, I'm counting how many replacements are actually made.
Hopefully someone finds this helpful; it was a great way for me to start using PowerShell anyway.

$folderPath = "C:\Users\user\folder\*" # multi-folders: "C:\fso1*", "C:\fso2*"
$fileType = "*.doc" # *.doc will take all .doc* files
$textToReplace = @{
# "TextToFind" = "TextToReplaceWith"
"This1" = "That1"
"This2" = "That2"
"This3" = "That3"
}
$word = New-Object -ComObject Word.Application
$word.Visible = $false
#region Find/Replace parameters
$matchCase = $true
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$findWrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceOne
#endregion
$countf = 0 #count files
$countr = 0 #count replacements per file
$counta = 0 #count all replacements
Function findAndReplace($objFind, $FindText, $ReplaceWith) {
 #simple Find and Replace to execute on a Find object
 #we let the function return (True/False) to count the replacements
 $objFind.Execute($FindText, $matchCase, $matchWholeWord, $matchWildCards, $matchSoundsLike, $matchAllWordForms,
 $forward, $findWrap, $format, $ReplaceWith, $replace) #> $null
}
Function findAndReplaceAll($objFind, $FindText, $ReplaceWith) {
 #make sure we replace all occurrences (while we find a match)
 $count = 0
 $count += findAndReplace $objFind $FindText $ReplaceWith
 While ($objFind.Found) {
 $count += findAndReplace $objFind $FindText $ReplaceWith
 }
 return $count
}
Function findAndReplaceMultiple($objFind, $lookupTable) {
 #apply multiple Find and Replace on the same Find object
 $count = 0
 $lookupTable.GetEnumerator() | ForEach-Object {
 $count += findAndReplaceAll $objFind $_.Key $_.Value
 }
 return $count
}
Function findAndReplaceWholeDoc($Document, $lookupTable) {
 $count = 0
 # Loop through each StoryRange
 ForEach ($storyRge in $Document.StoryRanges) {
 Do {
 $count += findAndReplaceMultiple $storyRge.Find $lookupTable
 #check for linked Ranges
 $storyRge = $storyRge.NextStoryRange
 } Until (!$storyRge) #null is False
 }
 #region Loop through Shapes within Headers and Footers
 # https://msdn.microsoft.com/en-us/vba/word-vba/articles/shapes-object-word
 # "The Count property for this collection in a document returns the number of items in the main story only.
 # To count the shapes in all the headers and footers, use the Shapes collection with any HeaderFooter object."
 # Hence the .Sections.Item(1).Headers.Item(1) which should be able to collect all Shapes, without the need
 # for looping through each Section.
 #endregion
 $shapes = $Document.Sections.Item(1).Headers.Item(1).Shapes
 If ($shapes.Count) {
 #ForEach ($shape in $shapes | Where {$_.TextFrame.HasText -eq -1}) {
 ForEach ($shape in $shapes | Where {[bool]$_.TextFrame.HasText}) {
 #Write-Host $($shape.TextFrame.HasText)
 $count += findAndReplaceMultiple $shape.TextFrame.TextRange.Find $lookupTable
 }
 }
 return $count
}
Function processDoc {
 $doc = $word.Documents.Open($_.FullName)
 $count = findAndReplaceWholeDoc $doc $textToReplace
 $doc.Close([ref]$true)
 return $count
}
$sw = [Diagnostics.Stopwatch]::StartNew()
Get-ChildItem -Path $folderPath -Recurse -Filter $fileType | ForEach-Object { 
 Write-Host "Processing `"$($_.Name)`"..."
 $countr = processDoc
 Write-Host "$countr replacements made."
 $counta += $countr
 $countf++
}
$sw.Stop()
$elapsed = $sw.Elapsed.toString()
Write-Host "`nDone. $countf files processed in $elapsed"
Write-Host "$counta replacements made in total."
$word.Quit()
$word = $null
[gc]::collect() 
[gc]::WaitForPendingFinalizers()
answered Sep 3, 2017 at 16:04
\$\endgroup\$
1
\$\begingroup\$

Nice one. In response to your question about 'a better approach to get to the Headers TextBoxes', this is what we are using, with fewer nested loops:

# In Word Doc - Loop through each StoryRange and perform a find and replace
# This includes text and shapes in the body but only text in the headers & footers
ForEach ($storyRng in $newWordDoc.StoryRanges) {
 $success = $storyRng.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace) 
}
# In Word Doc - Loop through all Shapes within all Headers and Footers and perform a find and replace
$shapes = $newWordDoc.Sections.Item(1).Headers.Item(1).Shapes
If ($shapes.Count) {
 ForEach ($shape in $shapes | Where {[bool]$_.TextFrame.HasText}) {
 $success = $shape.TextFrame.TextRange.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace) 
 }
}

Description of below full code:

  • Copies a Word template (really a *.docx with mail-merge-style placeholder text, e.g. '[full_name]', not a *.dotx)
  • You provide it with a hashtable of search and replace string pairs (see the main body at the bottom)
  • For each pair, it replaces ALL occurrences of the search string (e.g. '[full_name]') with the replace string (e.g. 'Jane Doe') in ALL parts of the doc, incl. body, header, footer, shapes
  • Saves the Word doc copy
  • Saves it as a PDF
  • Deletes the Word doc copy
# -------------------------------------------------------------
# FUNCTION: Create-PdfFromWordTemplate (*.docx -> *.pdf)
# -------------------------------------------------------------
# Create a PDF file based on a copy of a Word Template
# after the equivalent of a mail merge (search and replace).
function Create-PdfFromWordTemplate {
 [CmdletBinding()]
 Param(
 [Parameter(Mandatory=$True)]
 [ValidateScript({Test-Path $_})]
 [string]$FilePath,
 [Parameter(Mandatory=$True)]
 [string]$FileNameExclExt,
 [Parameter(Mandatory=$True)]
 [ValidateScript({Test-Path $_})]
 [string]$TemplateFilePath,
 [Parameter(Mandatory=$True)]
 [string]$TemplateFileName,
 [Parameter(Mandatory=$True)]
 [hashtable]$SearchAndReplacePairs
 )
 Begin {
 # Create a Microsoft Word Application object
 try {
 $wordApp = New-Object -ComObject Word.Application
 $wordApp.Visible = $false
 }
 catch {
 Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate create com object: $($_)"
 return
 }
 # Variables - Word Application Enumerations
 # https://docs.microsoft.com/en-us/office/vba/api/word.find.execute 
 $wdFindContinue = 1 # https://docs.microsoft.com/en-us/office/vba/api/word.wdfindwrap
 $wdReplaceAll = 2 # https://docs.microsoft.com/en-us/office/vba/api/word.wdreplace
 $wdFormatPDF = 17 # https://docs.microsoft.com/en-us/office/vba/api/word.wdsaveformat
 $matchCase = $false 
 $matchWholeWord = $false
 $matchWildcards = $false 
 $matchSoundsLike = $false 
 $matchAllWordForms = $false 
 $forward = $true 
 $wrap = $wdFindContinue 
 $format = $false
 $replace = $wdReplaceAll 
 }
 Process {
 # Filenames
 $fullTemplFileName = Join-Path -Path $TemplateFilePath -ChildPath "$($TemplateFileName)"
 $fullWordFileName = Join-Path -Path $TemplateFilePath -ChildPath "$($FileNameExclExt).docx"
 $fullPdfFileName = Join-Path -Path $FilePath -ChildPath "$($FileNameExclExt).pdf"
 # Copy Word Template to a New Temporary Word doc
 try {
 Copy-Item -Path "$($fullTemplFileName)" "$($fullWordFileName)"
 }
 catch {
 Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate Copy-Item: $($_)"
 return
 }
 # Open Word Doc
 try {
 $newWordDoc = $wordApp.Documents.Open( $fullWordFileName )
 # Loop through each Search and Replace Pair 
 # Search and replace in the temp word doc
 ForEach ($key in $SearchAndReplacePairs.keys) {
 
 Write-Verbose "$(Get-Date): Replacing $($key) with $($SearchAndReplacePairs[$key]) in $($fullWordFileName)"
 # Set Search & Replace values
 $findText = $key
 $replaceWith = $SearchAndReplacePairs[$key]
 # In Word Doc - Loop through each StoryRange and perform a find and replace
 # This includes text and shapes in the body but only text in the headers & footers
 ForEach ($storyRng in $newWordDoc.StoryRanges) {
 $success = $storyRng.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace) 
 }
 # In Word Doc - Loop through all Shapes within all Headers and Footers and perform a find and replace
 $shapes = $newWordDoc.Sections.Item(1).Headers.Item(1).Shapes
 If ($shapes.Count) {
 ForEach ($shape in $shapes | Where {[bool]$_.TextFrame.HasText}) {
 $success = $shape.TextFrame.TextRange.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith,$replace) 
 }
 }
 }
 # Save the word document
 $newWordDoc.Save()
 # Save the word document as a PDF document
 Write-Verbose "$(Get-Date): Converting: $($fullWordFileName) to $($fullPdfFileName)"
 $newWordDoc.SaveAs([ref] "$fullPdfFileName", [ref] $wdFormatPDF)
 # Close the word document
 $newWordDoc.Close()
 # Remove the temporary word document
 Remove-Item -Path $fullWordFileName
 }
 catch {
 Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate: $($_)"
 }
 }
 End {
 # Exit the Word Application & Cleanup
 try {
 $wordApp.Quit()
 $null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($wordApp)
 [System.GC]::Collect() 
 [System.GC]::WaitForPendingFinalizers()
 }
 catch {
 Write-Warning "$(Get-Date): error in Create-PdfFromWordTemplate end: $($_)"
 }
 }
}
# -------------------------------------------------------------
# MAIN BODY
# -------------------------------------------------------------
# Prepare the list of search and replace string/value pairs
$searchReplacePairs = @{
 "[full_name]" = "Jane Doe"
 "[dob]" = "01/01/2000"
}
Create-PdfFromWordTemplate -FilePath "C:\letters" -FileNameExclExt "Letter" -TemplateFilePath "C:\templates" -TemplateFileName "SampleTemplate.docx" -SearchAndReplacePairs $searchReplacePairs

I also came across this but haven't used it yet. It may help: http://blog.beyondimpactllc.com/blog/garbage-collection-with-powershell

 # Adding the garbage collection (GC) line at the start of each for/foreach loop helps improve memory intensive scripts
 foreach ($server in $servers) {
 [system.gc]::Collect()
 $events = get-winevent | where filterItSomehowToReduceSizeOfStoredObject
 $events | export-csv $outputFilePath
 }
answered Oct 22, 2021 at 14:54
\$\endgroup\$
2
  • \$\begingroup\$ Welcome to Code Review! While this appears to have helped the OP, it is an alternative solution but not really a review of the OP's code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process. Please read Why are alternative solutions not welcome? \$\endgroup\$ Commented Oct 22, 2021 at 15:21
  • \$\begingroup\$ Apologies, but I thought I was helping answer how to loop more efficiently through the body, header & footer, which is exactly what the OP was asking. Or maybe I'm mistaken? \$\endgroup\$ Commented Oct 23, 2021 at 9:59
0
\$\begingroup\$

Perhaps not yet optimized, but here's where I am so far. I believe this approach is better, although maybe not much faster. At least, it should not leave out any Shape that contains text, whether in Headers or Footers and it only has 2 levels of nested ForEach.
Inspiration came from (and credits should go to) this page.

$storyTypes = [Microsoft.Office.Interop.Word.WdStoryType] 
Function findAndReplaceWholeDoc($Document, $FindText, $ReplaceWith) {
 ForEach ($storyRge in $Document.StoryRanges) {
 Do {
 findAndReplace -objFind $storyRge.Find -FindText $FindText -ReplaceWith $ReplaceWith
 If (($storyRge.StoryType -ge $storyTypes::wdEvenPagesHeaderStory) -and
 ($storyRge.StoryType -le $storyTypes::wdFirstPageFooterStory)) {
 If ($storyRge.ShapeRange.Count -gt 0) {
 ForEach ($shp in $storyRge.ShapeRange) {
 If ($shp.TextFrame.HasText -eq -1) {
 $obj = $shp.TextFrame.TextRange.Find
 findAndReplace -objFind $obj -FindText $FindText -ReplaceWith $ReplaceWith
 }
 }
 }
 }
 $storyRge = $storyRge.NextStoryRange
 } Until ($storyRge -eq $null)
 }
}
answered Sep 3, 2017 at 10:17
\$\endgroup\$
1
  • \$\begingroup\$ This approach isn't as efficient as the 'Sections.Item(1).Headers.Item(1)' one... \$\endgroup\$ Commented Feb 23, 2018 at 18:46
