
I have a large number of files (about 3800) that I want to run a program over. The program reads a WAV file and makes a short .TSV text file containing the WAV's lip-sync data (but that is by-the-by for explanation).

I have written a batch script to do this, but it is quite slow, and clunky:

FOR /F %%F IN ('dir /o-d /b "%basepath%\wavs\*.wav"') DO (
 "%basepath%\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe" "%basepath%\wavs\%%F" -o "%basepath%\sync\%%~nF.tsv"
)

Is there any way to convert this into PowerShell using parallel processing or parallel.foreach?

Is there a utility in batch that runs this command in parallel to increase the speed?

I wrote a shell script which uses the GNU parallel program in Linux for parallel processing which is very fast, but I can't find a spare machine to run it, so it comes back to doing the same thing with batch or PowerShell.

Mofi
asked Oct 20 at 20:42
2 Comments
  • Jobs were my first thought, but that's a process for each job execution. Start-ThreadJob would almost certainly be the best approach. Commented Oct 20 at 21:28
  • possible duplicate (all batch) Commented Oct 21 at 6:06

3 Answers

3

Here is a batch solution for parallel processing of the audio files.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "basepath=."
for /F "eol=| delims=" %%I in ('dir "%basepath%\wavs\*.wav" /B /O-D 2^>nul') do (
 start "TSV for %%I" /MIN "%basepath%\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe" "%basepath%\wavs\%%I" -o "%basepath%\sync\%%~nI.tsv"
 call :CheckParallel
)
exit /B
:CheckParallel
for /F "delims=" %%J in ('%SystemRoot%\System32\tasklist.exe /FI "IMAGENAME eq rhubarb.exe" /NH ^| %SystemRoot%\System32\find.exe /I /C "rhubarb.exe"') do if %%J LSS 8 goto :EOF
%SystemRoot%\System32\timeout.exe /T 1 1>nul & goto CheckParallel

Replace the number 8 after the comparison operator LSS with the number of processor cores of the computer minus 1 or 2, so that one or two cores remain free for Windows and other running applications.

The batch file itself finishes while the last rhubarb.exe processes are still working on the final audio files. Extra command lines can be added above exit /B to check once per second with TASKLIST and FIND how many rhubarb.exe processes are still running, and to exit the batch file only once the count reaches 0.

EDIT: Here is a slightly enhanced version in which the maximum number of parallel processes is assigned to an environment variable. The same variable is used at the end to wait until all running rhubarb.exe processes have finished before the batch file exits.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set /A ProcessesNumber=NUMBER_OF_PROCESSORS - 2
if %ProcessesNumber% LSS 1 set "ProcessesNumber=1"
set "basepath=."
for /F "eol=| delims=" %%I in ('dir "%basepath%\wavs\*.wav" /B /O-D 2^>nul') do (
 start "TSV for %%I" /MIN "%basepath%\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe" "%basepath%\wavs\%%I" -o "%basepath%\sync\%%~nI.tsv"
 call :CheckParallel
)
set "ProcessesNumber=1"
call :CheckParallel
exit /B
:CheckParallel
for /F "delims=" %%J in ('%SystemRoot%\System32\tasklist.exe /FI "IMAGENAME eq rhubarb.exe" /NH ^| %SystemRoot%\System32\find.exe /I /C "rhubarb.exe"') do if %%J LSS %ProcessesNumber% goto :EOF
%SystemRoot%\System32\timeout.exe /T 1 1>nul & goto CheckParallel

To understand the commands used and how they work, open a command prompt window, execute there the following commands, and read the displayed help pages for each command, entirely and carefully.

  • call /?
  • dir /?
  • echo /?
  • exit /?
  • find /?
  • for /?
  • goto /?
  • if /?
  • set /?
  • setlocal /?
  • start /?
  • tasklist /?
  • timeout /?
answered Oct 21 at 6:34

2

You can replace your Batch loop with PowerShell 7’s native parallel processing, which works much like GNU parallel.

$basepath = "C:\path\to\your\project"
$exe = Join-Path $basepath "TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe"
# Get all .wav files
$wavFiles = Get-ChildItem -Path "$basepath\wavs" -Filter *.wav
$total = $wavFiles.Count
Write-Host "Processing $total files using Rhubarb Lip Sync..."
$startTime = Get-Date
# Use thread-safe counter
$counter = [System.Collections.Concurrent.ConcurrentBag[int]]::new()
$wavFiles | ForEach-Object -Parallel {
    $inputFile = $_.FullName
    $outputFile = Join-Path $using:basepath "sync\$($_.BaseName).tsv"
    # Approximate progress counter: add one item to the bag per file, then read its size
    $using:counter.Add(1)
    $count = $using:counter.Count
    Write-Host "[$count / $using:total] Processing: $($_.Name)"
    & $using:exe $inputFile -o $outputFile | Out-Null
} -ThrottleLimit ([Environment]::ProcessorCount)
$endTime = Get-Date
$duration = $endTime - $startTime
Write-Host "Processed $total files in $($duration.ToString("hh\:mm\:ss"))"

If you can’t install PowerShell 7, you can still use Start-Job, although it is less efficient:

$basepath = "C:\path\to\your\project"
$exe = "$basepath\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe"
Get-ChildItem "$basepath\wavs" -Filter *.wav | ForEach-Object {
    Start-Job {
        param($exe, $file, $base)
        & $exe $file.FullName -o "$base\sync\$($file.BaseName).tsv" | Out-Null
    } -ArgumentList $exe, $_, $basepath
}
Get-Job | Wait-Job
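
As the comments on this answer point out, Start-Job launches a separate child process for every file and does no throttling, so with about 3800 files all jobs are queued at once. A rough sketch of the same loop with Start-ThreadJob, which accepts a -ThrottleLimit, might look like the following. It assumes the ThreadJob module is available (it ships with PowerShell 7 and can be installed in Windows PowerShell with Install-Module ThreadJob -Scope CurrentUser) and that the sync folder already exists. The function name Invoke-RhubarbThreadJobs and the default limit of 4 are illustrative only.

```powershell
# Sketch only. Invoke-RhubarbThreadJobs is a hypothetical helper name,
# not part of any module. Requires the ThreadJob module for Start-ThreadJob.
function Invoke-RhubarbThreadJobs {
    param(
        [Parameter(Mandatory)] [string] $BasePath,
        [int] $ThrottleLimit = 4   # how many rhubarb.exe instances may run at once
    )

    $exe = Join-Path $BasePath 'TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe'

    $jobs = Get-ChildItem -Path (Join-Path $BasePath 'wavs') -Filter *.wav | ForEach-Object {
        $tsv = Join-Path $BasePath "sync\$($_.BaseName).tsv"
        # Jobs beyond the throttle limit are queued and start as earlier ones finish.
        Start-ThreadJob -ThrottleLimit $ThrottleLimit -ArgumentList $exe, $_.FullName, $tsv -ScriptBlock {
            param($exe, $wav, $tsv)
            & $exe $wav -o $tsv | Out-Null
        }
    }

    # Block until every job has finished, then remove the finished jobs.
    if ($jobs) { $jobs | Receive-Job -Wait -AutoRemoveJob }
}
```

Calling Invoke-RhubarbThreadJobs -BasePath 'C:\path\to\your\project' -ThrottleLimit 6 would then process the wavs folder with at most six rhubarb.exe instances running at a time.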
answered Oct 20 at 21:10

7 Comments

I get this: ForEach-Object : Cannot bind parameter 'RemainingScripts'. Cannot convert the "-ThrottleLimit" value of type "System.String" to type "System.Management.Automation.ScriptBlock". At C:\1 TSV generator paralell.ps1:11 char:66 + ... -Path "$basepath\wavs" -Filter *.wav | ForEach-Object { + ~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidArgument: (:) [ForEach-Object], ParameterBindingException + FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.ForEachObjectCommand I have P
-Parallel doesn't exist in your PowerShell version, either install v7 or try out something like in this: stackoverflow.com/questions/74257556/… or much better, install the module mentioned there
@BenjaminRich what version of PowerShell are you using? (Check $PSVersionTable.)
If you do not have PowerShell version 7, you can still do your work, but it's less efficient than the previous solution. I have edited the answer.
In Windows PowerShell, Start-Job is the only built-in option for parallelism, but it is based on child processes, which is slow, resource-intensive, and requires manual throttling. Start-ThreadJob, from the ThreadJob module, offers throttled, thread-based parallelization, which avoids all these problems; it ships with PowerShell (Core) 7 and is installable on demand in Windows PowerShell (e.g., Install-Module ThreadJob -Scope CurrentUser). See this answer.
@colonel - can you tell me how to add output to this script, i.e. "The current job is %count% of %totalfiles%" on launch of a new rhubarb.exe process? Also I was wondering where this script ended, as I want to record the time it started and finished, and then the difference between them for the total time. I... pretty much know how to do 3 commands in Batch and 2 in PowerShell, so I defer to the expert :P
Sure! You can track progress and timing by adding a thread-safe counter and a start/end timestamp. I’ve updated my answer.
0

Another alternative is Start-Process, which returns immediately and so behaves much like an asynchronous operation. The code is a bit more complex because you need to write your own throttling mechanism. Also, whether you can detect errors depends on your binary setting its exit code correctly, since the check relies on .ExitCode.

$throttle = 4 # determines how many at the same time, change as needed
$basepath = 'C:\path\to\your\project'
$exe = Join-Path $basepath 'TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe'
$procs = [System.Collections.Generic.List[System.Diagnostics.Process]]::new()
function processany {
    param($list)
    ($list | Wait-Process -Any -PassThru) | ForEach-Object {
        if ($_.ExitCode -and $_.ExitCode -ne 0) {
            "Process Id: '$($_.Id)' failed with Exit Code: '$($_.ExitCode)'. " +
                "Arguments: '$($_.StartInfo.Arguments)'." | Write-Error
        }
        $null = $list.Remove($_)
    }
}
Get-ChildItem $basepath\wavs\*.wav | ForEach-Object {
    if ($procs.Count -eq $throttle) {
        processany $procs
    }
    $proc = Start-Process $exe -PassThru -ArgumentList @(
        "`"$($_.FullName)`""
        "-o", "`"${basepath}\sync\$($_.BaseName).tsv`"")
    $procs.Add($proc)
}
while ($procs.Count -gt 0) {
    processany $procs
}

Otherwise, for a simpler solution, since you're using PowerShell 5.1, you can try the module mentioned in "Is there an easier way to run commands in parallel while keeping it efficient in Windows PowerShell?":

Install-Module PSParallelPipeline -Scope CurrentUser

Then the code in colonel's answer is exactly the same except you'd change ForEach-Object -Parallel for Invoke-Parallel.
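
For illustration, a stripped-down version of that loop (without the progress counter and timing) might look roughly like this. It is a sketch under a few assumptions: the PSParallelPipeline module is installed as shown above, its Invoke-Parallel accepts the $using: scope modifier and a -ThrottleLimit parameter in the same way ForEach-Object -Parallel does, and the sync folder already exists. The value of $throttle is an arbitrary example.

```powershell
# Sketch only. Assumes PSParallelPipeline is installed and provides Invoke-Parallel
# with $using: support and a -ThrottleLimit parameter. $throttle is illustrative.
$basepath = 'C:\path\to\your\project'
$exe      = Join-Path $basepath 'TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe'
$throttle = 4

Get-ChildItem -Path "$basepath\wavs" -Filter *.wav | Invoke-Parallel {
    # Build the output path from the input file's base name, as in the batch loop.
    $outputFile = Join-Path $using:basepath "sync\$($_.BaseName).tsv"
    & $using:exe $_.FullName -o $outputFile | Out-Null
} -ThrottleLimit $throttle
```

If the module's behavior differs from these assumptions, check Get-Help Invoke-Parallel after installing it.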

answered Oct 21 at 14:46
