I have a large number of files (about 3800) that I want to run a program over. The program reads a WAV file and writes a short .TSV text file containing the WAV's lip-sync data (though that detail is incidental).
I have written a batch script to do this, but it is quite slow and clunky:
FOR /F %%F IN ('dir /o-d /b "%basepath%\wavs\*.wav"') DO (
    "%basepath%\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe" "%basepath%\wavs\%%F" -o "%basepath%\sync\%%~nF.tsv"
)
Is there any way to convert this into PowerShell using parallel processing or parallel.foreach?
Is there a utility in batch that runs this command in parallel to increase the speed?
I wrote a shell script that uses GNU parallel on Linux, and it is very fast, but I can't find a spare machine to run it on, so I am back to doing the same thing with batch or PowerShell.
Comments
Jobs were my first thought, but that's a process for each job execution. Start-ThreadJob would almost certainly be the best approach. (Richard, 2025-10-20 21:28)
possible duplicate (all batch) (Stephan, 2025-10-21 06:06)
3 Answers
Here is a batch solution for parallel processing of the audio files.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "basepath=."
for /F "eol=| delims=" %%I in ('dir "%basepath%\wavs\*.wav" /B /O-D 2^>nul') do (
    start "TSV for %%I" /MIN "%basepath%\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe" "%basepath%\wavs\%%I" -o "%basepath%\sync\%%~nI.tsv"
    call :CheckParallel
)
exit /B
:CheckParallel
for /F "delims=" %%J in ('%SystemRoot%\System32\tasklist.exe /FI "IMAGENAME eq rhubarb.exe" /NH ^| %SystemRoot%\System32\find.exe /I /C "rhubarb.exe"') do if %%J LSS 8 goto :EOF
%SystemRoot%\System32\timeout.exe /T 1 1>nul & goto CheckParallel
Replace the number 8 after the comparison operator LSS with the number of processor cores of the computer minus 1 or 2, so that one or two cores remain free for Windows and other running applications.
The batch file finishes while the last rhubarb.exe processes are still working on the final audio files. To wait for them, add command lines above exit /B that check every second with TASKLIST and FIND how many rhubarb.exe processes are still running, and exit once the count reaches 0.
EDIT: Here is a slightly enhanced version in which the maximum number of parallel processes is assigned to an environment variable, which is later also used to wait for all running rhubarb.exe processes to finish before the batch file exits.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set /A ProcessesNumber=NUMBER_OF_PROCESSORS - 2
if %ProcessesNumber% LSS 1 set "ProcessesNumber=1"
set "basepath=."
for /F "eol=| delims=" %%I in ('dir "%basepath%\wavs\*.wav" /B /O-D 2^>nul') do (
    start "TSV for %%I" /MIN "%basepath%\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe" "%basepath%\wavs\%%I" -o "%basepath%\sync\%%~nI.tsv"
    call :CheckParallel
)
set "ProcessesNumber=1"
call :CheckParallel
exit /B
:CheckParallel
for /F "delims=" %%J in ('%SystemRoot%\System32\tasklist.exe /FI "IMAGENAME eq rhubarb.exe" /NH ^| %SystemRoot%\System32\find.exe /I /C "rhubarb.exe"') do if %%J LSS %ProcessesNumber% goto :EOF
%SystemRoot%\System32\timeout.exe /T 1 1>nul & goto CheckParallel
To understand the commands used and how they work, open a command prompt window, run the following commands there, and read the displayed help pages for each command entirely and carefully.
call /?
dir /?
echo /?
exit /?
find /?
for /?
goto /?
if /?
set /?
setlocal /?
start /?
tasklist /?
timeout /?
You can replace your Batch loop with PowerShell 7’s native parallel processing, which works much like GNU parallel.
$basepath = "C:\path\to\your\project"
$exe = Join-Path $basepath "TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe"
# Get all .wav files
$wavFiles = Get-ChildItem -Path "$basepath\wavs" -Filter *.wav
$total = $wavFiles.Count
Write-Host "Processing $total files using Rhubarb Lip Sync..."
$startTime = Get-Date
# Use thread-safe counter
$counter = [System.Collections.Concurrent.ConcurrentBag[int]]::new()
$wavFiles | ForEach-Object -Parallel {
    $inputFile = $_.FullName
    $outputFile = Join-Path $using:basepath "sync\$($_.BaseName).tsv"
    # Simulate atomic counter (add 1 to the bag per job)
    $using:counter.Add(1)
    $count = $using:counter.Count
    Write-Host "[$count / $using:total] Processing: $($_.Name)"
    & $using:exe $inputFile -o $outputFile | Out-Null
} -ThrottleLimit ([Environment]::ProcessorCount)
$endTime = Get-Date
$duration = $endTime - $startTime
Write-Host "Processed $total files in $($duration.ToString("hh\:mm\:ss"))"
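If you can’t install PowerShell 7, you can still use Start-Job in Windows PowerShell. Note that this loop starts one background job (a separate child process) per file, with no throttling: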
$basepath = "C:\path\to\your\project"
$exe = "$basepath\TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe"
Get-ChildItem "$basepath\wavs" -Filter *.wav | ForEach-Object {
    Start-Job {
        param($exe, $file, $base)
        & $exe $file.FullName -o "$base\sync\$($file.BaseName).tsv" | Out-Null
    } -ArgumentList $exe, $_, $basepath
}
Get-Job | Wait-Job
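With roughly 3800 files, an unthrottled job per file can be heavy on memory and CPU. Below is a minimal sketch of a throttled, thread-based variant, assuming the ThreadJob module is available (it ships with PowerShell 7 and can be installed in Windows PowerShell with Install-Module ThreadJob -Scope CurrentUser). The function name Invoke-RhubarbBatch, its parameter names, and the default limit of 4 are hypothetical choices for illustration, not part of rhubarb or of any module.

```powershell
# Sketch only: Invoke-RhubarbBatch and its parameter names are hypothetical.
# Requires the ThreadJob module (bundled with PowerShell 7; installable in 5.1).
function Invoke-RhubarbBatch {
    param(
        [Parameter(Mandatory)] [string] $Exe,
        [Parameter(Mandatory)] [string] $InputDir,
        [Parameter(Mandatory)] [string] $OutputDir,
        [int] $ThrottleLimit = 4   # assumed default; tune to your core count
    )

    # Queue one thread job per .wav file; Start-ThreadJob itself keeps at most
    # $ThrottleLimit of them running and holds the rest until a slot frees up.
    $jobs = Get-ChildItem -Path $InputDir -Filter *.wav | ForEach-Object {
        $target = Join-Path $OutputDir ($_.BaseName + '.tsv')
        Start-ThreadJob -ThrottleLimit $ThrottleLimit -ArgumentList $Exe, $_.FullName, $target -ScriptBlock {
            param($exe, $source, $target)
            & $exe $source -o $target | Out-Null
            # Emit one small record per file so the caller can inspect exit codes.
            [pscustomobject]@{ File = $source; ExitCode = $LASTEXITCODE }
        }
    }

    # Nothing to wait for when the input directory has no .wav files.
    if ($jobs) {
        $jobs | Wait-Job | Receive-Job
        $jobs | Remove-Job
    }
}
```

It could be called as Invoke-RhubarbBatch -Exe $exe -InputDir "$basepath\wavs" -OutputDir "$basepath\sync" -ThrottleLimit 4. Because the jobs run as threads inside the current process, there is no extra PowerShell process per file; only rhubarb.exe itself is launched for each WAV.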
Comments
-Parallel doesn't exist in your PowerShell version. Either install v7 or try something like this: stackoverflow.com/questions/74257556/… or, much better, install the module mentioned there.
[...] $PSVersionTable.) Start-Job is the only built-in option for parallelism, but it is based on child processes, which is slow, resource-intensive, and requires manual throttling. Start-ThreadJob, from the ThreadJob module, offers throttled, thread-based parallelization, which avoids all these problems; it ships with PowerShell (Core) 7 and is installable on demand in Windows PowerShell (e.g., Install-Module ThreadJob -Scope CurrentUser). See this answer.

Another alternative is Start-Process, which returns immediately by itself and so behaves much like an asynchronous operation. The code is a bit more complex because you need to write your own throttling mechanism. Whether you can detect errors also depends on your binary setting its exit code (.ExitCode) correctly.
$throttle = 4 # determines how many at the same time, change as needed
$basepath = 'C:\path\to\your\project'
$exe = Join-Path $basepath 'TSV\rhubarb-lip-sync-1.10.0-win32\rhubarb.exe'
$procs = [System.Collections.Generic.List[System.Diagnostics.Process]]::new()
function processany {
    param($list)
    ($list | Wait-Process -Any -PassThru) | ForEach-Object {
        if ($_.ExitCode -and $_.ExitCode -ne 0) {
            "Process Id: '$($_.Id)' failed with Exit Code: '$($_.ExitCode)'. " +
            "Arguments: '$($_.StartInfo.Arguments)'." | Write-Error
        }
        $null = $list.Remove($_)
    }
}
Get-ChildItem $basepath\wavs\*.wav | ForEach-Object {
    if ($procs.Count -eq $throttle) {
        processany $procs
    }
    $proc = Start-Process $exe -PassThru -ArgumentList @(
        "`"$($_.FullName)`""
        "-o", "`"${basepath}\sync\$($_.BaseName).tsv`"")
    $procs.Add($proc)
}
while ($procs.Count -gt 0) {
    processany $procs
}
Otherwise, for a simpler solution given that you're using PowerShell 5.1, you can try the module mentioned in "Is there an easier way to run commands in parallel while keeping it efficient in Windows PowerShell?":
Install-Module PSParallelPipeline -Scope CurrentUser
Then the code in colonel's answer stays exactly the same, except that you replace ForEach-Object -Parallel with Invoke-Parallel.