I am using PowerShell 7.
We have the following PowerShell script that parses some very large files.
I no longer want to use 'Get-Content', as it is too slow.
The script below works, but it takes a very long time to process even a 10 MB file.
I have about 200 files of roughly 10 MB each, each with over 10,000 lines.
Sample Log:
#Fields:1
#Fields:2
#Fields:3
#Fields:4
#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,
Script:
$Output = @()
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item $LogFilePath
$Count = @($logfiles).count
ForEach ($Log in $LogFiles)
{
$Int = $Int + 1
$Percent = $Int/$Count * 100
Write-Progress -Activity "Collecting Log details" -Status "Processing log File $Int of $Count - $LogFile" -PercentComplete $Percent
Write-Host "Processing Log File $Log" -ForegroundColor Magenta
Write-Host
$FileContent = Get-Content $Log | Select-Object -Skip 5
ForEach ($Line IN $FileContent)
{
$Socket = $Line | Foreach {$_.split(",")[5] }
$IP = $Socket.Split(":")[0]
$Output += $IP
}
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object
Write-Host "List of noted remove IPs:"
$Output
Write-Host
$Output | Out-File $PWD\Output.txt
1 Answer
You complain that processing takes too long, but you offer no elapsed times, either for the whole script or for its component stages.
Are you sure Get-Content is the slow portion?
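One way to answer that question is to time the read stage and the parse stage separately. The following is a minimal sketch, not a definitive harness: it assumes a throwaway sample file (the name timing-sample.log and the generated rows are made up for illustration) and uses System.Diagnostics.Stopwatch around each stage of the original code.

```powershell
# Build a throwaway sample log (assumption: 5 header lines, then CSV rows
# shaped like the sample in the question). The file name is hypothetical.
$Log = Join-Path ([System.IO.Path]::GetTempPath()) 'timing-sample.log'
$Header = 1..5 | ForEach-Object { "#Fields:$_" }
$Rows = 1..10000 | ForEach-Object {
    "2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.$($_ % 250):15650,<,DATA,"
}
Set-Content -Path $Log -Value ($Header + $Rows)

# Stage 1: reading only, exactly as the original script does it.
$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$FileContent = Get-Content $Log | Select-Object -Skip 5
$Timer.Stop()
$ReadTime = $Timer.Elapsed

# Stage 2: parsing and accumulating only, using the original inner loop.
$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$Output = @()
ForEach ($Line in $FileContent) {
    $Socket = $Line | ForEach-Object { $_.Split(",")[5] }
    $IP = $Socket.Split(":")[0]
    $Output += $IP
}
$Timer.Stop()
$ParseTime = $Timer.Elapsed

# Report both stages so the slower one is visible.
"Read : {0:N0} ms" -f $ReadTime.TotalMilliseconds
"Parse: {0:N0} ms" -f $ParseTime.TotalMilliseconds

Remove-Item $Log
```

Running the same two timers against one of the real 10 MB logs would show which stage deserves attention first.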
Let's focus on the inner loop.
$Socket = $Line | Foreach {$_.split(",")[5] }
$IP = $Socket.Split(":")[0]
$Output += $IP
We have a string, $Line, and yet we go through the overhead of a pipeline and a ForEach-Object script block just to process that one item. Why not just call $Line.Split() directly?
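As a small illustration of the direct call, here is a sketch using one line from the sample log. The variable $ViaPipeline is introduced only for comparison and is not part of the original script.

```powershell
# One data line, taken from the sample log in the question.
$Line = '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,'

# Original form: a pipeline and a script block for a single string.
$ViaPipeline = ($Line | ForEach-Object { $_.Split(",")[5] }).Split(":")[0]

# Direct form: field 5 is remote-endpoint; keep the part before the colon.
$IP = $Line.Split(',')[5].Split(':')[0]

$IP   # 195.85.212.22
```

Both forms produce the same value; the direct form simply skips the pipeline machinery on every line.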
Accumulating with $Output += $IP might have O(n^2) quadratic cost, because += on a PowerShell array builds a new array and copies the existing elements each time.
Benchmark it to see.
Process 10,000 IPs in a single go, and then process 100 files each having 100 IPs.
Elapsed time should be the same if the inner loop has O(n) linear cost.
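The benchmark described above could be sketched roughly as follows. The helper name Measure-ArrayAppend and the synthetic "10.0.0.$i" strings are assumptions for illustration, and each batch starts from an empty array. The HashSet[string] at the end is a swapped-in alternative, not something the original script uses; it is shown only as a possible accumulator that also removes duplicates.

```powershell
# Hypothetical helper: time appending $Count strings to an array with +=.
function Measure-ArrayAppend {
    param([int]$Count)
    $Timer = [System.Diagnostics.Stopwatch]::StartNew()
    $Output = @()
    foreach ($i in 1..$Count) { $Output += "10.0.0.$i" }
    $Timer.Stop()
    $Timer.Elapsed.TotalMilliseconds
}

# One batch of 10,000 appends.
$SingleMs = Measure-ArrayAppend -Count 10000

# 100 batches of 100 appends, each starting from an empty array.
$BatchedMs = 0.0
foreach ($Batch in 1..100) { $BatchedMs += Measure-ArrayAppend -Count 100 }

# Alternative accumulator (an assumption, not in the original script):
# HashSet.Add does not copy the collection on each call and ignores duplicates.
$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$Set = [System.Collections.Generic.HashSet[string]]::new()
foreach ($i in 1..10000) { [void]$Set.Add("10.0.0.$i") }
$Timer.Stop()
$SetMs = $Timer.Elapsed.TotalMilliseconds

"Array +=, 1 x 10000  : {0:N1} ms" -f $SingleMs
"Array +=, 100 x 100  : {0:N1} ms" -f $BatchedMs
"HashSet,  1 x 10000  : {0:N1} ms" -f $SetMs
```

If the first two figures are close, the append is behaving linearly at this size; if the single batch is markedly slower, the accumulator is worth replacing.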
Summary: This PowerShell script does not yet work correctly. The use case requirements include performance, and it does not yet run quickly enough to satisfy business objectives.
Written timing measurements will be needed if the source of the slowness is to be isolated.