\$\begingroup\$

I am using PowerShell 7.

We have the following PowerShell script that parses some very large files.

I no longer want to use 'Get-Content' as it is too slow.

The script below works, but it takes a very long time to process even a 10 MB file.

I have about 200 files of roughly 10 MB each, each with over 10,000 lines.

Sample Log:

#Fields:1
#Fields:2
#Fields:3
#Fields:4
#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,

Script:

$Output = @()
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item $LogFilePath
$Count = @($logfiles).count
ForEach ($Log in $LogFiles)
{
 $Int = $Int + 1
 
 $Percent = $Int/$Count * 100
 Write-Progress -Activity "Collecting Log details" -Status "Processing log File $Int of $Count - $Log" -PercentComplete $Percent
 Write-Host "Processing Log File $Log" -ForegroundColor Magenta
 Write-Host
 $FileContent = Get-Content $Log | Select-Object -Skip 5
 ForEach ($Line IN $FileContent)
 {
 $Socket = $Line | Foreach {$_.split(",")[5] }
 $IP = $Socket.Split(":")[0]
 $Output += $IP
 } 
} 
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object
Write-Host "List of noted remote IPs:"
$Output
Write-Host 
$Output | Out-File $PWD\Output.txt 
Sᴀᴍ Onᴇᴌᴀ
asked Feb 4, 2023 at 19:05
\$\endgroup\$

1 Answer

\$\begingroup\$

You complain that processing takes too long, but you don't report elapsed seconds for the whole script or for its component stages.

Are you sure Get-Content is the slow portion?

Let's focus on the inner loop.

 $Socket = $Line | Foreach {$_.split(",")[5] }
 $IP = $Socket.Split(":")[0]
 $Output += $IP

We have a string, $Line, and yet we go through the overhead of piping it to ForEach-Object. Why not just call $Line.Split() directly?
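As a minimal sketch of the direct call, using one sample line copied from the question's log excerpt (field index 5 is assumed to be remote-endpoint, as the #Fields header suggests):

```powershell
# Sample line taken from the log excerpt in the question.
$Line = '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,'

# Call the string method directly; no pipeline, no ForEach-Object.
$Socket = $Line.Split(',')[5]   # "195.85.212.22:15650"
$IP     = $Socket.Split(':')[0] # strip the port
$IP
```

Whether this alone makes a noticeable difference is something only a measurement can confirm.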

Accumulating $Output += $IP might have O(n^2) quadratic cost. Benchmark it to see. Process 10,000 IPs in a single go, and then process 100 files each having 100 IPs. Elapsed time should be the same if the inner loop has O(n) linear cost.
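One way to check the growth rate is a doubling test: time the same append loop at several sizes and see how the elapsed time scales. The sketch below is only an illustration; `Measure-Append` is a hypothetical helper name, the sizes are arbitrary, and the appended strings are placeholders rather than real IPs. PowerShell arrays are fixed-size, so `+=` builds a new array on each append, but actual timings will vary by machine and PowerShell version.

```powershell
# Hypothetical helper: time $Count appends using the same += pattern as the question.
function Measure-Append {
    param([int]$Count)
    (Measure-Command {
        $Output = @()
        foreach ($i in 1..$Count) { $Output += "host-$i" }
    }).TotalMilliseconds
}

# Double the input size each time and print the elapsed milliseconds.
foreach ($Count in 5000, 10000, 20000) {
    '{0,6} appends: {1,10:N1} ms' -f $Count, (Measure-Append -Count $Count)
}
```

If doubling the count roughly doubles the time, the cost is close to linear; if it roughly quadruples, the cost is closer to quadratic.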


Summary: This PowerShell script does not yet work correctly. The requirements include performance, and it does not yet run quickly enough to satisfy business objectives.

Written timing measurements will be needed if the source of slowness is to be isolated.
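A rough sketch of such per-stage timing, assuming a throwaway log file generated on the spot (the file name, row count, and generated IPs are all made up for illustration). It times the read stage and the parse stage separately with a `System.Diagnostics.Stopwatch`, leaving the question's code for each stage unchanged:

```powershell
# Build a small throwaway log so the sketch is self-contained (hypothetical data).
$TempLog = Join-Path ([System.IO.Path]::GetTempPath()) 'timing-sample.log'
$Header  = 1..5 | ForEach-Object { "#Fields:$_" }
$Rows    = 1..10000 | ForEach-Object {
    "2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.$($_ % 250):15650,<,DATA,"
}
Set-Content -Path $TempLog -Value ($Header + $Rows)

# Stage 1: read the file the same way the question does.
$Watch = [System.Diagnostics.Stopwatch]::StartNew()
$FileContent = Get-Content $TempLog | Select-Object -Skip 5
$ReadMs = $Watch.Elapsed.TotalMilliseconds

# Stage 2: the inner parsing loop from the question, unchanged.
$Watch.Restart()
$Output = @()
foreach ($Line in $FileContent) {
    $Socket = $Line | ForEach-Object { $_.Split(',')[5] }
    $IP = $Socket.Split(':')[0]
    $Output += $IP
}
$ParseMs = $Watch.Elapsed.TotalMilliseconds

'Read  stage: {0:N1} ms' -f $ReadMs
'Parse stage: {0:N1} ms' -f $ParseMs

Remove-Item $TempLog
```

Posting the two numbers alongside the question would show which stage deserves attention first.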

answered Feb 4, 2023 at 19:28
\$\endgroup\$
