I need to combine three arrays of PsCustomObject elements so that each unique element is listed only once, but carries properties that indicate which arrays it came from.
# create test cases
$a = [PsCustomObject]@{Id='AAA';Name='Dev'}
$b = [PsCustomObject]@{Id='BBB';Name='Dev/Test'}
$c = [PsCustomObject]@{Id='CCC';Name='Dev/Prod'}
$d = [PsCustomObject]@{Id='DDD';Name='Dev/Test/Prod'}
$e = [PsCustomObject]@{Id='EEE';Name='Test'}
$f = [PsCustomObject]@{Id='FFF';Name='Test/Prod'}
$g = [PsCustomObject]@{Id='GGG';Name='Prod'}
# add to arrays
$dev=@()
$dev += $a
$dev += $b
$dev += $c
$dev += $d
$test=@()
$test += $b
$test += $d
$test += $e
$test += $f
$prod=@()
$prod+=$c
$prod+=$d
$prod+=$f
$prod+=$g
# array to contain the results
$result=@()
# process dev list; decorate w/ additional properties
$dev | % {
Add-Member -InputObject $_ -NotePropertyName "DEV" -NotePropertyValue "Y"
Add-Member -InputObject $_ -NotePropertyName "TEST" -NotePropertyValue $null
Add-Member -InputObject $_ -NotePropertyName "PROD" -NotePropertyValue $null
# add to results
$result += $_
}
# process test list
$test | % {
$match = $result -match $_.Id
# if a test element matches an element in the results
if ($match) {
# set the column
$match[0].TEST='Y'
}
# if this element is not in the results
else {
# additional properties
Add-Member -InputObject $_ -NotePropertyName "DEV" -NotePropertyValue $null
Add-Member -InputObject $_ -NotePropertyName "TEST" -NotePropertyValue "Y"
Add-Member -InputObject $_ -NotePropertyName "PROD" -NotePropertyValue $null
# add to results
$result += $_
}
}
# process prod list
$prod | % {
$match = $result -match $_.Id
# if a prod element matches an element in the results
if ($match) {
# set the column
$match[0].PROD='Y'
}
else {
# additional properties
Add-Member -InputObject $_ -NotePropertyName "DEV" -NotePropertyValue $null
Add-Member -InputObject $_ -NotePropertyName "TEST" -NotePropertyValue $null
Add-Member -InputObject $_ -NotePropertyName "PROD" -NotePropertyValue 'Y'
# add to results
$result += $_
}
}
# display the results
$result | format-table -AutoSize
The results of the script:
Id  Name          DEV TEST PROD
--  ----          --- ---- ----
AAA Dev           Y
BBB Dev/Test      Y   Y
CCC Dev/Prod      Y        Y
DDD Dev/Test/Prod Y   Y    Y
EEE Test              Y
FFF Test/Prod         Y    Y
GGG Prod                   Y
I realize that I could refactor the search logic into a single function, but my concern is the search logic itself.
Is there a more efficient way to perform this logic?
3 Answers
Assuming your objects have a unique property, such as the Id property in your example, you can expand each array in a single pipeline, use Select -Unique to create a combined array, and add your tracking properties in the same command:
$result = $dev,$test,$prod|%{$_}|Select *,@{l='Dev';e={$_.Id -in $dev.id}},@{l='Prod';e={$_.id -in $prod.id}},@{l='Test';e={$_.id -in $test.id}} -Unique
That cuts the time to a bit under half of what you are seeing. I ran Measure-Command 100 times against your code and against mine (using the same test setup in each, the only difference being how $result was generated). On my machine your code took about 16.9 ms on average, and mine took about 7.4 ms.
1..100|%{Measure-Command {# create test cases
$a = [PsCustomObject]@{Id='AAA';Name='Dev'}
$b = [PsCustomObject]@{Id='BBB';Name='Dev/Test'}
$c = [PsCustomObject]@{Id='CCC';Name='Dev/Prod'}
$d = [PsCustomObject]@{Id='DDD';Name='Dev/Test/Prod'}
$e = [PsCustomObject]@{Id='EEE';Name='Test'}
$f = [PsCustomObject]@{Id='FFF';Name='Test/Prod'}
$g = [PsCustomObject]@{Id='GGG';Name='Prod'}
# add to arrays
$dev = $a,$b,$c,$d
$test = $b,$d,$e,$f
$prod = $c,$d,$f,$g
$results = $dev,$test,$prod|%{$_}|Select *,@{l='Dev';e={$_.Id -in $dev.id}},@{l='Prod';e={$_.id -in $prod.id}},@{l='Test';e={$_.id -in $test.id}} -Unique
$Results|FT -Auto
}}|Measure-Object -Average -Property TotalMilliseconds|% Average
7.408363
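Note that the calculated properties in the one-liner above evaluate to True/False rather than the 'Y'/blank values shown in the question. If the original output format matters, a minimal sketch along the same lines might look like this. The reduced sample data and the $devIds, $testIds, $prodIds and $flags variable names are assumptions made for illustration; they are not part of the original answer.

```powershell
# Reduced sample data mirroring the question
$a = [PsCustomObject]@{Id='AAA';Name='Dev'}
$b = [PsCustomObject]@{Id='BBB';Name='Dev/Test'}
$f = [PsCustomObject]@{Id='FFF';Name='Test/Prod'}
$dev  = $a, $b
$test = $b, $f
$prod = @($f)

# Collect the Id lists once (hypothetical helper variables) so each
# calculated property does not re-enumerate the source arrays
$devIds  = $dev.Id
$testIds = $test.Id
$prodIds = $prod.Id

# Same technique as above: flatten, add calculated properties, de-duplicate.
# An 'if' with no 'else' yields $null when the Id is absent from that list.
$flags = $dev, $test, $prod | ForEach-Object { $_ } |
    Select-Object Id, Name,
        @{ Name = 'DEV';  Expression = { if ($_.Id -in $devIds)  { 'Y' } } },
        @{ Name = 'TEST'; Expression = { if ($_.Id -in $testIds) { 'Y' } } },
        @{ Name = 'PROD'; Expression = { if ($_.Id -in $prodIds) { 'Y' } } } -Unique

$flags | Format-Table -AutoSize
```

Because every row gets all three properties, Format-Table shows every column without extra work. The -in operator still scans each Id list linearly, so this is aimed at readability rather than at large inputs.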
- Very nice use of calculated fields. – Jonathon Anderson, Oct 23, 2017 at 20:15
- There could be a difference in processor speed here. What do you get when you average on Ticks rather than TotalMilliseconds? I get 42995.2 Ticks for mine and 97630.5 for yours. – Jonathon Anderson, Oct 23, 2017 at 20:19
- Oh, no, I can confirm that your code is faster, and I'm sure that it consumes less memory, but mine does not modify the original objects, and would work with objects that are not in shared memory space, so if they import three CSV files or something mine would still work, where I'm pretty sure yours would fail. – TheMadTechnician, Oct 23, 2017 at 20:56
- Yea, you're right. I wasn't sure how they were creating objects. Mine is definitely not terribly flexible. – Jonathon Anderson, Oct 24, 2017 at 14:47
How about the code below?
Note: this assumes that you are actually passing the same object instances into the arrays, as you are doing here. If the arrays are populated on the fly, so that each has its own instances of objects with the same Id, then this method won't work.
# create test cases
$a = [PsCustomObject]@{Id='AAA';Name='Dev' ;DEV='';TEST='';PROD=''}
$b = [PsCustomObject]@{Id='BBB';Name='Dev/Test' ;DEV='';TEST='';PROD=''}
$c = [PsCustomObject]@{Id='CCC';Name='Dev/Prod' ;DEV='';TEST='';PROD=''}
$d = [PsCustomObject]@{Id='DDD';Name='Dev/Test/Prod';DEV='';TEST='';PROD=''}
$e = [PsCustomObject]@{Id='EEE';Name='Test' ;DEV='';TEST='';PROD=''}
$f = [PsCustomObject]@{Id='FFF';Name='Test/Prod' ;DEV='';TEST='';PROD=''}
$g = [PsCustomObject]@{Id='GGG';Name='Prod' ;DEV='';TEST='';PROD=''}
# add to arrays
$dev=@()
$dev += $a
$dev += $b
$dev += $c
$dev += $d
$test=@()
$test += $b
$test += $d
$test += $e
$test += $f
$prod=@()
$prod+=$c
$prod+=$d
$prod+=$f
$prod+=$g
# array to contain the results
$result=@()
# process dev list; decorate w/ additional properties
$dev | % {
$_.DEV="Y"
# add to results
$result += $_
}
# process test list
$test | % {
$_.TEST="Y"
# add to results
if (!$result.Contains($_)){
$result += $_
}
}
# process prod list
$prod | % {
$_.PROD="Y"
# add to results
if (!$result.Contains($_)){
$result += $_
}
}
# display the results
$result | format-table -AutoSize
Measure-Command output from yours:
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 38
Ticks : 388914
TotalDays : 4.50131944444444E-07
TotalHours : 1.08031666666667E-05
TotalMinutes : 0.00064819
TotalSeconds : 0.0388914
TotalMilliseconds : 38.8914
Measure-Command output from mine:
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 16
Ticks : 167563
TotalDays : 1.93938657407407E-07
TotalHours : 4.65452777777778E-06
TotalMinutes : 0.000279271666666667
TotalSeconds : 0.0167563
TotalMilliseconds : 16.7563
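The caveat in the note above, that this approach relies on the arrays sharing the same object instances, can be illustrated with a short sketch. The variable names ($first, $second, $list and the three result variables) are made up for this example.

```powershell
# Two distinct objects with identical property values,
# e.g. rows that were loaded separately from two different files
$first  = [PsCustomObject]@{ Id = 'BBB'; Name = 'Dev/Test' }
$second = [PsCustomObject]@{ Id = 'BBB'; Name = 'Dev/Test' }

$list = @($first)

# Contains() only finds the very same instance, not an equal-looking copy
$sameInstance = $list.Contains($first)
$otherCopy    = $list.Contains($second)

# Comparing on the Id property finds the logical duplicate
$matchById = $list.Id -contains $second.Id

"Same instance found: $sameInstance"
"Separate copy found: $otherCopy"
"Matched by Id:       $matchById"
```

So if the three lists come from separate imports, the duplicate check would need to compare on Id (or another key) rather than rely on Contains().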
- Generate arrays via foreach instead of +=, which recreates the entire array each time
- Declare arrays using simplified syntax: just a list of comma-separated elements
$dev = $a, $b, $c, $d
$test = $b, $d, $e, $f
$prod = $c, $d, $f, $g
$result = @(
foreach ($_ in $dev) { Add-Member 'DEV' 'Y' -InputObject $_ -PassThru }
foreach ($_ in $test) { Add-Member 'TEST' 'Y' -InputObject $_ -PassThru }
foreach ($_ in $prod) { Add-Member 'PROD' 'Y' -InputObject $_ -PassThru }
) | Group Id | ForEach { $_.Group[0] }
# help Format-Table recognize all added properties
Add-Member ([ordered]@{
DEV = $result[0].DEV
TEST = $result[0].TEST
PROD = $result[0].PROD
}) -Force -InputObject $result[0]
$result | ft -auto
While the above code is approximately 2 times faster than the original, it can be made about 5 times faster in PowerShell 5 and newer by adding the properties directly via a .NET method, so the second block becomes:
$addedId = @{}
$result = Sort Id -input @(
foreach ($sourceName in 'DEV', 'TEST', 'PROD') {
$source = Get-Variable $sourceName -value
$prop = [PSNoteProperty]::new($sourceName, 'Y')
foreach ($item in $source) {
$item.PSObject.Properties.Add($prop)
if (!$addedId[$item.Id]) {
$addedId[$item.Id] = $true
$item
}
}
}
)
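The hashtable lookup used above can also be turned into a variant that neither modifies the input objects nor depends on the arrays sharing instances. This is a different technique from the Add-Member and PSNoteProperty approaches: a rough sketch that builds fresh result rows keyed by Id, assuming Id is a unique string. The names $sources, $byId and $merged, and the reduced sample data, are made up for illustration.

```powershell
# Reduced sample data; $bCopy is a separate instance with the same Id as $b
$a     = [PsCustomObject]@{Id='AAA';Name='Dev'}
$b     = [PsCustomObject]@{Id='BBB';Name='Dev/Test'}
$bCopy = [PsCustomObject]@{Id='BBB';Name='Dev/Test'}
$e     = [PsCustomObject]@{Id='EEE';Name='Test'}
$g     = [PsCustomObject]@{Id='GGG';Name='Prod'}
$dev  = $a, $b
$test = $bCopy, $e
$prod = @($g)

$sources = [ordered]@{ DEV = $dev; TEST = $test; PROD = $prod }
$byId    = [ordered]@{}   # Id -> merged row, kept in first-seen order

foreach ($sourceName in $sources.Keys) {
    foreach ($item in $sources[$sourceName]) {
        if (-not $byId.Contains($item.Id)) {
            # First sighting: create a new row with all three flags present
            $byId[$item.Id] = [PsCustomObject]@{
                Id = $item.Id; Name = $item.Name
                DEV = $null; TEST = $null; PROD = $null
            }
        }
        # Mark the row as present in the current source
        $byId[$item.Id].$sourceName = 'Y'
    }
}

$merged = @($byId.Values)
$merged | Format-Table -AutoSize
```

Each item costs one dictionary lookup, so the work grows roughly linearly with the total number of items, and every row carries all three properties, so Format-Table needs no extra help. Note that an ordered dictionary treats integer indexes as positions, which is why the sketch assumes string Ids.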