I'm moving some files from a Linux machine to a Windows machine over a low-speed, possibly buggy experimental communications channel that I want to test. One of the tests is to transfer large numbers of large and small files and verify their cryptographic hashes at the receiving end. On the Linux side, we're using md5sum to generate file hashes like so:
md5sum * > files.md5
Then the files are transmitted from the Linux machine to the Windows 10 machine. What I'd like to do next is to verify the hashes on the plain-vanilla Windows machine (no Cygwin installed). So to mimic the default operation of md5sum -c files.md5, which would go through the file line by line and verify each MD5 checksum, I've written this PowerShell script. I'm a lot more at home in bash than in PowerShell, so I thought I might benefit from a review.
param (
    [Parameter(Mandatory=$true)][string]$infile
)
$basedir = Split-Path -Parent $infile
$badcount = 0
foreach ($line in [System.IO.File]::ReadLines("$infile")) {
    $sum, $file = $line.split(' ')
    $fullfile = "$basedir\$file"
    $filehash = Get-FileHash -Algorithm MD5 $fullfile
    if ($sum -eq $filehash.Hash) {
        Write-Host $file ": OK"
    } else {
        Write-Host $file ": FAILED"
        $badcount++
    }
}
if ($badcount -gt "0") {
    Write-Host "WARNING:" $badcount "computed checksums did NOT match"
}
1 Answer
here are a few changes i would make. [grin] the ideas ...
- use Get-Content instead of ReadLines()
  the speed difference is not large unless you are dealing with a very large number of files. go with the standard cmdlets unless there is a meaningful benefit from doing otherwise.
- test to see if the file exists
- build a [PSCustomObject] to hold the resulting items that you want
- keep those PSCOs in a collection
- view your hash failure items after the full test ends
what it does ...
- sets the constants
- builds a test file to work with
  remove the entire #region/#endregion block when you are ready to use your own data.
- reads in the hash list file
- iterates thru the resulting array
- splits out the file name and hash value
- builds the full file name to check
- tests to see if that file exists
- if YES, gets the file hash and saves it
- if NO, sets the file hash $Var to '__N/A__'
- builds a PSCO with the properties that seem useful
- sends that to the $Result collection
- gets the hash failures from the collection and displays them
  if all you want is the count, wrap that all in @() and add .Count to the end.
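The counting step in the last bullet could look like the sketch below. The three-item $Result collection here is a made-up stand-in with hypothetical file names, used only so the snippet runs on its own. Wrapping the filter in @() forces an array, so .Count behaves the same for zero, one, or many matches.

```powershell
# stand-in for the $Result collection; the file names and values are made up
$Result = @(
    [PSCustomObject]@{ FileName = 'a.log'; CopyOK = $True }
    [PSCustomObject]@{ FileName = 'b.log'; CopyOK = $False }
    [PSCustomObject]@{ FileName = 'c.log'; CopyOK = $False }
)

# @() forces an array, so .Count works for zero, one, or many matches
$FailCount = @($Result.Where({ $_.CopyOK -eq $False })).Count
$FailCount
```

With the stand-in data above, two items have CopyOK set to $False, so $FailCount is 2.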
the code ...
$SourceDir = $env:TEMP
$HashFileName = 'FileHashList.txt'
$FullHashFileName = Join-Path -Path $SourceDir -ChildPath $HashFileName
#region >>> make a hash list to compare with
# remove this entire "#region/#endregion" block when ready to work with your real data
$HashList = Get-ChildItem -LiteralPath $SourceDir -Filter '*.log' -File |
    ForEach-Object {
        '{0} {1}' -f $_.Name, (Get-FileHash -LiteralPath $_.FullName -Algorithm 'MD5').Hash
    }
# munge the 1st two hash values
$HashList[0] = $HashList[0] -replace '.{5}$', '--BAD'
$HashList[1] = $HashList[1] -replace '.{5}$', '--BAD'
$HashList |
Set-Content -LiteralPath $FullHashFileName
#endregion >>> make a hash list to compare with
$Result = foreach ($Line in (Get-Content -LiteralPath $FullHashFileName))
{
    $TestFileName, $Hash = $Line.Split(' ')
    $FullTestFileName = Join-Path -Path $SourceDir -ChildPath $TestFileName
    if (Test-Path -LiteralPath $FullTestFileName)
    {
        $THash = (Get-FileHash -LiteralPath $FullTestFileName -Algorithm 'MD5').Hash
    }
    else
    {
        $THash = '__N/A__'
    }
    [PSCustomObject]@{
        FileName = $TestFileName
        CopyOK = $THash -eq $Hash
        OriHash = $Hash
        CopyHash = $THash
    }
}
$Result.Where({$_.CopyOK -eq $False})
output [with the 1st two hash values deliberately munged] ...
FileName                   CopyOK OriHash                          CopyHash
--------                   ------ -------                          --------
Genre-List_2020年07月07日.log False  7C0C605EA7561B7020CBDAE24D1--BAD 7C0C605EA7561B7020CBDAE24D140E40
Genre-List_2020年07月14日.log False  20F234ACE66B860821CF8F8BD5E--BAD 20F234ACE66B860821CF8F8BD5EC144D
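One thing to watch when pointing this at a real md5sum file: the demo list above is written as "name hash", while md5sum writes the hash first, followed by a space and then either a second space or an asterisk before the file name. A minimal sketch of parsing that layout is shown below. It assumes unescaped file names. The $Pattern regex, the throwaway work directory, and the sample file name 'my file.txt' are all made up for illustration.

```powershell
# assumption: each line looks like "<32 hex digits><space><space or *><file name>"
$Pattern = '^(?<Hash>[0-9a-fA-F]{32}) [ *](?<Name>.+)$'

# build a throwaway file and a matching md5sum-style line to test against
$TempRoot = [System.IO.Path]::GetTempPath()
$WorkDir = Join-Path -Path $TempRoot -ChildPath ([System.IO.Path]::GetRandomFileName())
$Null = New-Item -Path $WorkDir -ItemType Directory
$Sample = Join-Path -Path $WorkDir -ChildPath 'my file.txt'
Set-Content -LiteralPath $Sample -Value 'hello'
$Md5 = (Get-FileHash -LiteralPath $Sample -Algorithm 'MD5').Hash.ToLower()
$Line = '{0}  {1}' -f $Md5, 'my file.txt'

$CopyOK = $False
if ($Line -match $Pattern)
{
    # named groups keep a file name that contains spaces in one piece
    $Hash = $Matches['Hash']
    $TestFileName = $Matches['Name']
    $FullTestFileName = Join-Path -Path $WorkDir -ChildPath $TestFileName
    $THash = (Get-FileHash -LiteralPath $FullTestFileName -Algorithm 'MD5').Hash
    # -eq on strings is case-insensitive, so lower-case md5sum hashes compare fine
    $CopyOK = $THash -eq $Hash
}

# clean up the throwaway directory
Remove-Item -LiteralPath $WorkDir -Recurse
```

Matching the whole line with a regex, rather than splitting on a single space, avoids the empty element that the double space would otherwise produce.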
- Thanks! I'm definitely going to have to study this. One question I have already is about "go with the standard cmdlets" -- how do I know which are standard and which are not? Is there a list somewhere? – Edward, Jul 22, 2020 at 21:08
- @Edward - you are most welcome! glad to help a bit. [grin] ///// by standard cmdlets i mean the actual powershell cmdlets, not calls to dotnet stuff. you can see the ones that are available to your setup with Get-Command. the verbs are viewable with Get-Verb. [grin] – Lee_Dailey, Jul 22, 2020 at 22:03
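As a small illustration of the two commands mentioned in the last comment, the sketch below lists the cmdlets whose noun is Content and shows a few approved verbs. The variable names are made up, and the exact lists depend on the PowerShell version and the modules installed.

```powershell
# cmdlets whose noun is "Content" (Get-Content, Set-Content, and so on)
$ContentCmdlets = Get-Command -CommandType Cmdlet -Noun 'Content'
$ContentCmdlets.Name

# the approved verbs; show only the first few
$Verbs = Get-Verb
$Verbs.Verb | Select-Object -First 5
```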