The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.
If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.
hexlify
andunhexlify
are only used by the doctest, so they could be included there.sys
is not used, so the import could be omitted.The key derivation function is called PBKDF2, so I think the function would be better named
pbkdf2
, notpbkdf_two
.I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 RFC 2898 doesn't specify any particular values.
We're not running out of letters, so why not
password
instead ofpasswd
anddigest_size
instead ofdgsz
?The helper function
pbhelper
is called from only one place in the code, so there's nothing to be gained by making it into a local function.The fourth argument to
pbhelper
is namedblocksize
, but this is very misleading: it's actually the (1-based) index of the block.The iteration variable
j
is not used in the body of the loop. It's conventional to name such a variable_
.Since
j
is not used, it doesn't matter what values it takes, and so it's simpler to userange(iters-1)
instead ofrange(2, iters+1)
.Since
h
is always the same, there's no need for it to be a parameter to theprf
function.The
msg
argument tohmac.new
defaults toNone
so there's no need to specify it.If the code used
bytearray
instead ofbytes
then it would be portable between Python 2 and 3 without needing a version test.Instead of:
L = int(keylen/dgsz) # L - number of output blocks to produce if keylen % dgsz != 0: L += 1
The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.
If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.
hexlify
andunhexlify
are only used by the doctest, so they could be included there.sys
is not used, so the import could be omitted.The key derivation function is called PBKDF2, so I think the function would be better named
pbkdf2
, notpbkdf_two
.I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 doesn't specify any particular values.
We're not running out of letters, so why not
password
instead ofpasswd
anddigest_size
instead ofdgsz
?The helper function
pbhelper
is called from only one place in the code, so there's nothing to be gained by making it into a local function.The fourth argument to
pbhelper
is namedblocksize
, but this is very misleading: it's actually the (1-based) index of the block.The iteration variable
j
is not used in the body of the loop. It's conventional to name such a variable_
.Since
j
is not used, it doesn't matter what values it takes, and so it's simpler to userange(iters-1)
instead ofrange(2, iters+1)
.Since
h
is always the same, there's no need for it to be a parameter to theprf
function.The
msg
argument tohmac.new
defaults toNone
so there's no need to specify it.If the code used
bytearray
instead ofbytes
then it would be portable between Python 2 and 3 without needing a version test.Instead of:
L = int(keylen/dgsz) # L - number of output blocks to produce if keylen % dgsz != 0: L += 1
The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.
If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.
hexlify
andunhexlify
are only used by the doctest, so they could be included there.sys
is not used, so the import could be omitted.The key derivation function is called PBKDF2, so I think the function would be better named
pbkdf2
, notpbkdf_two
.I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 doesn't specify any particular values.
We're not running out of letters, so why not
password
instead ofpasswd
anddigest_size
instead ofdgsz
?The helper function
pbhelper
is called from only one place in the code, so there's nothing to be gained by making it into a local function.The fourth argument to
pbhelper
is namedblocksize
, but this is very misleading: it's actually the (1-based) index of the block.The iteration variable
j
is not used in the body of the loop. It's conventional to name such a variable_
.Since
j
is not used, it doesn't matter what values it takes, and so it's simpler to userange(iters-1)
instead ofrange(2, iters+1)
.Since
h
is always the same, there's no need for it to be a parameter to theprf
function.The
msg
argument tohmac.new
defaults toNone
so there's no need to specify it.If the code used
bytearray
instead ofbytes
then it would be portable between Python 2 and 3 without needing a version test.Instead of:
L = int(keylen/dgsz) # L - number of output blocks to produce if keylen % dgsz != 0: L += 1
1. Review
The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.
If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.
hexlify
andunhexlify
are only used by the doctest, so they could be included there.sys
is not used, so the import could be omitted.The key derivation function is called PBKDF2, so I think the function would be better named
pbkdf2
, notpbkdf_two
.I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 doesn't specify any particular values.
We're not running out of letters, so why not
password
instead ofpasswd
anddigest_size
instead ofdgsz
?The helper function
pbhelper
is called from only one place in the code, so there's nothing to be gained by making it into a local function.The fourth argument to
pbhelper
is namedblocksize
, but this is very misleading: it's actually the (1-based) index of the block.The iteration variable
j
is not used in the body of the loop. It's conventional to name such a variable_
.Since
j
is not used, it doesn't matter what values it takes, and so it's simpler to userange(iters-1)
instead ofrange(2, iters+1)
.Since
h
is always the same, there's no need for it to be a parameter to theprf
function.The
msg
argument tohmac.new
defaults toNone
so there's no need to specify it.If the code used
bytearray
instead ofbytes
then it would be portable between Python 2 and 3 without needing a version test.Instead of:
L = int(keylen/dgsz) # L - number of output blocks to produce if keylen % dgsz != 0: L += 1
the number of blocks can be computed like this, using the floor division operator //
:
L = (keylen + dgsz - 1) // dgsz
- But even more simply, why not just iterate until the result is long enough? That way you wouldn't have to compute
digest_size
, and it would have the advantage that in Python 3.4 or later, the caller could pass in the name of the digest algorithm, just as forhmac.new
.
2. Revised code
import hmac
import struct
def pbkdf2(password, salt, iters, keylen, digestmod):
"""Run the PBKDF2 (Password-Based Key Derivation Function 2) algorithm
and return the derived key. The arguments are:
password (bytes or bytearray) -- the input password
salt (bytes or bytearray) -- a cryptographic salt
iters (int) -- number of iterations
keylen (int) -- length of key to derive
digestmod -- a cryptographic hash function: either a module
supporting PEP 247, a hashlib constructor, or (in Python 3.4
or later) the name of a hash function.
For example:
>>> import hashlib
>>> from binascii import hexlify, unhexlify
>>> password = b'Squeamish Ossifrage'
>>> salt = unhexlify(b'1234567878563412')
>>> hexlify(pbkdf2(password, salt, 500, 16, hashlib.sha1))
b'9e8f1072bdf5ef042bd988c7da83e43b'
"""
h = hmac.new(password, digestmod=digestmod)
def prf(data):
hm = h.copy()
hm.update(data)
return bytearray(hm.digest())
key = bytearray()
i = 1
while len(key) < keylen:
T = U = prf(salt + struct.pack('>i', i))
for _ in range(iters - 1):
U = prf(U)
T = bytearray(x ^ y for x, y in zip(T, U))
key += T
i += 1
return key[:keylen]