I'm trying to implement pbkdf2 in Python 2.7/3.4 without needing the pbkdf2 module. (I'm also trying to avoid using classes.)
Any insight would be appreciated:
from binascii import hexlify, unhexlify
import hmac, struct, hashlib, sys
is_python2 = True if sys.version_info.major == 2 else False
def pbkdf_two(passwd, salt, iters=2048, keylen=64, digestmod=hashlib.sha512):
"""
>>> hexlify(pbkdf_two(b'All n-entities must communicate with other n-entities via n-1 entiteeheehees', unhexlify('1234567878563412'), 500, 16, hashlib.sha1))
'6a8970bf68c92caea84a8df285108586'
"""
dgsz = digestmod().digest_size if callable(digestmod) else digestmod.digest_size
if keylen is None: keylen = dgsz
# Helper function which copies each iteration for h, where h is an hmac seeded with password
def pbhelper(h, salt, itercount, blocksize):
def prf(h, data):
hm = h.copy()
hm.update(data)
return hm.digest()
U = prf(h, salt + struct.pack('>i', blocksize))
T = U
for j in range(2, itercount+1):
U = prf(h, U)
T = "".join([chr( ord(x) ^ ord(y) ) for (x, y) in zip( T, U )]) \
if is_python2 else bytes([x ^ y for (x, y) in zip(T, U)]) # XORing
return T
L = int(keylen/dgsz) # L - number of output blocks to produce
if keylen % dgsz != 0: L += 1
h = hmac.new(key=passwd, msg=None, digestmod=digestmod )
T = b""
for i in range(1, L+1):
T += pbhelper(h, salt, iters, i)
return T[:keylen]
-
\$\begingroup\$ Hi and welcome to Code Review! Just to clarify: Does this work as intended in its current state? \$\endgroup\$Nick Udell– Nick Udell2015年04月21日 11:17:46 +00:00Commented Apr 21, 2015 at 11:17
-
2\$\begingroup\$ It does. It returns the correct data for the test (commented code). I am running this through iPython (2.7). Being quite new to coding, I've not tested edge cases, but it works in conjunction with pybitcointools \$\endgroup\$Wizard Of Ozzie– Wizard Of Ozzie2015年04月21日 11:27:09 +00:00Commented Apr 21, 2015 at 11:27
-
1\$\begingroup\$ That's good to hear and congratulations on writing a good first question. For future questions you might wish to reconsider the wording of "I'm trying to", which can make it sound like you've not successfully managed to get your code working. To be clear your wording is fine, but could reduce confusion in the future if you went with something like "I've implemented a..." \$\endgroup\$Nick Udell– Nick Udell2015年04月21日 11:51:32 +00:00Commented Apr 21, 2015 at 11:51
1 Answer 1
1. Review
The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.
If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.
hexlify
andunhexlify
are only used by the doctest, so they could be included there.sys
is not used, so the import could be omitted.The key derivation function is called PBKDF2, so I think the function would be better named
pbkdf2
, notpbkdf_two
.I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 doesn't specify any particular values.
We're not running out of letters, so why not
password
instead ofpasswd
anddigest_size
instead ofdgsz
?The helper function
pbhelper
is called from only one place in the code, so there's nothing to be gained by making it into a local function.The fourth argument to
pbhelper
is namedblocksize
, but this is very misleading: it's actually the (1-based) index of the block.The iteration variable
j
is not used in the body of the loop. It's conventional to name such a variable_
.Since
j
is not used, it doesn't matter what values it takes, and so it's simpler to userange(iters-1)
instead ofrange(2, iters+1)
.Since
h
is always the same, there's no need for it to be a parameter to theprf
function.The
msg
argument tohmac.new
defaults toNone
so there's no need to specify it.If the code used
bytearray
instead ofbytes
then it would be portable between Python 2 and 3 without needing a version test.Instead of:
L = int(keylen/dgsz) # L - number of output blocks to produce if keylen % dgsz != 0: L += 1
the number of blocks can be computed like this, using the floor division operator //
:
L = (keylen + dgsz - 1) // dgsz
- But even more simply, why not just iterate until the result is long enough? That way you wouldn't have to compute
digest_size
, and it would have the advantage that in Python 3.4 or later, the caller could pass in the name of the digest algorithm, just as forhmac.new
.
2. Revised code
import hmac
import struct
def pbkdf2(password, salt, iters, keylen, digestmod):
"""Run the PBKDF2 (Password-Based Key Derivation Function 2) algorithm
and return the derived key. The arguments are:
password (bytes or bytearray) -- the input password
salt (bytes or bytearray) -- a cryptographic salt
iters (int) -- number of iterations
keylen (int) -- length of key to derive
digestmod -- a cryptographic hash function: either a module
supporting PEP 247, a hashlib constructor, or (in Python 3.4
or later) the name of a hash function.
For example:
>>> import hashlib
>>> from binascii import hexlify, unhexlify
>>> password = b'Squeamish Ossifrage'
>>> salt = unhexlify(b'1234567878563412')
>>> hexlify(pbkdf2(password, salt, 500, 16, hashlib.sha1))
b'9e8f1072bdf5ef042bd988c7da83e43b'
"""
h = hmac.new(password, digestmod=digestmod)
def prf(data):
hm = h.copy()
hm.update(data)
return bytearray(hm.digest())
key = bytearray()
i = 1
while len(key) < keylen:
T = U = prf(salt + struct.pack('>i', i))
for _ in range(iters - 1):
U = prf(U)
T = bytearray(x ^ y for x, y in zip(T, U))
key += T
i += 1
return key[:keylen]
-
\$\begingroup\$
The iteration variable j is not used in the body of the loop. It's conventional to name such a variable _
Without going too much off topic, why _? I'm aware the console uses _ as the previous result, but am struggling to see the connection \$\endgroup\$Wizard Of Ozzie– Wizard Of Ozzie2015年04月22日 05:42:57 +00:00Commented Apr 22, 2015 at 5:42 -
2\$\begingroup\$ @WizardOfOzzie It's just a convention. The two uses of
_
are not connected. \$\endgroup\$Janne Karila– Janne Karila2015年04月22日 09:37:12 +00:00Commented Apr 22, 2015 at 9:37
Explore related questions
See similar questions with these tags.