5
\$\begingroup\$

I'm trying to implement pbkdf2 in Python 2.7/3.4 without needing the pbkdf2 module. (I'm also trying to avoid using classes.)

Any insight would be appreciated:

from binascii import hexlify, unhexlify
import hmac, struct, hashlib, sys
is_python2 = True if sys.version_info.major == 2 else False
def pbkdf_two(passwd, salt, iters=2048, keylen=64, digestmod=hashlib.sha512):
 """
 >>> hexlify(pbkdf_two(b'All n-entities must communicate with other n-entities via n-1 entiteeheehees', unhexlify('1234567878563412'), 500, 16, hashlib.sha1))
 '6a8970bf68c92caea84a8df285108586'
 """
 dgsz = digestmod().digest_size if callable(digestmod) else digestmod.digest_size
 if keylen is None: keylen = dgsz
 # Helper function which copies each iteration for h, where h is an hmac seeded with password
 def pbhelper(h, salt, itercount, blocksize):
 def prf(h, data):
 hm = h.copy()
 hm.update(data)
 return hm.digest()
 U = prf(h, salt + struct.pack('>i', blocksize))
 T = U
 for j in range(2, itercount+1):
 U = prf(h, U)
 T = "".join([chr( ord(x) ^ ord(y) ) for (x, y) in zip( T, U )]) \
 if is_python2 else bytes([x ^ y for (x, y) in zip(T, U)]) # XORing
 return T
 L = int(keylen/dgsz) # L - number of output blocks to produce
 if keylen % dgsz != 0: L += 1
 h = hmac.new(key=passwd, msg=None, digestmod=digestmod )
 T = b""
 for i in range(1, L+1):
 T += pbhelper(h, salt, iters, i)
 return T[:keylen]
Peilonrayz
44.4k7 gold badges80 silver badges157 bronze badges
asked Apr 21, 2015 at 11:01
\$\endgroup\$
3
  • \$\begingroup\$ Hi and welcome to Code Review! Just to clarify: Does this work as intended in its current state? \$\endgroup\$ Commented Apr 21, 2015 at 11:17
  • 2
    \$\begingroup\$ It does. It returns the correct data for the test (commented code). I am running this through iPython (2.7). Being quite new to coding, I've not tested edge cases, but it works in conjunction with pybitcointools \$\endgroup\$ Commented Apr 21, 2015 at 11:27
  • 1
    \$\begingroup\$ That's good to hear and congratulations on writing a good first question. For future questions you might wish to reconsider the wording of "I'm trying to", which can make it sound like you've not successfully managed to get your code working. To be clear your wording is fine, but could reduce confusion in the future if you went with something like "I've implemented a..." \$\endgroup\$ Commented Apr 21, 2015 at 11:51

1 Answer 1

5
\$\begingroup\$

1. Review

  1. The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.

  2. If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.

  3. hexlify and unhexlify are only used by the doctest, so they could be included there.

  4. sys is not used, so the import could be omitted.

  5. The key derivation function is called PBKDF2, so I think the function would be better named pbkdf2, not pbkdf_two.

  6. I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 doesn't specify any particular values.

  7. We're not running out of letters, so why not password instead of passwd and digest_size instead of dgsz?

  8. The helper function pbhelper is called from only one place in the code, so there's nothing to be gained by making it into a local function.

  9. The fourth argument to pbhelper is named blocksize, but this is very misleading: it's actually the (1-based) index of the block.

  10. The iteration variable j is not used in the body of the loop. It's conventional to name such a variable _.

  11. Since j is not used, it doesn't matter what values it takes, and so it's simpler to use range(iters-1) instead of range(2, iters+1).

  12. Since h is always the same, there's no need for it to be a parameter to the prf function.

  13. The msg argument to hmac.new defaults to None so there's no need to specify it.

  14. If the code used bytearray instead of bytes then it would be portable between Python 2 and 3 without needing a version test.

  15. Instead of:

     L = int(keylen/dgsz) # L - number of output blocks to produce
     if keylen % dgsz != 0: L += 1
    

the number of blocks can be computed like this, using the floor division operator //:

 L = (keylen + dgsz - 1) // dgsz
  1. But even more simply, why not just iterate until the result is long enough? That way you wouldn't have to compute digest_size, and it would have the advantage that in Python 3.4 or later, the caller could pass in the name of the digest algorithm, just as for hmac.new.

2. Revised code

import hmac
import struct
def pbkdf2(password, salt, iters, keylen, digestmod):
 """Run the PBKDF2 (Password-Based Key Derivation Function 2) algorithm
 and return the derived key. The arguments are:
 password (bytes or bytearray) -- the input password
 salt (bytes or bytearray) -- a cryptographic salt
 iters (int) -- number of iterations
 keylen (int) -- length of key to derive
 digestmod -- a cryptographic hash function: either a module
 supporting PEP 247, a hashlib constructor, or (in Python 3.4
 or later) the name of a hash function.
 For example:
 >>> import hashlib
 >>> from binascii import hexlify, unhexlify
 >>> password = b'Squeamish Ossifrage'
 >>> salt = unhexlify(b'1234567878563412')
 >>> hexlify(pbkdf2(password, salt, 500, 16, hashlib.sha1))
 b'9e8f1072bdf5ef042bd988c7da83e43b'
 """
 h = hmac.new(password, digestmod=digestmod)
 def prf(data):
 hm = h.copy()
 hm.update(data)
 return bytearray(hm.digest())
 key = bytearray()
 i = 1
 while len(key) < keylen:
 T = U = prf(salt + struct.pack('>i', i))
 for _ in range(iters - 1):
 U = prf(U)
 T = bytearray(x ^ y for x, y in zip(T, U))
 key += T
 i += 1
 return key[:keylen]
answered Apr 21, 2015 at 18:11
\$\endgroup\$
2
  • \$\begingroup\$ The iteration variable j is not used in the body of the loop. It's conventional to name such a variable _ Without going too much off topic, why _? I'm aware the console uses _ as the previous result, but am struggling to see the connection \$\endgroup\$ Commented Apr 22, 2015 at 5:42
  • 2
    \$\begingroup\$ @WizardOfOzzie It's just a convention. The two uses of _ are not connected. \$\endgroup\$ Commented Apr 22, 2015 at 9:37

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.