2
\$\begingroup\$

I have come up with the following method to remove sensitive information with the help of the garbage collector:

def wipe_sensitive_info(d, keys):
 """
 In Python we are not able to overwrite memory areas. What we do is:
 - remove references to the specified objects, in this case keys in a dictionary
 - immediately call garbage collection
 """
 # TODO: the following points are pending
 # - this relies on the gc to remove the objects from memory (de-allocate memory).
 # - the memory itself is not guaranteed to be overwritten.
 # - this will only deallocate objects which are not being referred somewhere else (ref count 0 after del)
 for key in keys:
 del d[key]
 logger.info('Garbage collected {} objects'.format(gc.collect()))

This would be called from a django app as follows:

wipe_sensitive_info(request.data, ['password'])

Any ideas or comments on how to improve this implementation?

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Apr 9, 2019 at 10:58
\$\endgroup\$
3
  • \$\begingroup\$ Please provide more details about this "sensitive information". Are they strings? Where and how do you use them? \$\endgroup\$ Commented Apr 9, 2019 at 14:00
  • \$\begingroup\$ What is your motivation for doing this (i.e. what is your threat model)? Do you want to prevent accidental logging? Access from a hostile plugin? A hacker who might somehow examine a memory dump? \$\endgroup\$ Commented Apr 9, 2019 at 14:02
  • \$\begingroup\$ @200_success The attack vector is a memory dump \$\endgroup\$ Commented Apr 9, 2019 at 16:00

1 Answer 1

2
\$\begingroup\$

One advantage of the logger module is that you can pass it strings and objects to log at some level, and the string formatting is only performed if it is actually needed (i.e. the logging level is low enough). Also, having the gc.collect() actually hides it a lot, I was looking for it for some time. So I would do this instead:

def wipe_sensitive_info(d, keys):
 """
 In Python we are not able to overwrite memory areas. What we do is:
 - remove references to the specified objects, in this case keys in a dictionary
 - immediately call garbage collection
 """
 # TODO: the following points are pending
 # - this relies on the gc to remove the objects from memory (de-allocate memory).
 # - the memory itself is not guaranteed to be overwritten.
 # - this will only deallocate objects which are not being referred somewhere else (ref count 0 after del)
 for key in keys:
 del d[key]
 n_garbage_collected = gc.collect()
 logger.info("Garbage collected %s objects", n_garbage_collected)

You could add a check for objects which have reference counts left:

import sys
...
for key in keys:
 if sys.getrefcount(d[key]) > 2:
 # The count returned is generally one higher than you might expect,
 # because it includes the (temporary) reference as an argument to
 # getrefcount().
 logger.debug("Object potentially has references left: %s", key)
 del d[key]

Note that this might give you some false positives for interned objects, like small integers or strings. Case in point, in an interactive session, sys.getrefcount("s") just gave me 1071.

answered Apr 9, 2019 at 17:17
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.