- Python 81.3%
- Gnuplot 17.8%
- Shell 0.9%
| .gitignore | Move output to output folder | |
| all.gnuplot | adjust xrange for aes vs rsa | |
| benchmark.py | Move output to output folder | |
| README.md | submit | |
| render.sh | pandoc | |
| report.pdf | submit | |
| source.tar.gz | submit | |
| title | author | geometry | output |
|---|---|---|---|
| Security and Privacy - Assignment 1 |
Diogo Cordeiro
Hugo Sales
|
margin=2cm | pdf_document |
0. Test Bench
Operating System: Debian GNU/Linux 11.3 (bookworm/sid)
Linux Kernel: 5.16.0-6-amd64
CPU: Intel(R) Core(TM) i5-1135G7 CPU @ 2.40GHz
RAM: 8x1GiB LPDDR4 Synchronous 4267 MHz (0,2 ns)
Python: 3.9.12 with cryptography 36.0.2
The machine was started in text mode with nothing but the essential being executed.
1. Data Collection
- Assets of the specified sizes were built using randomly selected ASCII chars
- Using Python module
timeitwe have ran each function 10 times with 3 repetitions
By running each experiment 10 times, and repeating it 3 times, we believe we have amortized any outlier execution and retrieved a statistically significant average duration for each function.
Understanding the benchmarking code
timeit.template = """
def inner(_it, _timer{init}):
{setup} _t0 = _timer()
for _i in _it:
retval = {stmt} _t1 = _timer()
return _t1 - _t0, retval
"""
This changes the code template timeit uses internally, so that the results of
{stmt}, which is the function under benchmarking, gets returned, which we use
to capture the encrypted output, to later decrypt it.
def benchmark_step(fd, alg_name, size, func):
# Timer.repeat returns [(time, result)], unzip it to ([time], [result])
execution_time, result = zip(*timeit.Timer(func).repeat(
repeat=num_repetitions,
number=num_runs
))
average_duration = sum(execution_time)/(num_repetitions*num_runs)
table_results.append([alg_name, size, average_duration])
fd.write(f'{size}{average_duration}\n')
return result[0]
We compute the average duration the way we do because timeit gives us a list
with the time summation of the num_runs per experiment/repeat. Hence, to have
an average, we divide the summation of the list with its length and the number
of runs.
An important nuance we have noticed is that cryptography.hazmat has a
cold start problem and seemingly doesn't feature any bootstrapping utility
function. We work around this by our use of small tests that ensure the
encryption is reversible:
def test_aes():
aes_key = os.urandom(256 // 8) # 256 bits
assert b'CC2009 Assignment #1' == aes(aes_key, aes(
aes_key, b'CC2009 Assignment #1', mode = Operation.encrypt
), mode = Operation.decrypt)
test_aes()
2. Benchmarking and Analysis
Plot of time in function of Size of all algorithms considered{width=14cm}
\pagebreak
RSA
RSA was developed in 1977 and became a general-purpose approach to public-key encryption. Meaning that the encryption key is public and distinct from the decryption key.
def rsa(pub_key, priv_key, text, mode = Operation.encrypt):
pad = padding.OAEP(
mgf = padding.MGF1(algorithm = hashes.SHA256()),
algorithm = hashes.SHA256(),
label = None
)
if mode == Operation.encrypt:
return pub_key.encrypt(text, pad)
else:
return priv_key.decrypt(text, pad)
RSA Encrypt compared with RSA Decrypt{width=14cm}
RSA encryption is much slower than decryption because the public exponent chosen (65537) is a small value (which is not a security issue).
Regarding decryption, even though we have used a 2048-bit key length (in reality one should use at least 4096), it is noticeably slow. This is because RSA relies in computation with large numbers and expensive operations such as exponentiation.
Note that the plots are constants because the size is smaller than the key and
the input is padded, the largest input size used was 2^{7} = 128.
\pagebreak
AES
AES was developed in 1998 and is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.
def aes(key, text, mode = Operation.encrypt):
cipher = Cipher(algorithms.AES(key), modes.ECB())
if mode == Operation.encrypt:
encryptor = cipher.encryptor()
padded_length = (len(text)//16 + 1) * 16
pad_char = hex(padded_length - len(text))[2:].encode('utf-8') if len(text) % 16 != 0
else b'0円'
return encryptor.update(text.ljust(padded_length, pad_char)) + encryptor.finalize()
else:
decryptor = cipher.decryptor()
res = decryptor.update(text) + decryptor.finalize()
padding = int(chr(res[-1]), 16) if res[-1] != 0 else 16
return res[:-padding]
Operation.encrypt
In Operation.encrypt mode, we must pad the input text to a multiple of the
block size (16). To this end, padded_length is calculated such that the number
is rounded to the next multiple of 16, unless it's already a multiple of 16, in
which case an extra 16 bytes are added. This is beneficial later.
We use PKCS#7 padding, by padding the remaining block with the hex
representation of the number of padding bytes used. This works specifically
because the block size is 16, so a single hex digit can represent the length.
If the text length is a multiple of the block size, we need to add a block
filled with 0x0, since we need to always have padding, for the reverse direction.
For instance,
b'CC2009 Assignment #1'
gets padded to
b'CC2009 Assignment #1cccccccccccc'.
And
b'0123456789abcdef'
results in
b'0123456789abcdef\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Operation.decrypt
After decryption, we must remove padding, so we look at the last byte. If it's zero, we drop a whole block size, otherwise convert the byte to it's ASCII representation and the corresponding hexadecimal number.
\pagebreak
Comparing AES and RSA{width=14cm}
RSA decryption is much slower than AES. In this plot we've had to specify
the xrange to the small amounts of data that were tested with RSA.
Before we can explain the differences between both algorithms, we must first understand their differences and purposes.
In general, asymmetric cryptography is significantly slower than symmetric cryptography.
This higher complexity of the RSA algorithm (when compared to AES) is necessary due to the fact that one of the keys in RSA is public. In order to maintain security, asymmetric encryption must make it too difficult of cracking the public key and discover the private key.
SHA-256
SHA-2 (Secure Hash Algorithm 2) were first introduced in 2001, they are a one-way compression function.
def sha256(text):
digest = hashes.Hash(hashes.SHA256())
digest.update(text)
return digest.finalize()
Comparing AES and SHA-256{width=14cm}
Before we go in more depth regarding the differences between AES and SHA-256, it's crucial to refer that we are considering AES in ECB (Electronic CodeBook) mode, which is the simplest and divides the message into blocks, encrypting each separately (this method lacks diffusion).
With SHA one is able to compute a hash (digest) of the input, and this is a one-way process. One can't take a digest and recover the original document.
AES is used to encrypt data. Nonetheless, we can see that AES ECB encrypt is faster than SHA-256. Which is not necessarily an advantage, part of the interest in an hash is not to be reversible and, although it is relevant to be a fast function, for applications where speed is more essential, more often than not, sha-256 is sufficiently fast, while in other settings one can even use sha-512 instead.