someonewithpc/CC2009

Code Issues Pull requests Projects Releases Wiki Activity

No description

Python 81.3%
Gnuplot 17.8%
Shell 0.9%

Find a file

Diogo Peralta Cordeiro b0cebd7056 submit		2022年04月08日 22:16:56 +01:00
.gitignore	Move output to output folder	2022年04月08日 06:10:58 +01:00
all.gnuplot	adjust xrange for aes vs rsa	2022年04月08日 21:50:14 +01:00
benchmark.py	Move output to output folder	2022年04月08日 06:10:58 +01:00
README.md	submit	2022年04月08日 22:16:56 +01:00
render.sh	pandoc	2022年04月07日 18:42:07 +01:00
report.pdf	submit	2022年04月08日 22:16:56 +01:00
source.tar.gz	submit	2022年04月08日 22:16:56 +01:00

README.md

title	author	geometry	output
Security and Privacy - Assignment 1	Diogo Cordeiro Hugo Sales	margin=2cm	pdf_document

0. Test Bench

Operating System: Debian GNU/Linux 11.3 (bookworm/sid)

Linux Kernel: 5.16.0-6-amd64

CPU: Intel(R) Core(TM) i5-1135G7 CPU @ 2.40GHz

RAM: 8x1GiB LPDDR4 Synchronous 4267 MHz (0,2 ns)

Python: 3.9.12 with cryptography 36.0.2

The machine was started in text mode with nothing but the essential being executed.

1. Data Collection

Assets of the specified sizes were built using randomly selected ASCII chars
Using Python module timeit we have ran each function 10 times with 3 repetitions

By running each experiment 10 times, and repeating it 3 times, we believe we have amortized any outlier execution and retrieved a statistically significant average duration for each function.

Understanding the benchmarking code

timeit.template = """
def inner(_it, _timer{init}):
{setup} _t0 = _timer()
 for _i in _it:
 retval = {stmt} _t1 = _timer()
 return _t1 - _t0, retval
"""

This changes the code template timeit uses internally, so that the results of {stmt}, which is the function under benchmarking, gets returned, which we use to capture the encrypted output, to later decrypt it.

def benchmark_step(fd, alg_name, size, func):
 # Timer.repeat returns [(time, result)], unzip it to ([time], [result])
 execution_time, result = zip(*timeit.Timer(func).repeat(
 repeat=num_repetitions,
 number=num_runs
 ))
 average_duration = sum(execution_time)/(num_repetitions*num_runs)
 table_results.append([alg_name, size, average_duration])
 fd.write(f'{size}{average_duration}\n')
 return result[0]

We compute the average duration the way we do because timeit gives us a list with the time summation of the num_runs per experiment/repeat. Hence, to have an average, we divide the summation of the list with its length and the number of runs.

An important nuance we have noticed is that cryptography.hazmat has a cold start problem and seemingly doesn't feature any bootstrapping utility function. We work around this by our use of small tests that ensure the encryption is reversible:

def test_aes():
 aes_key = os.urandom(256 // 8) # 256 bits
 assert b'CC2009 Assignment #1' == aes(aes_key, aes(
 aes_key, b'CC2009 Assignment #1', mode = Operation.encrypt
 ), mode = Operation.decrypt)
test_aes()

2. Benchmarking and Analysis

Plot of time in function of Size of all algorithms considered{width=14cm}

\pagebreak

RSA

RSA was developed in 1977 and became a general-purpose approach to public-key encryption. Meaning that the encryption key is public and distinct from the decryption key.

def rsa(pub_key, priv_key, text, mode = Operation.encrypt):
 pad = padding.OAEP(
 mgf = padding.MGF1(algorithm = hashes.SHA256()),
 algorithm = hashes.SHA256(),
 label = None
 )
 if mode == Operation.encrypt:
 return pub_key.encrypt(text, pad)
 else:
 return priv_key.decrypt(text, pad)

RSA Encrypt compared with RSA Decrypt{width=14cm}

RSA encryption is much slower than decryption because the public exponent chosen (65537) is a small value (which is not a security issue).

Regarding decryption, even though we have used a 2048-bit key length (in reality one should use at least 4096), it is noticeably slow. This is because RSA relies in computation with large numbers and expensive operations such as exponentiation.

Note that the plots are constants because the size is smaller than the key and the input is padded, the largest input size used was 2^{7} = 128.

\pagebreak

AES

AES was developed in 1998 and is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.

def aes(key, text, mode = Operation.encrypt):
 cipher = Cipher(algorithms.AES(key), modes.ECB())
 if mode == Operation.encrypt:
 encryptor = cipher.encryptor()
 padded_length = (len(text)//16 + 1) * 16
 pad_char = hex(padded_length - len(text))[2:].encode('utf-8') if len(text) % 16 != 0
 else b'0円'
 return encryptor.update(text.ljust(padded_length, pad_char)) + encryptor.finalize()
 else:
 decryptor = cipher.decryptor()
 res = decryptor.update(text) + decryptor.finalize()
 padding = int(chr(res[-1]), 16) if res[-1] != 0 else 16
 return res[:-padding]

`Operation.encrypt`

In Operation.encrypt mode, we must pad the input text to a multiple of the block size (16). To this end, padded_length is calculated such that the number is rounded to the next multiple of 16, unless it's already a multiple of 16, in which case an extra 16 bytes are added. This is beneficial later.

We use PKCS#7 padding, by padding the remaining block with the hex representation of the number of padding bytes used. This works specifically because the block size is 16, so a single hex digit can represent the length. If the text length is a multiple of the block size, we need to add a block filled with 0x0, since we need to always have padding, for the reverse direction.

For instance,

b'CC2009 Assignment #1'

gets padded to

b'CC2009 Assignment #1cccccccccccc'.

And

b'0123456789abcdef'

results in

b'0123456789abcdef\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

`Operation.decrypt`

After decryption, we must remove padding, so we look at the last byte. If it's zero, we drop a whole block size, otherwise convert the byte to it's ASCII representation and the corresponding hexadecimal number.

\pagebreak

Comparing AES and RSA{width=14cm}

RSA decryption is much slower than AES. In this plot we've had to specify the xrange to the small amounts of data that were tested with RSA.

Before we can explain the differences between both algorithms, we must first understand their differences and purposes.

In general, asymmetric cryptography is significantly slower than symmetric cryptography.

This higher complexity of the RSA algorithm (when compared to AES) is necessary due to the fact that one of the keys in RSA is public. In order to maintain security, asymmetric encryption must make it too difficult of cracking the public key and discover the private key.

SHA-256

SHA-2 (Secure Hash Algorithm 2) were first introduced in 2001, they are a one-way compression function.

def sha256(text):
 digest = hashes.Hash(hashes.SHA256())
 digest.update(text)
 return digest.finalize()

Comparing AES and SHA-256{width=14cm}

Before we go in more depth regarding the differences between AES and SHA-256, it's crucial to refer that we are considering AES in ECB (Electronic CodeBook) mode, which is the simplest and divides the message into blocks, encrypting each separately (this method lacks diffusion).

With SHA one is able to compute a hash (digest) of the input, and this is a one-way process. One can't take a digest and recover the original document.

AES is used to encrypt data. Nonetheless, we can see that AES ECB encrypt is faster than SHA-256. Which is not necessarily an advantage, part of the interest in an hash is not to be reversible and, although it is relevant to be a fast function, for applications where speed is more essential, more often than not, sha-256 is sufficiently fast, while in other settings one can even use sha-512 instead.