Substitution Cipher in C

Question 1

I was directed here from Stack Overflow.

I completed my recent programming assignment for developing a substitution cipher in C. Below is what I came up with after reading many tutorials, googling many questions, watching many videos, etc. but I am happy with the result. I was just wondering if you know of a more efficient way to do it? Is this good code (for a beginner)? Is there something I overlooked? It should be noted that this is CS50 and I was using the CS50 library. If this isn't the place for this type of feedback, please delete! Thanks!

Here is my code:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int validate(int c, string v[]); //prototpe validate function
int main(int argc, string argv[])
{
 if (validate(argc, argv) == 0) //run validate for argc and argv and check its response
 {
 printf("Key is valid\n"); //if valid, print message
 }
 else
 {
 return 1; //Return error
 }
 string plaintext = get_string("plaintext: "); //Get user input
 printf("ciphertext: "); //Output the cipher, using following algorithm
 //Loop through each letter of inputed text
 for (int t = 0, len = strlen(plaintext); t < len; t++)
 {
 //If not alphabetic character, print as entered
 if (plaintext[t] < 'A' || (plaintext[t] > 'Z' && plaintext[t] < 'a') || plaintext[t] > 'z')
 {
 printf("%c", plaintext[t]);
 }
 //If alphabetic, encipher the input
 else
 {
 for (int u = 0; u < 26; u++)
 {
 if (plaintext[t] == 65 + u) //check for uppercase alphabetic characters starting with ASCII 65
 {
 //Ensure outputed text maintains uppercase
 char upper = argv[1][u];
 int up = isupper(upper);
 if (up == 0)
 {
 upper = toupper(upper);
 printf("%c", upper);
 }
 if (up != 0)
 {
 printf("%c", upper);
 }
 }
 if (plaintext[t] == 97 + u) //check for lowercase alphabetic characters starting with ASCII 97
 {
 //Ensure the outputed text maintains lowercase
 char lower = argv[1][u];
 int low = islower(lower);
 if (low == 0)
 {
 lower = tolower(lower);
 printf("%c", lower);
 }
 if (low != 0)
 {
 printf("%c", lower);
 }
 }
 }
 }
 }
 printf("\n"); //print newline for clean terminal
 return 0; //Exit
}
//Key Validation function
int validate(int c, string v[])
{
 //Validate that only one Command Line Argument was entered
 if (c != 2) //Check the number if entered commands at execution
 {
 //If more than one, print error message
 printf("Key must be the only Command Line Argument\n");
 return 1; //Return error
 }
 //Validate that Key length is 26 characters
 if (strlen(v[1]) != 26) //Don't forget null 0
 {
 printf("Key must contain exactly 26 characters\n");
 return 1; //Return error
 }
 //Validate that all Key characters are alphabetic
 for (int i = 0, n = strlen(v[1]); i < n; i++)
 {
 //Check each element of the array
 if (isalpha(v[1][i]))
 {
 continue; //Continue if alphabetic
 }
 else
 {
 //if non-alphabetic, print error code
 printf("Key must contain only alphabetic characters\n");
 return 1; //Return error
 }
 }
 //Validate that each Key character is unique
 for (int x = 0, z = strlen(v[1]); x < z; x++)
 {
 //Create a second loop to compare every element to the first before incrementing the first
 for (int y = x + 1; y < z; y++)
 {
 //Cast array elements to int, check if each element equals next element in array
 if (v[1][x] == v[1][y])
 {
 printf("Key must contain exactly 26 unique characters\n");
 return 1; //Return error
 }
 //also check if same character has been used in different case
 if (v[1][x] == v[1][y] + 32)
 {
 printf("Key must contain exactly 26 unique characters\n");
 return 1; //Return error
 }
 if (v[1][x] == v[1][y] - 32)
 {
 printf("Key must contain exactly 26 unique characters\n");
 return 1; //Return error
 }
 }
 }
 return 0; //Key is valid, so return true
}

Question 2

My first suggestion is actually to remove most of the comments. Sometimes comments are useful, but when they are explaining obvious things, they just clutter the code. Also, since validate checks the command line arguments, I would rename it to validate_args and also name the arguments argc and argv.

So assuming we have renamed those variables, let's continue on that function. The check for number of arguments and argument length are good as they are. So we jump to checking for only alpha characters. Your loop is too complicated. It does not need an if statement. Also, we could change the for loop a bit, since we already know that it has correct length.

for (int i = 0; argv[1][i] != '0円'; i++)
{
 if (!isalpha(v[1][i]))
 printf("Key must contain only alphabetic characters\n");
 return 1;
 }
}

The check for uniqueness could be made a lot simpler. I made the array bigger than 26 so that it works if you want to use more characters in future. Also changed the error message because of that.

int arr[256] = {0}; 
for (int i = 0; argv[1][i] != '0円'; i++)
{
 if(arr[argv[1][i]] > 0)
 {
 printf("Key must contain unique characters\n");
 return 1; 
 }
 arr[argv[1][i]]++;
}

You could make it shorter by removing the last line that increments the array and instead write the if statement like this: if(arr[argv[1][i]]++ > 0) but IMHO, that just makes it harder to read.

To clarify, what's happening here is basically something that creates a histogram. You could write it like this instead:

int arr[256] = {0}; 
for (int i = 0; argv[1][i] != '0円'; i++) {
 arr[argv[1][i]]++;
}
// Now arr[n] indicates how many of character n the argument had
for(int i=0; i<sizeof arr; i++) {
 if(arr[i] > 1) {
 printf("Key must contain unique characters\n");
 return 1; 
 }
}

And in general, avoid magic numbers. Instead of 26, declare a global constant, either with #define or const int. Sometimes you use 65 instead of 'A'. Don't do that.

Question 3

Thanks for the feedback! I definitely feel like I am over commenting at times but part of the assignments for CS50 is "style" and you get dinged for lack of comments so I often find myself just putting stuff in to make sure that doesn't happen, but I am aware that I am doing that so I guess that's a good thing? Could you explain the last part a little bit more? I'm just not fully understanding how if(arr[argv[1][i]] > 0) is checking for uniqueness

Question 4

@apreynolds1989 Well, what should I say? Sometimes I am also forced to write crappy code just to meet requirements. ;)

Question 5

The arr is an array that counts the occurrences of characters. If you skip the return statement, you could use it to make a histogram. See my edit.

Question 6

haha fair enough! What I am not seeing is how the arr is counting unique characters as opposed to just counting characters? If I am overlooking something plainly obvious, I blame the night shift :P

Question 7

@apreynolds1989 If arr[n] > 1 that means that n occurs more than once. That's the definition of not unique.

Question 8

validate:

Once we've checked the number of command line arguments, we can use a named variable for the key, instead of referring to it as v[1] everywhere: string key = v[1]. We should do the same in the main function.
We can save the string length to another variable, instead of calling strlen multiple times.

The alphabet check contains an unnecessary branch that does nothing. We can simplify it to:

 //Validate that all key characters are alphabetic
 for (int i = 0; i < key_len; i++)
 {
 if (!isalpha(key[i]))
 {
 printf("Key must contain only alphabetic characters\n");
 return 1; //Return error
 }
 }

The uniqueness check will compare the last non-null character in the key with the null character at the end. We need to limit x to the range [0, key_len - 1) to avoid that.
if (v[1][x] == v[1][y] + 32), if (v[1][x] == v[1][y] - 32). We should use a named constant variable for 32 to make it clearer what this is doing.

Note, however, that these checks don't do exactly what we want them to. They will also compare the letters with various punctuation characters.

We can replace these 3 ifs with a single one: if (toupper(key[x]) == toupper(key[y]))

main:

We can use isalpha in our encoding loop, instead of custom range checking.
We don't need to loop over the alphabet to do the encoding. We can calculate the offset of our index character from 'a' or 'A', and use that as the index into the key.
It looks like we're trying to maintain the case of the plaintext - which maybe isn't a sensible feature for a cryptographic system! Anyway... in that case, we need to use isupper on the plaintext[t], not on the encoded value (which is what the original program seems to be doing).

Note that since we don't know whether the key chars are uppercase or lowercase, we should always use toupper or tolower on the output.

So I think we can do something like:

for (int t = 0, len = strlen(plaintext); t < len; t++)
{
 if (!isalpha(plaintext[t]))
 {
 printf("%c", plaintext[t]);
 continue;
 }
 char encoded = key[toupper(plaintext[t]) - 'A'];
 char output = isupper(plaintext[t]) ? toupper(encoded) : tolower(encoded);
 printf("%c", output);
}

Some additional explanation:

As I understand it, the key string (argv[1]) contains the character to be substituted for each letter of the alphabet. So 'a' in the plaintext is replaced with key[0], 'b' with key[1], etc.

So to get the correct index in the key for a given character from 'A' to 'Z', we need to get the distance of that character from 'A'. Since char is an integer type, we can do this with some simple math: plaintext[t] - 'A'

Here 'A' is simply another way of writing 65. So if plaintext[t] is also 'A', we get 65 - 65, or if plaintext[t] is 'B' we get 66 - 65. This gives us the index we need for the key.

plaintext[t] could be uppercase or lowercase. For lowercase we could get the distance from 'a' instead, but it's easier to convert everything to uppercase and do a single comparison.

Question 9

Thanks for the feedback! Maintaining the case was indeed a part of the assignment and I did struggle with that. Just curious if you could explain a little more what this code is doing? Not fully wrapping my head around it (not sure if it's because i'm 11 hours into a night shift or not) char encoded = key[toupper(plaintext[t]) - 'A'];

Question 10

Sure. Added a bit more explanation.

Question 11

a more efficient way to do it?

for loop obliges 2 sequential trips through the string plaintext. Instead, consider one trip: iterate until the null character is found.

// for (int t = 0, len = strlen(plaintext); t < len; t++)
for (int t = 0); plaintext[t]; t++)

Is there something I overlooked?

char is signed or unsigned. When signed, it can have negatives values like -42. isupper(int ch) and other is...() functions are well defined for ch in the unsigned char range and EOF. Pedantic code insures the range.

char upper = argv[1][u];
// int up = isupper(upper);
int up = isupper((unsigned char) upper);

Is this good code (for a beginner)?

Yes.

Good use of local variables rather than all at the top of the function.

Use if/else

 int up = isupper(upper);
 //if (up == 0) {
 // upper = toupper(upper);
 // printf("%c", upper);
 //}
 //if (up != 0) {
 // printf("%c", upper);
 //}
 if (up == 0) {
 upper = toupper(upper);
 printf("%c", upper);
 } else {
 printf("%c", upper);
 }

or

 int up = isupper(upper);
 if (up == 0) {
 upper = toupper(upper);
 }
 printf("%c", upper);

or simply

 // int up = isupper(upper);
 upper = toupper(upper);
 printf("%c", upper);

klutt klutt 9035 silver badges16 bronze badges · Answer 1 · 2020-11-02 09:18:59Z

My first suggestion is actually to remove most of the comments. Sometimes comments are useful, but when they are explaining obvious things, they just clutter the code. Also, since validate checks the command line arguments, I would rename it to validate_args and also name the arguments argc and argv.

So assuming we have renamed those variables, let's continue on that function. The check for number of arguments and argument length are good as they are. So we jump to checking for only alpha characters. Your loop is too complicated. It does not need an if statement. Also, we could change the for loop a bit, since we already know that it has correct length.

for (int i = 0; argv[1][i] != '0円'; i++)
{
 if (!isalpha(v[1][i]))
 printf("Key must contain only alphabetic characters\n");
 return 1;
 }
}

The check for uniqueness could be made a lot simpler. I made the array bigger than 26 so that it works if you want to use more characters in future. Also changed the error message because of that.

int arr[256] = {0}; 
for (int i = 0; argv[1][i] != '0円'; i++)
{
 if(arr[argv[1][i]] > 0)
 {
 printf("Key must contain unique characters\n");
 return 1; 
 }
 arr[argv[1][i]]++;
}

You could make it shorter by removing the last line that increments the array and instead write the if statement like this: if(arr[argv[1][i]]++ > 0) but IMHO, that just makes it harder to read.

To clarify, what's happening here is basically something that creates a histogram. You could write it like this instead:

int arr[256] = {0}; 
for (int i = 0; argv[1][i] != '0円'; i++) {
 arr[argv[1][i]]++;
}
// Now arr[n] indicates how many of character n the argument had
for(int i=0; i<sizeof arr; i++) {
 if(arr[i] > 1) {
 printf("Key must contain unique characters\n");
 return 1; 
 }
}

And in general, avoid magic numbers. Instead of 26, declare a global constant, either with #define or const int. Sometimes you use 65 instead of 'A'. Don't do that.

Thanks for the feedback! I definitely feel like I am over commenting at times but part of the assignments for CS50 is "style" and you get dinged for lack of comments so I often find myself just putting stuff in to make sure that doesn't happen, but I am aware that I am doing that so I guess that's a good thing? Could you explain the last part a little bit more? I'm just not fully understanding how if(arr[argv[1][i]] > 0) is checking for uniqueness
@apreynolds1989 Well, what should I say? Sometimes I am also forced to write crappy code just to meet requirements. ;)
The arr is an array that counts the occurrences of characters. If you skip the return statement, you could use it to make a histogram. See my edit.
haha fair enough! What I am not seeing is how the arr is counting unique characters as opposed to just counting characters? If I am overlooking something plainly obvious, I blame the night shift :P
@apreynolds1989 If arr[n] > 1 that means that n occurs more than once. That's the definition of not unique.

user673679 user673679 12.2k2 gold badges34 silver badges65 bronze badges · Answer 2 · 2020-11-02 08:14:57Z

validate:

Once we've checked the number of command line arguments, we can use a named variable for the key, instead of referring to it as v[1] everywhere: string key = v[1]. We should do the same in the main function.
We can save the string length to another variable, instead of calling strlen multiple times.

The alphabet check contains an unnecessary branch that does nothing. We can simplify it to:

 //Validate that all key characters are alphabetic
 for (int i = 0; i < key_len; i++)
 {
 if (!isalpha(key[i]))
 {
 printf("Key must contain only alphabetic characters\n");
 return 1; //Return error
 }
 }

The uniqueness check will compare the last non-null character in the key with the null character at the end. We need to limit x to the range [0, key_len - 1) to avoid that.
if (v[1][x] == v[1][y] + 32), if (v[1][x] == v[1][y] - 32). We should use a named constant variable for 32 to make it clearer what this is doing.

Note, however, that these checks don't do exactly what we want them to. They will also compare the letters with various punctuation characters.

We can replace these 3 ifs with a single one: if (toupper(key[x]) == toupper(key[y]))

main:

We can use isalpha in our encoding loop, instead of custom range checking.
We don't need to loop over the alphabet to do the encoding. We can calculate the offset of our index character from 'a' or 'A', and use that as the index into the key.
It looks like we're trying to maintain the case of the plaintext - which maybe isn't a sensible feature for a cryptographic system! Anyway... in that case, we need to use isupper on the plaintext[t], not on the encoded value (which is what the original program seems to be doing).

Note that since we don't know whether the key chars are uppercase or lowercase, we should always use toupper or tolower on the output.

So I think we can do something like:

for (int t = 0, len = strlen(plaintext); t < len; t++)
{
 if (!isalpha(plaintext[t]))
 {
 printf("%c", plaintext[t]);
 continue;
 }
 char encoded = key[toupper(plaintext[t]) - 'A'];
 char output = isupper(plaintext[t]) ? toupper(encoded) : tolower(encoded);
 printf("%c", output);
}

Some additional explanation:

As I understand it, the key string (argv[1]) contains the character to be substituted for each letter of the alphabet. So 'a' in the plaintext is replaced with key[0], 'b' with key[1], etc.

So to get the correct index in the key for a given character from 'A' to 'Z', we need to get the distance of that character from 'A'. Since char is an integer type, we can do this with some simple math: plaintext[t] - 'A'

Here 'A' is simply another way of writing 65. So if plaintext[t] is also 'A', we get 65 - 65, or if plaintext[t] is 'B' we get 66 - 65. This gives us the index we need for the key.

plaintext[t] could be uppercase or lowercase. For lowercase we could get the distance from 'a' instead, but it's easier to convert everything to uppercase and do a single comparison.

Thanks for the feedback! Maintaining the case was indeed a part of the assignment and I did struggle with that. Just curious if you could explain a little more what this code is doing? Not fully wrapping my head around it (not sure if it's because i'm 11 hours into a night shift or not) char encoded = key[toupper(plaintext[t]) - 'A'];

chux chux 36.4k2 gold badges43 silver badges96 bronze badges · Answer 3 · 2020-11-03 00:23:45Z

a more efficient way to do it?

for loop obliges 2 sequential trips through the string plaintext. Instead, consider one trip: iterate until the null character is found.

// for (int t = 0, len = strlen(plaintext); t < len; t++)
for (int t = 0); plaintext[t]; t++)

Is there something I overlooked?

char is signed or unsigned. When signed, it can have negatives values like -42. isupper(int ch) and other is...() functions are well defined for ch in the unsigned char range and EOF. Pedantic code insures the range.

char upper = argv[1][u];
// int up = isupper(upper);
int up = isupper((unsigned char) upper);

Is this good code (for a beginner)?

Yes.

Good use of local variables rather than all at the top of the function.

Use if/else

 int up = isupper(upper);
 //if (up == 0) {
 // upper = toupper(upper);
 // printf("%c", upper);
 //}
 //if (up != 0) {
 // printf("%c", upper);
 //}
 if (up == 0) {
 upper = toupper(upper);
 printf("%c", upper);
 } else {
 printf("%c", upper);
 }

or

 int up = isupper(upper);
 if (up == 0) {
 upper = toupper(upper);
 }
 printf("%c", upper);

or simply

 // int up = isupper(upper);
 upper = toupper(upper);
 printf("%c", upper);

Stack Exchange Network

Substitution Cipher in C

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Substitution Cipher in C

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions