I was directed here from Stack Overflow.
I completed my recent programming assignment for developing a substitution cipher in C. Below is what I came up with after reading many tutorials, googling many questions, watching many videos, etc. but I am happy with the result. I was just wondering if you know of a more efficient way to do it? Is this good code (for a beginner)? Is there something I overlooked? It should be noted that this is CS50 and I was using the CS50 library. If this isn't the place for this type of feedback, please delete! Thanks!
Here is my code:
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int validate(int c, string v[]); //prototpe validate function
int main(int argc, string argv[])
{
if (validate(argc, argv) == 0) //run validate for argc and argv and check its response
{
printf("Key is valid\n"); //if valid, print message
}
else
{
return 1; //Return error
}
string plaintext = get_string("plaintext: "); //Get user input
printf("ciphertext: "); //Output the cipher, using following algorithm
//Loop through each letter of inputed text
for (int t = 0, len = strlen(plaintext); t < len; t++)
{
//If not alphabetic character, print as entered
if (plaintext[t] < 'A' || (plaintext[t] > 'Z' && plaintext[t] < 'a') || plaintext[t] > 'z')
{
printf("%c", plaintext[t]);
}
//If alphabetic, encipher the input
else
{
for (int u = 0; u < 26; u++)
{
if (plaintext[t] == 65 + u) //check for uppercase alphabetic characters starting with ASCII 65
{
//Ensure outputed text maintains uppercase
char upper = argv[1][u];
int up = isupper(upper);
if (up == 0)
{
upper = toupper(upper);
printf("%c", upper);
}
if (up != 0)
{
printf("%c", upper);
}
}
if (plaintext[t] == 97 + u) //check for lowercase alphabetic characters starting with ASCII 97
{
//Ensure the outputed text maintains lowercase
char lower = argv[1][u];
int low = islower(lower);
if (low == 0)
{
lower = tolower(lower);
printf("%c", lower);
}
if (low != 0)
{
printf("%c", lower);
}
}
}
}
}
printf("\n"); //print newline for clean terminal
return 0; //Exit
}
//Key Validation function
int validate(int c, string v[])
{
//Validate that only one Command Line Argument was entered
if (c != 2) //Check the number if entered commands at execution
{
//If more than one, print error message
printf("Key must be the only Command Line Argument\n");
return 1; //Return error
}
//Validate that Key length is 26 characters
if (strlen(v[1]) != 26) //Don't forget null 0
{
printf("Key must contain exactly 26 characters\n");
return 1; //Return error
}
//Validate that all Key characters are alphabetic
for (int i = 0, n = strlen(v[1]); i < n; i++)
{
//Check each element of the array
if (isalpha(v[1][i]))
{
continue; //Continue if alphabetic
}
else
{
//if non-alphabetic, print error code
printf("Key must contain only alphabetic characters\n");
return 1; //Return error
}
}
//Validate that each Key character is unique
for (int x = 0, z = strlen(v[1]); x < z; x++)
{
//Create a second loop to compare every element to the first before incrementing the first
for (int y = x + 1; y < z; y++)
{
//Cast array elements to int, check if each element equals next element in array
if (v[1][x] == v[1][y])
{
printf("Key must contain exactly 26 unique characters\n");
return 1; //Return error
}
//also check if same character has been used in different case
if (v[1][x] == v[1][y] + 32)
{
printf("Key must contain exactly 26 unique characters\n");
return 1; //Return error
}
if (v[1][x] == v[1][y] - 32)
{
printf("Key must contain exactly 26 unique characters\n");
return 1; //Return error
}
}
}
return 0; //Key is valid, so return true
}
3 Answers 3
My first suggestion is actually to remove most of the comments. Sometimes comments are useful, but when they are explaining obvious things, they just clutter the code. Also, since validate
checks the command line arguments, I would rename it to validate_args
and also name the arguments argc
and argv
.
So assuming we have renamed those variables, let's continue on that function. The check for number of arguments and argument length are good as they are. So we jump to checking for only alpha characters. Your loop is too complicated. It does not need an if statement. Also, we could change the for loop a bit, since we already know that it has correct length.
for (int i = 0; argv[1][i] != '0円'; i++)
{
if (!isalpha(v[1][i]))
printf("Key must contain only alphabetic characters\n");
return 1;
}
}
The check for uniqueness could be made a lot simpler. I made the array bigger than 26 so that it works if you want to use more characters in future. Also changed the error message because of that.
int arr[256] = {0};
for (int i = 0; argv[1][i] != '0円'; i++)
{
if(arr[argv[1][i]] > 0)
{
printf("Key must contain unique characters\n");
return 1;
}
arr[argv[1][i]]++;
}
You could make it shorter by removing the last line that increments the array and instead write the if statement like this: if(arr[argv[1][i]]++ > 0)
but IMHO, that just makes it harder to read.
To clarify, what's happening here is basically something that creates a histogram. You could write it like this instead:
int arr[256] = {0};
for (int i = 0; argv[1][i] != '0円'; i++) {
arr[argv[1][i]]++;
}
// Now arr[n] indicates how many of character n the argument had
for(int i=0; i<sizeof arr; i++) {
if(arr[i] > 1) {
printf("Key must contain unique characters\n");
return 1;
}
}
And in general, avoid magic numbers. Instead of 26, declare a global constant, either with #define
or const int
. Sometimes you use 65
instead of 'A'
. Don't do that.
-
\$\begingroup\$ Thanks for the feedback! I definitely feel like I am over commenting at times but part of the assignments for CS50 is "style" and you get dinged for lack of comments so I often find myself just putting stuff in to make sure that doesn't happen, but I am aware that I am doing that so I guess that's a good thing? Could you explain the last part a little bit more? I'm just not fully understanding how if(arr[argv[1][i]] > 0) is checking for uniqueness \$\endgroup\$apreynolds1989– apreynolds19892020年11月02日 09:35:25 +00:00Commented Nov 2, 2020 at 9:35
-
\$\begingroup\$ @apreynolds1989 Well, what should I say? Sometimes I am also forced to write crappy code just to meet requirements. ;) \$\endgroup\$klutt– klutt2020年11月02日 09:38:25 +00:00Commented Nov 2, 2020 at 9:38
-
\$\begingroup\$ The
arr
is an array that counts the occurrences of characters. If you skip the return statement, you could use it to make a histogram. See my edit. \$\endgroup\$klutt– klutt2020年11月02日 09:40:03 +00:00Commented Nov 2, 2020 at 9:40 -
\$\begingroup\$ haha fair enough! What I am not seeing is how the arr is counting unique characters as opposed to just counting characters? If I am overlooking something plainly obvious, I blame the night shift :P \$\endgroup\$apreynolds1989– apreynolds19892020年11月02日 09:52:08 +00:00Commented Nov 2, 2020 at 9:52
-
\$\begingroup\$ @apreynolds1989 If
arr[n] > 1
that means thatn
occurs more than once. That's the definition of not unique. \$\endgroup\$klutt– klutt2020年11月02日 09:55:46 +00:00Commented Nov 2, 2020 at 9:55
validate
:
Once we've checked the number of command line arguments, we can use a named variable for the key, instead of referring to it as
v[1]
everywhere:string key = v[1]
. We should do the same in themain
function.We can save the string length to another variable, instead of calling
strlen
multiple times.The alphabet check contains an unnecessary branch that does nothing. We can simplify it to:
//Validate that all key characters are alphabetic for (int i = 0; i < key_len; i++) { if (!isalpha(key[i])) { printf("Key must contain only alphabetic characters\n"); return 1; //Return error } }
The uniqueness check will compare the last non-null character in the key with the null character at the end. We need to limit
x
to the range[0, key_len - 1)
to avoid that.if (v[1][x] == v[1][y] + 32)
,if (v[1][x] == v[1][y] - 32)
. We should use a named constant variable for32
to make it clearer what this is doing.Note, however, that these checks don't do exactly what we want them to. They will also compare the letters with various punctuation characters.
We can replace these 3
if
s with a single one:if (toupper(key[x]) == toupper(key[y]))
main
:
We can use
isalpha
in our encoding loop, instead of custom range checking.We don't need to loop over the alphabet to do the encoding. We can calculate the offset of our index character from
'a'
or'A'
, and use that as the index into the key.It looks like we're trying to maintain the case of the plaintext - which maybe isn't a sensible feature for a cryptographic system! Anyway... in that case, we need to use
isupper
on theplaintext[t]
, not on the encoded value (which is what the original program seems to be doing).Note that since we don't know whether the key chars are uppercase or lowercase, we should always use
toupper
ortolower
on the output.
So I think we can do something like:
for (int t = 0, len = strlen(plaintext); t < len; t++)
{
if (!isalpha(plaintext[t]))
{
printf("%c", plaintext[t]);
continue;
}
char encoded = key[toupper(plaintext[t]) - 'A'];
char output = isupper(plaintext[t]) ? toupper(encoded) : tolower(encoded);
printf("%c", output);
}
Some additional explanation:
As I understand it, the key
string (argv[1]
) contains the character to be substituted for each letter of the alphabet. So 'a'
in the plaintext is replaced with key[0]
, 'b'
with key[1]
, etc.
So to get the correct index in the key
for a given character from 'A'
to 'Z'
, we need to get the distance of that character from 'A'
. Since char
is an integer type, we can do this with some simple math: plaintext[t] - 'A'
Here 'A'
is simply another way of writing 65
. So if plaintext[t]
is also 'A'
, we get 65 - 65
, or if plaintext[t]
is 'B'
we get 66 - 65
. This gives us the index we need for the key.
plaintext[t]
could be uppercase or lowercase. For lowercase we could get the distance from 'a'
instead, but it's easier to convert everything to uppercase and do a single comparison.
-
\$\begingroup\$ Thanks for the feedback! Maintaining the case was indeed a part of the assignment and I did struggle with that. Just curious if you could explain a little more what this code is doing? Not fully wrapping my head around it (not sure if it's because i'm 11 hours into a night shift or not) char encoded = key[toupper(plaintext[t]) - 'A']; \$\endgroup\$apreynolds1989– apreynolds19892020年11月02日 09:31:59 +00:00Commented Nov 2, 2020 at 9:31
-
\$\begingroup\$ Sure. Added a bit more explanation. \$\endgroup\$user673679– user6736792020年11月02日 10:30:53 +00:00Commented Nov 2, 2020 at 10:30
a more efficient way to do it?
for
loop obliges 2 sequential trips through the string plaintext
. Instead, consider one trip: iterate until the null character is found.
// for (int t = 0, len = strlen(plaintext); t < len; t++)
for (int t = 0); plaintext[t]; t++)
Is there something I overlooked?
char
is signed or unsigned. When signed, it can have negatives values like -42. isupper(int ch)
and other is...()
functions are well defined for ch
in the unsigned char
range and EOF
. Pedantic code insures the range.
char upper = argv[1][u];
// int up = isupper(upper);
int up = isupper((unsigned char) upper);
Is this good code (for a beginner)?
Yes.
Good use of local variables rather than all at the top of the function.
Use if/else
int up = isupper(upper);
//if (up == 0) {
// upper = toupper(upper);
// printf("%c", upper);
//}
//if (up != 0) {
// printf("%c", upper);
//}
if (up == 0) {
upper = toupper(upper);
printf("%c", upper);
} else {
printf("%c", upper);
}
or
int up = isupper(upper);
if (up == 0) {
upper = toupper(upper);
}
printf("%c", upper);
or simply
// int up = isupper(upper);
upper = toupper(upper);
printf("%c", upper);