So I'm starting out and doing some code challenges in between book and tutorial learning and was hoping to get some general feedback on code readability and use of comments.
I know it's sometimes hard to understand your own code after you wrote it so my question is - how easy is it to understand this one based on comments?
Do you think I'm missing anything crucial? Any general feedback welcome, thanks.
/*
https://edabit.com/challenge/SEzYX5qtTR4WcZeR3
An isogram is a word that has no repeating letters, consecutive or
nonconsecutive.Create a function that takes a string and returns
either true or false depending on whether or not it's an "isogram".
Examples
isIsogram("Algorism") ➞ true
isIsogram("PasSword") ➞ false
// Not case sensitive.
isIsogram("Consecutive") ➞ false
*/
#include <iostream>
#include <ctype.h> // to use the toupper function which takes a letter
character and returns the upper case ASCII integer
using namespace std;
int main()
{
string input_string;
string output_string;
bool isogram_check = true; // by default the program assumes that everything is an isogram and then tries to find repeating letters to assert that it's not an isogram
cout << "Enter a string." << endl;
cin >> input_string;
for (int i = 0; i < input_string.length(); i++) // this loop takes the input string and converts it to all upper case letters.
{
output_string = output_string + char(toupper(input_string[i]));
}
// cout << output_string << endl;
// the above line is to test the toupper output
for (int i = 0; i < output_string.length(); i++) // the first loop decides WHAT to compare to something else, the second loops decides what that SOMETHING ELSE is.
{
for (int o = i+1; o < output_string.length(); o++) // this loop takes the character at position i and compares it to the character at i+1 then i+2, etc. by iterating o.
{ // in the above comment, i+1 = o since we iterated i once right at the beginning.
if (output_string[i] == output_string[o]) // when the loop finds two of the same character within the string it brakes out of both loops
{
isogram_check = false;
break;
}
}
if (isogram_check == false)
{
break;
}
}
if (isogram_check == false)
{
cout << input_string << " is not an isogram.\n";
}
else
{
cout << input_string << " is an isogram.\n";
}
}
6 Answers 6
Avoid using namespace std
.
Keep your lines shorter. Having to constantly scroll back and forth to read lines is unpleasant. Most of your comments should be on one (or more) lines right before the statement being commented on. Also, avoid obvious things in comments (like "resets i to use in the next loop.").
Declare variables as close as possible to their first use.
Since isogram_check
is a bool
, you can test it with !isogram_check
instead of isogram_check == false
.
With a slight change in algorithm (searching characters before the ith one), your two while
loops can be combined into one. Although there are other ways to do this that could be better (for some definition of "better"), your implementation is a quick first pass that works well for short strings (but less well as the strings get longer).
-
1\$\begingroup\$ Thanks. I am intentionally using the namespace declaration since I know I won't be using other namespaces. Should I avoid it even knowing the issue that can come up? What would I use to combine the 2 loops? My idea is that since I need to iterate both the WHAT I'm comparing and the what I'm comparing TO that I need 2 loops. \$\endgroup\$Chiril Russu– Chiril Russu2020年01月05日 00:03:22 +00:00Commented Jan 5, 2020 at 0:03
-
\$\begingroup\$ @ChirilRussu you never know which namespace you need in the future, so it's best to avoid that. One of my employers' projects uses
namespace std
everywhere and causes a lot of name clashes problems that we have to do some workaround to avoid to much code change \$\endgroup\$phuclv– phuclv2020年01月05日 11:37:23 +00:00Commented Jan 5, 2020 at 11:37 -
\$\begingroup\$ @ChirilRussu - Quoting from the zen of python, "Namespaces are one honking great idea — let's do more of those!" The same concept very much applies to C++. The key intent of namespaces is to avoid collisions. It's best to avoid "using namespace <namespace_name>", ever. \$\endgroup\$David Hammen– David Hammen2020年01月05日 15:35:38 +00:00Commented Jan 5, 2020 at 15:35
I am not an expert in c++ but your code seems verbose. You can try to make it more explicit for a person who knows better c++ better. Plus the algorithm is not that hard to digest so writing some built-in constructions will not overcome the ability to understand the whole flow.
Unless your task is not bound by memory you can use std::set
to do the counting of unique characters for you and squeeze out some lines of code.
#include <iostream>
#include <algorithm>
#include <unordered_set>
#include <ctype.h>
int main()
{
std::string input_string;
std::cout << "Enter a string." << std::endl;
std::cin >> input_string;
std::unordered_set<char> chars;
std::cout << input_string << (std::all_of(
input_string.begin(),
input_string.end(),
[&chars](const char c) {
auto upper = toupper(static_cast<unsigned char>(c));
// ^ ^ ^
// avoid undefined behaviour
return chars.insert(upper).second;
}
) ?
" is an isogram.\n" :
" is not an isogram.\n");
}
I see you're probably trying to practice for coding interviews or just exercising your problem-solving skills, either way, you should think about using the stl of the given language.
What I mean is not using sorting functions when you are asked to do sorting challenge, but rather use some built-in algorithms as an intermediary step before achieving what your goal is.
-
\$\begingroup\$ Thanks, I'm just learning that's why I don't know the best ways to do things. For example I don't know how
transform
works, never used.begin
or.end.
Don't know howset
works, etc. Now that you wrote them for me I can just look it up and figure it out, but it's sort of I don't know what I don't know type of situation while I'm still learning. \$\endgroup\$Chiril Russu– Chiril Russu2020年01月05日 00:19:00 +00:00Commented Jan 5, 2020 at 0:19 -
\$\begingroup\$ @ChirilRussu, well
set
is a data structure that only keeps unique elements, so it maintains uniqueness for you. So now you probably see that ifset.size()
is equal toinput_string.size()
this means that all chars frominput_string
got into theset
=> they are all unique \$\endgroup\$kuskmen– kuskmen2020年01月05日 00:26:01 +00:00Commented Jan 5, 2020 at 0:26 -
\$\begingroup\$ @ChirilRussu
.begin
and.end
are methods that returniterator
to a certain point of the caller - in this case theset
. You can think of them as start and end indexes until you feel comfortable to dig down into en.wikipedia.org/wiki/Iterator_pattern \$\endgroup\$kuskmen– kuskmen2020年01月05日 00:28:06 +00:00Commented Jan 5, 2020 at 0:28 -
\$\begingroup\$ @ChirilRussu,
transform
is just a function that takes start and end iterator (you can think of index) and applies some function to each element that this iterator is traversing in this case its thestring
\$\endgroup\$kuskmen– kuskmen2020年01月05日 00:30:07 +00:00Commented Jan 5, 2020 at 0:30
You have two while
loops that have a counter i
that is incremented. You should really use a for
loop.
Your algorithm has quadratic complexity. Can you come up with something that is linear? To be fair, I can immediately only think of a linear solution that has a way higher proportionality constant, so it will be faster only for very long strings. Still.... worth exploring.
-
\$\begingroup\$ Thanks, I changed both while loops to for loops and got rid of the global i. \$\endgroup\$Chiril Russu– Chiril Russu2020年01月05日 00:05:24 +00:00Commented Jan 5, 2020 at 0:05
-
\$\begingroup\$ so
!isogram_check
works, just to clarify my understanding that is the same asisogram_check != initialized value
? \$\endgroup\$Chiril Russu– Chiril Russu2020年01月05日 00:12:31 +00:00Commented Jan 5, 2020 at 0:12 -
\$\begingroup\$ right,
something==false
is the same as!something
\$\endgroup\$Victor Eijkhout– Victor Eijkhout2020年01月05日 00:15:02 +00:00Commented Jan 5, 2020 at 0:15
Many good comments have been made already, eg:
- code in a function
- limit source code witdth to to 80-100 columns
- code is overly complex for the task, hard to digest it
Try to use the available algorithms and data structures. Those are reference links, but many good articles and books have been written.
One good way to solve many problems related to "frequency of occurrence" is to produce a "count of unique occurrence" or "frequency". Arguably the canonical tool for this is std:unordered_map
, which uses a hashtable
internally. Often it is combined with std::partial_sort_copy
to find the "10 most frequent" or similar. We don't need sort_copy
here because we can return false on first repeat. There is an suggested code for the is_isogram()
function using std::unordered_map
with some static calls from main()
below.
But the problem here is even simpler, because we don't actually need to know how many repeats there are I have provide a second function is_isogram_set()
which uses an std::unordered_set
. I ran some quick benchmarks and the std::unordered_set
is about 5-10% faster than the std::unordered_map
.
If we can assume "ASCII only" text, and ignore nonalpha completely, then all uppercase'd characters will be between 'A' - 'Z' and we could sensibly use something like a std::array<bool>
for which I have provided a third sample is_isogram_array()
. This is signignificantly (~10-15x !) faster than the other 2 options.
Any of these should be easily understood by anyone with C++ experience, rather than unpicking "several custom raw loops". Using the standard containers allows us to focus on clean efficient code, rather than "loop correctness". So we were easily able to find significant speed/feature trade-offs.
EDIT: Based on a suggestion from @LaurentLARIZZA I added an alternative version of the std::array
variant which uses the STL algorithm std::all_of
. I named it is_isogram_array_algo()
. I doesn't fundamentally change anything, but it does allow us to avoid the ranged for loop and to make the code slightly more self documenting. ie we can name the lambda "unseen" and write "all of unseen", which arguably makes it slightly clearer what is going on. Code is slightly longer, but has identical performance.
#include <iostream>
#include <string>
#include <cctype>
#include <unordered_map>
#include <unordered_set>
#include <array>
bool is_isogram(const std::string& str) {
auto freqs = std::unordered_map<char, int>{};
for (auto&& c: str) {
if (auto uc = static_cast<unsigned char>(c); std::isalpha(uc)) {
int& count = freqs[std::toupper(static_cast<unsigned char>(uc))];
if (count > 0) return false;
count++;
}
}
return true;
}
bool is_isogram_set(const std::string& str) {
auto freqs = std::unordered_set<char>{};
for (auto&& c: str) {
if (auto uc = static_cast<unsigned char>(c); std::isalpha(uc)) {
auto [iter, was_new] = freqs.insert(std::toupper(uc));
if (!was_new) return false;
}
}
return true;
}
bool is_isogram_array(const std::string& str) {
auto freqs = std::array<bool, 26>{false};
for (auto&& c: str) {
if (auto uc = static_cast<unsigned char>(c); std::isalpha(uc)) {
int idx = std::toupper(uc) - 'A';
if (freqs[idx]) return false;
freqs[idx] = true;
}
}
return true;
}
bool is_isogram_array_algo(const std::string& str) {
auto freqs = std::array<bool, 26>{false};
auto unseen = [&freqs](char c) {
if (auto uc = static_cast<unsigned char>(c); std::isalpha(uc)) {
int idx = std::toupper(uc) - 'A';
if (freqs[idx]) return false;
freqs[idx] = true;
}
return true;
};
return std::all_of(str.begin(), str.end(), unseen);
}
int main() {
std::cout << std::boolalpha
<< "Algorism=" << is_isogram("Algorism") << '\n'
<< "PasSword=" << is_isogram("PasSword") << '\n'
<< "Consecutive=" << is_isogram("Consecutive") << '\n';
return 0;
}
-
-
1\$\begingroup\$ I don't disagree. In fact i had it that way and changed last minute before posting. Trying to make
for (auto&& e: ..)
a habit because "it always works": quuxplusone.github.io/blog/2018/12/15/autorefref-always-works \$\endgroup\$Oliver Schönrock– Oliver Schönrock2020年01月05日 16:17:50 +00:00Commented Jan 5, 2020 at 16:17 -
\$\begingroup\$ Thanks for the link, interesting point as well! \$\endgroup\$Juho– Juho2020年01月05日 16:26:59 +00:00Commented Jan 5, 2020 at 16:26
-
\$\begingroup\$ Just did a test for a simplified (to get rid of map and iostream boilerplate code) example on godbolt: godbolt.org/z/2R3YdU I copy pasted the ASM for auto and for auto&& and the diff is precisely
nil
. I am not saying it will always be, in fact this example is "too simple" as it gets SIMD vector optimised etc etc. But in general could we assume that auto&& will not gen "worse" code? I agree it looks "fishy" if you're not used to it though... \$\endgroup\$Oliver Schönrock– Oliver Schönrock2020年01月05日 16:34:44 +00:00Commented Jan 5, 2020 at 16:34 -
\$\begingroup\$ Would an
unordered_multiset
also be a fitting data structure to represent the concept of a "histogram"? I don't know C++, but that's what I would use in a different language (codereview.stackexchange.com/a/233584/1581), even going so far as to implement my own (stackoverflow.com/a/57644639/2988). \$\endgroup\$Jörg W Mittag– Jörg W Mittag2020年01月05日 16:42:03 +00:00Commented Jan 5, 2020 at 16:42
Building on Oliver Schonrock's answer, here is an equivalent version of the code leveraging the std::all_of
algorithm. Note that I did not try to change the semantics of given in Oliver's answer, as all three structures could be made much shorter by always counting, and using bitwise operations.
#include <iostream>
#include <string>
#include <cctype>
#include <unordered_map>
#include <unordered_set>
#include <array>
struct IsFirstOccurrence_UM {
bool operator()(char c) {
int& count = freqs[std::toupper(static_cast<unsigned char>(uc))];
if (count > 0) {
return false;
} else {
count++;
return true;
}
}
private:
std::unordered_map<char, int> freqs;
};
struct IsFirstOccurrence_Set {
bool operator()(char c) {
if (auto uc = static_cast<unsigned char>(c); std::isalpha(uc)) {
auto [iter, was_new] = freqs.insert(std::toupper(uc));
return was_new;
} else {
return true; // Note that non-alpha characters are always considered first-occurrence
}
}
private:
std::unordered_set<char> freqs;
};
struct IsFirstOccurrence_Array {
bool operator()(char c) {
if (auto uc = static_cast<unsigned char>(c); std::isalpha(uc)) {
int idx = std::toupper(uc) - 'A';
if (freqs[idx]) {
return false;
} else {
freqs[idx] = true;
return true;
}
} else {
return true; // Note that non-alpha characters are always considered first-occurrence
}
}
private:
std::array<bool, 26> freqs{false};
};
template<typename IsFirstOccurrencePedicate>
bool is_isogram(const std::string& str, IsFirstOccurrencePedicate is_first_occurrence) {
return std::all_of(str.begin(), str.end(), std::move(is_first_occurrence));
}
int main() {
std::cout << std::boolalpha
<< "Algorism=" << is_isogram("Algorism", IsFirstOccurrence_UM()) << '\n'
<< "PasSword=" << is_isogram("PasSword", IsFirstOccurrence_UM()) << '\n'
<< "Consecutive=" << is_isogram("Consecutive", IsFirstOccurrence_UM()) << '\n';
return 0;
}
-
\$\begingroup\$ Yeah, nice. Demonstrates something slightly different. I also added a version above using
std::all_of
using a lambda, so without the templated reuse. I viewed the 3 options as mutually exclusive alternatives, and you would keep only one. I think we are beginning to overthink this trivial problem ;-) \$\endgroup\$Oliver Schönrock– Oliver Schönrock2020年01月06日 13:56:37 +00:00Commented Jan 6, 2020 at 13:56 -
\$\begingroup\$ Yes we are. That's why I avoided fleshing out the
IsFirstOccurrencePredicate
concept :) \$\endgroup\$Laurent LA RIZZA– Laurent LA RIZZA2020年01月06日 14:01:51 +00:00Commented Jan 6, 2020 at 14:01
As an alternative to the "efficiency" focused options in my other answer, here is an option which is just very terse. Given we can't have words> 26 characters which are isograms, this might be just enough?
#include <algorithm>
#include <unordered_set>
#include <cctype>
bool is_isogram(std::string s) {
std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) { return std::toupper(c); });
return std::unordered_set<char>(s.begin(), s.end()).size() == s.size();
}
It's closer (!) to what one might write in a language like python.
def is_isogram(s):
return len(s) == len(set(s.upper()))