I have tried to implement a code that does the following:
- store words in an array (unordered)
- use function
order_array()
to place repetitions for the same word near (so the result, given a sample text file, could be:mom mom mom mom lamb lamb eye eye pain tears
) - use function
count_frequency
to store every word one single time + its frequency (I store them in a linked list) - print the linked list to obtain given result
The code works, but it appears (to my eye) cumbersone and unefficient. How can I improve it?
Note that I am not familiar with concepts like vectors, hash maps or dictionaries yet, but I have a good grasp of basic data structures (eg linked lists, stacks, queues, binary search trees). Still, any hint, help or advice would be greatly appreciated
using namespace std;
#include<iostream>
#include<fstream>
#include<string>
const int DIM = 50;
int count = 0;
string list[DIM];
string ordered_list[DIM];
struct node
{
string word;
int frequency;
node* next;
};
node* head = NULL;
node* remove_node(node* &h)
{
node* s = head;
head = s->next;
return s; // and then remove everything
}
void add_node(string &w, int &f, node* &h)
{
node* t = new node;
t->word = w;
t-> frequency = f;
t->next = h;
h = t;
}
void print_highestf(node* &h)
{
node* s = h;
int higherf = 0;
while(s!=NULL)
{
if((s->frequency) >= higherf)
{
higherf = s->frequency;
}
s = s->next;
}
cout << "The highest frequency for a word is: " << higherf << endl;
}
void print_res(node* &h)
{
for(node* s = head; s != NULL; s = s->next)
{
cout << "Frequency for word " << s->word << ": " << s->frequency << endl;
cout << "\n";
}
}
void upload_array(string &w, string a[])
{
a[count] = w;
count ++;
}
bool is_new_string(string &word, int position) //check better
{
bool repeated = false;
for(int i = 0; i < position; i++)
{
if(word == list[i] && repeated == true)
{
return false;
}
else if(word == list[i] && repeated == false)
{
repeated = true;
}
}
return true;
}
void order_array(string a[], string oa[])
{
int i = 0;
int times = 0;
while(i!= count)
{
int j = i;
while(j != count)
{
if((a[i] == a[j]) && is_new_string((a[i]), i+1))
{
//new node
oa[times]=a[i];
times++;
}
j++;
}
i++;
}
}
void count_frequency(string oa[])
{
int frequency = 1;
for(int i = 0; i<count-1; i++)
{
if(oa[i] != oa[i+1])
{
add_node(oa[i], frequency, head);
frequency = 1;
}
else if(oa[i] == oa[i+1])
{
frequency++;
}
}
}
int main()
{
fstream mystream;
mystream.open("words.txt", ios::in);
string word;
//upload array of strings
while(!mystream.eof())
{
mystream >> word;
upload_array(word, list);
}
order_array(list, ordered_list);
count_frequency(ordered_list);
print_res(head);
print_highestf(head);
for(node* s= head; s!=NULL; s = s->next)
{
node* r = remove_node(head);
delete r;
}
cout << "\nDeinitialization executed. " << endl;
mystream.close();
return 0;
}
2 Answers 2
Kudos on writing a working program! You already sense that things can be improved, which is also great. Toby Speight already pointed out some issues in your code. You are a beginning C++ programmer, so it's normal that you don't write optimal code yet; you have just learned the basic tools. To show you how a more experienced C++ programmer would tackle the problem of counting frequencies, let me show you this implementation:
#include <cstdlib>
#include <algorithm>
#include <format>
#include <fstream>
#include <iostream>
#include <ranges>
#include <unordered_map>
#include <vector>
// A struct to hold a value and its frequency
template<typename T>
struct counted_value {
std::size_t count;
T value;
// Add a default comparison operator so we can sort on these structs
friend auto operator<=>(const counted_value&, const counted_value&) = default;
};
// A templated function that can take any kind of input
template<std::ranges::input_range Range>
auto count_frequencies(Range&& range) {
// Deduce the type of values stored in the range
using value_type = decltype(std::ranges::begin(range))::value_type;
// Count frequencies in a hash map
std::unordered_map<value_type, std::size_t> counters;
for (const auto& value: range) {
counters[value]++;
}
// Copy the result into a vector
std::vector<counted_value<value_type>> result;
result.reserve(counters.size());
for (const auto& [value, count]: counters) {
result.emplace_back(count, value);
}
// Sort the vector on the frequency count, largest count first
std::ranges::sort(result, std::greater{});
return result;
}
int main() {
std::ifstream input("words.txt");
// Create a view of the words in the input stream
auto words = std::ranges::istream_view<std::string>(input);
auto frequencies = count_frequencies(words);
// We should have reached the end, if we didn't something went wrong
if (!input.eof()) {
std::cerr << "Error reading input\n";
return EXIT_FAILURE;
}
for (const auto& item: frequencies) {
std::cout << std::format("Frequency for word {}: {}\n",
item.value, item.count);
}
}
I've used a lot of features from the standard library and from the language itself, in particular features from C++20. I'll list them here with links to cppreference.com:
template
so you can create a frequency counting function that can count any kind of input, not just an array of strings.- A default comparison operator to make a
struct
comparable, which helps it sort later. - The concept
std::ranges::input_range
to restrict the input ofcount_frequencies()
to ranges that can be iterated over. It's not really necessary here, but it results in nicer error messages if you call the function with the wrong type of input. using
to declare a type alias, this is just to save typing.decltype()
to deduce the type of values stored in the input range.std::unordered_map
is basically a dictionary, we use it to count how often a given value appears in the input.- Range-based
for
loops, this is often simpler and safer than the oldfor
loops where you manually iterate. std::vector
is basically a dynamically sized array. It takes care of memory allocation for you. We use it to store the final result.std::ranges::sort()
provides an easy way to sort containers. Normally it sorts so the smallest value comes first, but we pass itstd::greater
so we sort it largest value first.- The
auto
type lets the compiler deduce the type for you. This saves typing and reduces the chance of making mistakes. std::ranges::istream_view()
is a helper that allows you to convert an input stream (like the file you open) into a view that can be iterated over using range-basedfor
loops and with other algorithms.std::format()
makes it much easier to format strings. C++23 also introducedstd::print()
which makes it even easier to print things, but not all compilers support it yet.
Don't worry about all of this, just continue your programming journey, and eventually you learn most or all of the above.
-
\$\begingroup\$ Thank you so much for your detailed answer. There is still a lot to go but your precise explanation helped me seeing what I need to learn in order to improve! \$\endgroup\$Arsenio– Arsenio2024年02月04日 13:00:45 +00:00Commented Feb 4, 2024 at 13:00
Please don't do this:
using namespace std;
That immediately takes away the benefit of namespaces, making it harder to be sure which namespace identifiers come from. It's particularly harmful when you declare names of your own (e.g. list
and next
) that are the same as those in std
.
This is an anti-pattern:
while(!mystream.eof()) { mystream >> word; upload_array(word, list); }
The EOF flag is set on a stream when input has failed due to reaching the end. It does not guarantee that the next read will succeed. What we want is
while(mystream >> word)
{
upload_array(word, list);
}
Whilst implementing your own linked list may be an instructive experience, I think it's more useful to familiarise yourself with the standard library containers. As a C++ programmer, you'll be using these on a daily basis, whereas you may never need to implement one of your own.
This program would be much, much simpler if we used one of the standard map
classes (std::map<std::string, std::size_t>
if we want the results to be in a repeatable order, or the std::unordered_map
equivalent if not). And we wouldn't have any raw pointers, so no need to verify that every new
is matched with a delete
.
The program would be more useful if it read from std::cin
rather than requiring a specifically-named file for its input. That allows the user to redirect from any file, or from a pipe, or even interactively.
The end of the program has some unnecessary code:
cout << "\nDeinitialization executed. " << endl; mystream.close(); return 0;
There's no need to flush the output stream, given that happens automatically at program termination, so we can remove the std::endl
and simply add \n
to the end of the string literal.
The destructor of mystream
will automatically close the file, so that's unnecessary unless we're actually going to take any notice of the result.
Executing off the end of main()
(but no other function) is defined as returning 0 (success), so the return
can be omitted.
-
\$\begingroup\$ Thank you very much for your answer, I have learnt something new and useful. I will treasure what you have said ! \$\endgroup\$Arsenio– Arsenio2024年02月04日 12:54:08 +00:00Commented Feb 4, 2024 at 12:54