I have written C++ code using trie and min-heap to find the k most frequent words. I want to review my code.
#include<bits/stdc++.h>
using namespace std ;
//structure of the node of trie
struct TrieNode
{
bool IsEnd ; // If this node is end node
int HeapIndex ; // To store the index of this word in heap.
TrieNode * children [ 26 ] ; // To store its children.
int frequency ; // To store the frequency of word ending at this
TrieNode () // Constructure .
{
HeapIndex = -1 ;
IsEnd = false ;
frequency = 0 ;
for ( int i = 0 ; i < 26 ; i ++ )
children [ i ]= NULL ;
}
};
struct HeapNode
{
string word ; // To store the word.
int frequency ; // to store the frequency of word.
TrieNode* TN ; // To store the last Triendnode of this word.
HeapNode ()
{
frequency = 0 ;
TN = NULL ;
word = "" ;
}
};
class Trie {
TrieNode * root ;
public:
Trie()
{
root = new TrieNode() ;
}
TrieNode* insert( string word)
{
TrieNode * temp = root ;
for ( int i = 0 ; i < word.length() ; i ++ )
{
if ( temp -> children [ word [ i ] - 'a' ] == NULL )
{
TrieNode * node = new TrieNode() ;
temp -> children [ word [ i ] - 'a' ] = node ;
}
temp = temp -> children [ word [ i ] - 'a' ] ;
}
temp -> frequency = 1 ;
temp -> IsEnd = 1 ;
return temp ;
}
TrieNode* search(string word)
{
TrieNode * temp = root ;
for ( int i = 0 ; i < word.length() ; i ++ )
{
if ( temp -> children [ word [ i ] - 'a' ]== NULL )
return NULL ;
temp = temp -> children [ word [ i ] - 'a' ] ;
}
if ( temp -> IsEnd )
return temp ;
else
return NULL ;
}
};
class MinHeap
{
int capacity , count ;
HeapNode * arr ;
public :
MinHeap ( int capacity )
{
this -> capacity = capacity ;
count = 0 ;
arr = new HeapNode [ capacity ] ;
}
void Display ()
{
for ( int i = 0 ; i < capacity ; i ++ )
cout << arr [ i ] .word << ":" << arr [ i ] .frequency << endl ; ;
}
void MinHeapify ( int idx )
{
int left = 2 * idx + 1 ;
int right = 2 * idx + 2 ;
int minidx = idx ;
if ( left < count and arr[ left ] .frequency < arr [ minidx ] .frequency )
minidx = left ;
if ( right < count and arr[ right ] .frequency < arr [ minidx ] .frequency )
minidx = right ;
if ( minidx != idx )
{
arr [ idx ] .TN -> HeapIndex = minidx ;
arr [ minidx ] .TN -> HeapIndex = idx ;
swap ( arr [ idx ] , arr [ minidx ] ) ;
MinHeapify ( minidx ) ;
}
}
void Build ( int idx )
{
int n = count - 1 ;
for ( int i = ( n - 1 ) / 2 ; i >= 0 ; i -- )
MinHeapify ( i ) ;
}
void insert ( TrieNode * TN , string word )
{
if ( TN -> HeapIndex != -1 ) //When word is already present in the heap.
{
//cout << 1 << endl ;
arr [ TN -> HeapIndex ] .frequency ++ ;
MinHeapify ( TN -> HeapIndex ) ;
}
else if ( count < capacity ) // When heap size is less than k
{
//cout << 2 << endl ;
arr [ count ]. word = word ;
arr [ count ]. frequency = 1 ;
arr [ count ] . TN = TN ;
TN -> HeapIndex = count ;
count ++ ;
Build ( count ) ;
}
else if ( TN -> frequency > arr [ 0 ] . frequency )
{
//cout << 3 << endl ;
arr [ 0 ] .TN -> HeapIndex = -1 ;
arr [ 0 ] . word = word ;
arr [ 0 ] . frequency = TN -> frequency ;
arr [ 0 ] . TN = TN ;
TN -> HeapIndex = 0 ;
MinHeapify ( 0 ) ;
}
}
};
void TopKFrequentWords ( string FileName , int k )
{
MinHeap MH ( k ) ;
Trie T ;
fstream file;
file.open ( FileName.c_str() ) ;
string word ;
while ( file >> word )
{
TrieNode * TN = T.search ( word ) ;
if ( !TN )
{
//cout << word << "**" << endl ;
TN = T.insert ( word ) ;
}
else
{
//cout << word << "&&&&" << endl ;
TN -> frequency ++ ;
}
MH.insert ( TN , word ) ;
}
MH.Display() ;
}
int main()
{
int k ;
cin >> k ;
string FileName ;
cin >> FileName ;
TopKFrequentWords ( FileName , k ) ;
}
I want to know how will I accommodate frequency for words like beautiful and ful having same suffix.
2 Answers 2
Overall
I find your code very untidy (and thus hard to read). Please make sure to use nice indentation and generally make the code easy to read.
You don't do any memory management. In the class Trie
it creates a lot of TriNode
objects via new
. You are supposed to track these and eventually call delete
on all these objects.
Code Review
This is not a standard header file:
#include<bits/stdc++.h>
Please never use it.
You are supposed to include only the headers you need.
This is not a good idea:
using namespace std ;
Please read the article: Why is "using namespace std;" considered bad practice?
The main issue is that it can completely change the meaning of code without causing any kind of compiler warning.
The reason the "standard" library is named "std" is so that it is not a big issue to prefix things in the standard library with std::
. This is the preferred technique.
In C++ we have deprecated the use of NULL
and replaced it with nullptr
. The difference is that NULL
is a macro that evaluates to the integer zero while nullptr
has a specific type of std::nullptr_t
. The advantage of this is that nullptr
can only be assigned to pointer types while NULL
can be assigned to pointers and any integer type (which has caused issues).
TN = NULL;
// prefer
TN = nullptr;
Note the default constructor for std::string
assigns it the empty string. So there is no need to set it to the empty string in a constructor.
word = "";
// Useful to reset to the empty string if the word object had
// been used. But in a constructor you know it has just been constructed
// and thus does not need to be set again. So just leave it.
When writing constructors prefer to use the initializer list rather than initializing them in the code block:
HeapNode ()
{
frequency = 0 ;
TN = NULL ;
word = "" ;
}
// I would write this as:
HeapNode()
: frequency(0)
, TN(nullptr)
{}
The reason for this is that member variables are already initialized by their constructor before the code block is entered. The initializer list allows you to pass parameters to the constructors.
So if you initialize variables in the code block you are doing twice the work. Because you are calling the default constructor then you are calling the assignment constructor.
You may think it's OK not to do this for int
and pointer types because they don't have constructors or assignment operators. But this does not consider the normal usage of C++. In is quite normal to change C++ by simply changing the type of a member and not changing anything else. If you have not followed the above pattern then you end up paying the price after the change.
Taking a risk here that there are only lower case alphabetic characters.
if ( temp -> children [ word [ i ] - 'a' ] == NULL )
You should check to make sure there are no upper case letters (or convert to lower case) and no none alphabetic characters.
You can open a file in a single line.
fstream file;
file.open ( FileName.c_str() ) ;
// I would use this:
std::ifstream file(FileName);
Never ever do that:
#include<bits/stdc++.h>
using namespace std ;
Otherwise you'll get into trouble soon. Xou can check Why should I not #include ? and Why is "using namespace std;" considered bad practice? for more details.
Just include every header you need separately and prefix the std::
namespace when needed (alternatively use a using
definition).
-
1\$\begingroup\$ It might be good to explain why
using namespace std;
is bad. \$\endgroup\$2020年04月25日 17:06:38 +00:00Commented Apr 25, 2020 at 17:06 -
\$\begingroup\$ @pacmaninbw Added at least the most relevant links. \$\endgroup\$πάντα ῥεῖ– πάντα ῥεῖ2020年04月25日 17:13:55 +00:00Commented Apr 25, 2020 at 17:13
~~~
and pasting the code between them. \$\endgroup\$how will I accommodate frequency for words like beautiful and ful having same suffix
as specified, for all the specification not presented. \$\endgroup\$