## Optimizing the algorithm

All tests were done on a 4.7 MB file containing a copy of the history of the US and a copy of a medical manual (around 1,400 pages' worth of content).

Timings are the average of 10 single runs through the file. The code was compiled with VC++ 2015 (x64).
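
As a rough sketch of what "average of 10 runs" means in practice (this is not the exact linked test program; `process_file` and the run count are placeholders), the measurement boils down to:

```cpp
#include <chrono>

// Rough sketch of the timing methodology only: run the code under test
// `runs` times and report the average wall-clock time in milliseconds.
// `process_file` is a stand-in for the word-counting program's work.
template <typename Func>
long long average_ms(Func&& process_file, int runs = 10)
{
    using clock = std::chrono::steady_clock;
    long long total_ms = 0;
    for (int i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        process_file();
        const auto stop = clock::now();
        total_ms += std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }
    return total_ms / runs;
}
```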

Here's a link to the test program; simply use your own files where appropriate:

http://coliru.stacked-crooked.com/a/28907bc68796a0bf

## Better containers for lookup

You're using a std::map<> in order to count word occurrences. You should use a std::unordered_map<> instead, as it provides amortized O(1) lookup. This is highly beneficial for large documents. We replace:

std::map<std::string, int> counts;

With:

std::unordered_map<std::string, int> counts;

This reduced runtime from 524ms to 352ms.
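
The surrounding counting code needs no other changes. As a minimal sketch of how the counter is typically driven (assuming simple whitespace tokenization here, not necessarily what your program does), `counts[word]` behaves the same for both container types:

```cpp
#include <istream>
#include <string>
#include <unordered_map>

// Sketch of the counting step with the unordered container. Tokenization is
// simplified to whitespace splitting; lowercasing and stopword filtering are
// omitted here.
std::unordered_map<std::string, int> count_words(std::istream& input)
{
    std::unordered_map<std::string, int> counts;
    std::string word;
    while (input >> word)
        ++counts[word];   // same expression works for std::map and std::unordered_map
    return counts;
}
```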


Similarly, you're using a std::set<> in order to find stopwords in O(log n) time. You should use a std::unordered_set<> instead, as it provides amortized O(1) lookup. We replace:

std::set<std::string> stopwords{ ... };

With:

std::unordered_set<std::string> stopwords{ ... };

I've tested this with the following stopwords.txt file:

a an the am are is was were and then for I

This reduced runtime from 352ms to 325ms.
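
As an illustrative sketch of how the stopword set is built and queried (assuming stopwords.txt holds whitespace-separated words as above; the helper names here are mine, not from your code):

```cpp
#include <fstream>
#include <string>
#include <unordered_set>

// Read the stopword file once into a hash set; later membership tests are
// amortized O(1) instead of O(log n).
std::unordered_set<std::string> load_stopwords(const std::string& path)
{
    std::unordered_set<std::string> stopwords;
    std::ifstream file{ path };
    std::string word;
    while (file >> word)
        stopwords.insert(word);
    return stopwords;
}

// A word is only counted when it is not in the stopword set.
bool is_stopword(const std::unordered_set<std::string>& stopwords,
                 const std::string& word)
{
    return stopwords.count(word) != 0;
}
```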


## Optimize tolower()

We can cut execution time by modifying the string in place through a reference instead of taking it by value, which copies it. Thus, this function:

std::string tolower(std::string in) {
    for (auto &c : in) {
        c = std::tolower(static_cast<unsigned char>(c));
    }
    return in;
}

Can be implemented like so:

void tolower(std::string& in) {
    for (auto &c : in) {
        c = std::tolower(static_cast<unsigned char>(c));
    }
}

Demo
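
Call sites change accordingly: rather than assigning the returned copy back, the string is modified in place. A minimal, self-contained example (the `word` variable is purely illustrative):

```cpp
#include <cctype>
#include <iostream>
#include <string>

void tolower(std::string& in) {
    for (auto &c : in) {
        c = std::tolower(static_cast<unsigned char>(c));
    }
}

int main()
{
    std::string word{ "Hello" };
    // previously: word = tolower(word);  // copy in, copy out
    tolower(word);                        // now: modified in place
    std::cout << word << '\n';            // prints "hello"
}
```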


Furthermore, since your character table only uses ASCII characters, the following provides a bigger speed boost than calling std::tolower() on every individual character:

void tolower_ref( std::string& in )
{
    static constexpr int diff{ 'a' - 'A' };
    for ( auto &c : in )
        // Assumes the character table limits words to letters and apostrophes,
        // so anything below 'a' other than an apostrophe is an uppercase letter.
        if ( c < 'a' && c != '\'' )
            c += diff;
}

Demo

This reduced runtime from 325ms to 274ms.


# Conclusion

By applying a couple of small changes, the runtime of the algorithm was reduced from 524ms to 274ms, a reduction of roughly 50%.
