Link here I'm currently learning c++ coming from a python background, so I'll include a solution in python and in c++ for the following problem statement and based on very helpful answers obtained on my previous question I made some improvements in the c++ implementation:
Given a string s, find the length of the longest substring without repeating characters.
Example 1:
Input: s = "abcabcbb" Output: 3
Example 2:
Input: s = "bbbbb" Output: 1
I would like to improve speed for both python and c++ implementations, and I need to improve memory consumption in the c++ implementation since I got a very high figure (600+ MB) as well as the reasons for this high consumption (if the figure is accurate), and I would like general suggestions too.
longest_substring.py
def get_longest(s: str):
possibilities = (s[i:] for i in range(len(s)))
maximum = 0
for possibility in possibilities:
end_idx = maximum
while end_idx <= len(possibility):
current_chunk = possibility[0:end_idx]
end_idx += 1
if not (current_size := len(current_chunk)) == len(set(current_chunk)):
break
maximum = max(current_size, maximum)
return maximum
if __name__ == '__main__':
print(f'Longest substring:\n{get_longest("abcabcbb")}')
Leetcode stats:
Runtime: 260 ms, faster than 19.36% of Python3 online submissions for Longest Substring Without Repeating Characters.
Memory Usage: 14.4 MB, less than 100.00% of Python3 online submissions for Longest Substring Without Repeating Characters.
longest_substring.h
#ifndef LEETCODE_LONGEST_SUBSTRING_H
#define LEETCODE_LONGEST_SUBSTRING_H
#include <string>
int longest_sub(const std::string &s);
bool check_unique(const std::string &s);
#endif //LEETCODE_LONGEST_SUBSTRING_H
longest_substring.cpp
#include "longest_substring.h"
#include <iostream>
using std::endl;
using std::cout;
using std::string;
bool check_unique(const string &s) {
for (size_t i = 0; i < s.size() - 1; ++i) {
for (size_t j = i + 1; j < s.size(); ++j) {
if (s[i] == s[j])
return false;
}
}
return true;
}
int longest_sub(const string &s) {
int maximum = 0;
for (size_t i = 0; i < s.size(); ++i) {
const string possibility = s.substr(i);
auto end_idx = maximum;
while (end_idx < possibility.size()) {
const string current_chunk = possibility.substr(0, ++end_idx);
if (!check_unique(current_chunk))
break;
auto current_size = current_chunk.size();
if (current_size > maximum)
maximum = current_size;
}
}
return maximum;
}
int main() {
cout << "Longest substring: " << endl;
cout << longest_sub("abcabcbb");
}
Leetcode stats:
Runtime: 100 ms, faster than 14.88% of C++ online submissions for Longest Substring Without Repeating Characters.
Memory Usage: 604.2 MB, less than 5.02% of C++ online submissions for Longest Substring Without Repeating Characters.
2 Answers 2
Complexity
Your solution has time complexity \$\mathcal{O}(N^4)\$, which is very bad. There is a \$\mathcal{O}(N)\$ solution for this problem. Consider for example the string:
abcdecfghij
Instead of taking substrings and checking if the substring has duplicates, instead keep track of the last position seen for any possible character. This is basically an array of 256 ints, which you should initialize to -1 to indicate that you've never seen the character before. Then, iterate over the string character by character, and check if the character you look at has already been seen. If not, update its position in the array. So after processing abcde
, you will have a = 0, b = 1, c = 2, d = 3, e = 4
and the rest is still -1
. Then when you encounter c
again, you know you have a duplicate. But instead of starting over from the second character of the string, you should start instead from the character right after the first c
, so at position 3. And you know you already have a valid substring up to and including the second c
. So then you can continue from there. You continue until you find a character with a recorded position that is equal to or more than the starting position of the current substring. Here is a possible implementation in C++:
#include <array>
#include <utility>
int longest_sub(const std::string &s) {
std::array<int, 256> last_positions;
last_positions.fill(-1);
int min_position = 0;
int maximum_length = 0;
for (size_t i = 0; i < s.size(); ++i) {
int &last_position = last_positions[static_cast<unsigned char>(s[i])];
if (last_position >= min_position) {
// We encountered a duplicate
min_position = last_position + 1;
}
maximum_length = std::max(maximum_length, int(i + 1 - min_position));
last_position = i;
}
return maximum_length;
}
-
2\$\begingroup\$ I don't know how they calculate memory usage. I don't see your code having any memory leaks, so it shouldn't use that much. \$\endgroup\$G. Sliepen– G. Sliepen2020年11月02日 10:20:13 +00:00Commented Nov 2, 2020 at 10:20
-
2\$\begingroup\$ I think you mean
a = 0, b = 1, c = 2, d = 3, e = 4
\$\endgroup\$mbomb007– mbomb0072020年11月02日 16:36:45 +00:00Commented Nov 2, 2020 at 16:36 -
1\$\begingroup\$ @Deduplicator The API is determined by LeetCode in this case, but otherwise you would be right. \$\endgroup\$G. Sliepen– G. Sliepen2020年11月02日 21:03:40 +00:00Commented Nov 2, 2020 at 21:03
-
2\$\begingroup\$ Those top-coder, elite-coder, coding-challenge and assorted sites really get on my nerves with all their ante-diluvian super-pessimized code which nobody in their right mind ever thought barely adequate. And it just keeps coming. Still, as you restricted yourself to the big picture, I took the litte details. \$\endgroup\$Deduplicator– Deduplicator2020年11月02日 21:10:08 +00:00Commented Nov 2, 2020 at 21:10
-
1\$\begingroup\$ @Deduplicator Thanks, you have some excellent additions! I'm already glad that no
<bits/stdc++.h>
were#include
d in OP's post. \$\endgroup\$G. Sliepen– G. Sliepen2020年11月02日 21:12:58 +00:00Commented Nov 2, 2020 at 21:12
G. Sliepen already took care of the big picture issues, where you get the most bang for the buck.
Still, there are a few issues with the code aside from using a sub-optimal algorithm:
You should consider
std::string_view
for the string-argument, and for getting a temporary slice of a long-lived string.
Dynamic allocation is pretty expensive, and best avoided, both on calling a function if the input might not be in the desired format, and in the function itself.
See "What isstring_view
? " and "How exactly isstd::string_view
faster thanconst std::string&
? " for more details.Now that the functions no longer allocate memory, or contain any other potential exception-thrower, mark them
noexcept
so everyone knows (and the compiler enforces) that it won't throw. It won't do much of anything here, but is good documentation, informs the compiler if it only knows the declaration, and can be important later with using templated code consuming it for best performance and highest exception-safety guarantees.Also, mark them
constexpr
while you are at it, to allow use in constant expression, and encourage evaluation at compile-time. That is also a best-practice thing not actually changing much of anything for your example program itself.You use
std::cout
twice (whyever you don't push all the output into it in a single expression escapes me there, but that can be argued either way), andstd::endl
once. Writing (and keeping in mind) those two using-declarations costs more than prefixing the uses withstd::
. Even if you really dislike writingstd::
, you don't write it less often.Don't force flushing a stream unless you really mean it, as it flushes performance down the drain.
std::endl
outputs a newline and then flushes,stream << std::endl
being exactly equivalent tostream << '\n' << std::flush
. Thus if you really have to, better be explicit and usestd::flush
.
See "What is the C++ iostreamendl
fiasco? " for more detail.