Leetcode longest substring without repeating characters

Question 1

Link here I'm currently learning c++ coming from a python background, so I'll include a solution in python and in c++ for the following problem statement and based on very helpful answers obtained on my previous question I made some improvements in the c++ implementation:

Given a string s, find the length of the longest substring without repeating characters.

Example 1:

Input: s = "abcabcbb" Output: 3

Example 2:

Input: s = "bbbbb" Output: 1

I would like to improve speed for both python and c++ implementations, and I need to improve memory consumption in the c++ implementation since I got a very high figure (600+ MB) as well as the reasons for this high consumption (if the figure is accurate), and I would like general suggestions too.

longest_substring.py

def get_longest(s: str):
 possibilities = (s[i:] for i in range(len(s)))
 maximum = 0
 for possibility in possibilities:
 end_idx = maximum
 while end_idx <= len(possibility):
 current_chunk = possibility[0:end_idx]
 end_idx += 1
 if not (current_size := len(current_chunk)) == len(set(current_chunk)):
 break
 maximum = max(current_size, maximum)
 return maximum
if __name__ == '__main__':
 print(f'Longest substring:\n{get_longest("abcabcbb")}')

Leetcode stats:

Runtime: 260 ms, faster than 19.36% of Python3 online submissions for Longest Substring Without Repeating Characters.

Memory Usage: 14.4 MB, less than 100.00% of Python3 online submissions for Longest Substring Without Repeating Characters.

longest_substring.h

#ifndef LEETCODE_LONGEST_SUBSTRING_H
#define LEETCODE_LONGEST_SUBSTRING_H
#include <string>
int longest_sub(const std::string &s);
bool check_unique(const std::string &s);
#endif //LEETCODE_LONGEST_SUBSTRING_H

longest_substring.cpp

#include "longest_substring.h"
#include <iostream>
using std::endl;
using std::cout;
using std::string;
bool check_unique(const string &s) {
 for (size_t i = 0; i < s.size() - 1; ++i) {
 for (size_t j = i + 1; j < s.size(); ++j) {
 if (s[i] == s[j])
 return false;
 }
 }
 return true;
}
int longest_sub(const string &s) {
 int maximum = 0;
 for (size_t i = 0; i < s.size(); ++i) {
 const string possibility = s.substr(i);
 auto end_idx = maximum;
 while (end_idx < possibility.size()) {
 const string current_chunk = possibility.substr(0, ++end_idx);
 if (!check_unique(current_chunk))
 break;
 auto current_size = current_chunk.size();
 if (current_size > maximum)
 maximum = current_size;
 }
 }
 return maximum;
}
int main() {
 cout << "Longest substring: " << endl;
 cout << longest_sub("abcabcbb");
}

Leetcode stats:

Runtime: 100 ms, faster than 14.88% of C++ online submissions for Longest Substring Without Repeating Characters.

Memory Usage: 604.2 MB, less than 5.02% of C++ online submissions for Longest Substring Without Repeating Characters.

Question 2

Complexity

Your solution has time complexity \$\mathcal{O}(N^4)\$, which is very bad. There is a \$\mathcal{O}(N)\$ solution for this problem. Consider for example the string:

abcdecfghij

Instead of taking substrings and checking if the substring has duplicates, instead keep track of the last position seen for any possible character. This is basically an array of 256 ints, which you should initialize to -1 to indicate that you've never seen the character before. Then, iterate over the string character by character, and check if the character you look at has already been seen. If not, update its position in the array. So after processing abcde, you will have a = 0, b = 1, c = 2, d = 3, e = 4 and the rest is still -1. Then when you encounter c again, you know you have a duplicate. But instead of starting over from the second character of the string, you should start instead from the character right after the first c, so at position 3. And you know you already have a valid substring up to and including the second c. So then you can continue from there. You continue until you find a character with a recorded position that is equal to or more than the starting position of the current substring. Here is a possible implementation in C++:

#include <array>
#include <utility>
int longest_sub(const std::string &s) {
 std::array<int, 256> last_positions;
 last_positions.fill(-1);
 int min_position = 0;
 int maximum_length = 0;
 for (size_t i = 0; i < s.size(); ++i) {
 int &last_position = last_positions[static_cast<unsigned char>(s[i])];
 if (last_position >= min_position) {
 // We encountered a duplicate
 min_position = last_position + 1;
 }
 maximum_length = std::max(maximum_length, int(i + 1 - min_position));
 last_position = i;
 }
 return maximum_length;
}

Question 3

I don't know how they calculate memory usage. I don't see your code having any memory leaks, so it shouldn't use that much.

Question 4

I think you mean a = 0, b = 1, c = 2, d = 3, e = 4

Question 5

@Deduplicator The API is determined by LeetCode in this case, but otherwise you would be right.

Question 6

Those top-coder, elite-coder, coding-challenge and assorted sites really get on my nerves with all their ante-diluvian super-pessimized code which nobody in their right mind ever thought barely adequate. And it just keeps coming. Still, as you restricted yourself to the big picture, I took the litte details.

Question 7

@Deduplicator Thanks, you have some excellent additions! I'm already glad that no <bits/stdc++.h> were #included in OP's post.

Question 8

G. Sliepen already took care of the big picture issues, where you get the most bang for the buck.

Still, there are a few issues with the code aside from using a sub-optimal algorithm:

You should consider std::string_view for the string-argument, and for getting a temporary slice of a long-lived string.
Dynamic allocation is pretty expensive, and best avoided, both on calling a function if the input might not be in the desired format, and in the function itself.
See "What is string_view? " and "How exactly is std::string_view faster than const std::string&? " for more details.
Now that the functions no longer allocate memory, or contain any other potential exception-thrower, mark them noexcept so everyone knows (and the compiler enforces) that it won't throw. It won't do much of anything here, but is good documentation, informs the compiler if it only knows the declaration, and can be important later with using templated code consuming it for best performance and highest exception-safety guarantees.
Also, mark them constexpr while you are at it, to allow use in constant expression, and encourage evaluation at compile-time. That is also a best-practice thing not actually changing much of anything for your example program itself.
You use std::cout twice (whyever you don't push all the output into it in a single expression escapes me there, but that can be argued either way), and std::endl once. Writing (and keeping in mind) those two using-declarations costs more than prefixing the uses with std::. Even if you really dislike writing std::, you don't write it less often.
Don't force flushing a stream unless you really mean it, as it flushes performance down the drain. std::endl outputs a newline and then flushes, stream << std::endl being exactly equivalent to stream << '\n' << std::flush. Thus if you really have to, better be explicit and use std::flush.
See "What is the C++ iostream endl fiasco? " for more detail.

G. Sliepen G. Sliepen 69k3 gold badges74 silver badges180 bronze badges · Accepted Answer · 2020-11-02 08:39:47Z

Complexity

Your solution has time complexity \$\mathcal{O}(N^4)\$, which is very bad. There is a \$\mathcal{O}(N)\$ solution for this problem. Consider for example the string:

abcdecfghij

Instead of taking substrings and checking if the substring has duplicates, instead keep track of the last position seen for any possible character. This is basically an array of 256 ints, which you should initialize to -1 to indicate that you've never seen the character before. Then, iterate over the string character by character, and check if the character you look at has already been seen. If not, update its position in the array. So after processing abcde, you will have a = 0, b = 1, c = 2, d = 3, e = 4 and the rest is still -1. Then when you encounter c again, you know you have a duplicate. But instead of starting over from the second character of the string, you should start instead from the character right after the first c, so at position 3. And you know you already have a valid substring up to and including the second c. So then you can continue from there. You continue until you find a character with a recorded position that is equal to or more than the starting position of the current substring. Here is a possible implementation in C++:

#include <array>
#include <utility>
int longest_sub(const std::string &s) {
 std::array<int, 256> last_positions;
 last_positions.fill(-1);
 int min_position = 0;
 int maximum_length = 0;
 for (size_t i = 0; i < s.size(); ++i) {
 int &last_position = last_positions[static_cast<unsigned char>(s[i])];
 if (last_position >= min_position) {
 // We encountered a duplicate
 min_position = last_position + 1;
 }
 maximum_length = std::max(maximum_length, int(i + 1 - min_position));
 last_position = i;
 }
 return maximum_length;
}

I don't know how they calculate memory usage. I don't see your code having any memory leaks, so it shouldn't use that much.
@Deduplicator The API is determined by LeetCode in this case, but otherwise you would be right.
Those top-coder, elite-coder, coding-challenge and assorted sites really get on my nerves with all their ante-diluvian super-pessimized code which nobody in their right mind ever thought barely adequate. And it just keeps coming. Still, as you restricted yourself to the big picture, I took the litte details.
@Deduplicator Thanks, you have some excellent additions! I'm already glad that no <bits/stdc++.h> were #included in OP's post.

Stack Exchange Network

Leetcode longest substring without repeating characters

2 Answers 2

Complexity

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Leetcode longest substring without repeating characters

2 Answers 2

Complexity

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions