Minimal length of substring that forms cyclic string

Question 1

There is a known problem stated as "Cyclic string":

A string S has been recorded many times, and then from the resulting line took a substring, and give to you.Your task is to determine the minimum possible length of a string S.

Input

In the first and only line of the input file contains a string that contains only letters, line length up to 50000 characters.

Output

The output file you want to display a single number - the answer to the problem.

It can be solved with substring hashes: source (Russian) / English translation.

I composed a solution:

 bool substring_eq(const vector<uint64_t> &p_pow, const vector<uint64_t> &h,
 size_t i1, size_t i2, size_t len) {
 uint64_t h1 = h[i1 + len - 1];
 if (i1) h1 -= h[i1 - 1];
 uint64_t h2 = h[i2 + len - 1];
 if (i2) h2 -= h[i2 - 1];
 
 return (i1 < i2 && (h1 * p_pow[i2 - i1]) == h2) ||
 (i1 > i2 && h1 == (h2 * p_pow[i1 - i2]));
 }
 
 void compute_hashes(const string &s, vector<uint64_t> &p_pow, vector<uint64_t> &h) {
 const int p = 191;
 p_pow[0] = 1;
 for (size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 
 for (size_t i = 0; i < s.length(); ++i) {
 h[i] = (s[i] - 'A' + 1) * p_pow[i];
 if (i) h[i] += h[i - 1];
 }
 }
 
 size_t solve(const string &s) {
 vector<uint64_t> p_pow(s.length()), hashes(s.length());
 
 compute_hashes(s, p_pow, hashes);
 
 size_t cycle_len = s.length();
 for (size_t k = 1; k < s.length(); ++k) {
 if (substring_eq(p_pow, hashes, 0, k, s.length() - k)) {
 cycle_len = k;
 break;
 }
 }
 return cycle_len;
 }
 
 int main() {
 //test();
 
 string input;
 cin >> input;
 cout << solve(input) << std::endl;
 return 0;
 }

It doesn't pass the last test in e-judging system (I don't know the contents of it).

I even composed a function that generates input strings, but it catches no errors:

 //////////////////////////
 /// Stress testing
 //////////////////////////
 #include <random>
 
 std::string random_string(std::string::size_type length) {
 static auto &chrs = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 
 static std::mt19937 rg{std::random_device{}()};
 static std::uniform_int_distribution<std::string::size_type> pick(0, sizeof(chrs) - 2);
 
 std::string s;
 
 s.reserve(length);
 
 while (s.length() < length) {
 char ch = chrs[pick(rg)];
 if (s.find(ch) == string::npos) {
 s += ch;
 }
 }
 
 return s;
 }
 
 size_t solve(const string &s);
 void test() {
 srand(100);
 #pragma clang diagnostic push
 #pragma clang diagnostic ignored "-Wmissing-noreturn"
 while (true) {
 string::size_type len = (unsigned long) (rand() % 50 + 1);
 string s = random_string(len);
 int count = rand() % 5000 + 2;
 string input;
 for (int i = 0; i < count; ++i) {
 input += s;
 }
 string input1 = input;
 input1 = input1.substr(rand() % (input1.length() - s.length()));
 input1 = input1.substr(0, rand() % (input1.length() - s.length()) + s.length());
 
 if (input1.empty()) continue;
 
 size_t k = solve(input1);
 if (k != s.length()) {
 cout << "boom:" << input1;
 }
 }
 #pragma clang diagnostic pop
 }
 //////////////////////////

What did I miss in the solution?

Question 2

"I can't passed the last test in e-judging system" - is it TLE or WA?

Question 3

What are TLE and WA? Particular e-judges? If yes then I don't know them. And I think it makes not a big difference where that tasks came from.

Question 4

TLE = Time Limit Exceeded (время истекло), WA = Wrong Answer (ошибка). I am asking because WA is strictly off-topic here.

Question 5

Sounds like Wrong Answer, which makes this question inappropriate for Code Review.

Question 6

OK, thanks for keeping me updated. I read some threads here stating that asking "homework" assignments is fine as far as a student made a significant work. I will ask somewhere else then. Good luck!

Question 7

This code is missing many necessary definitions. I guessed at

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
using std::cout;
using std::size_t;
using std::string;
using std::uint64_t;
using std::vector;

But I don't recommend all those usings - just write the types in full (perhaps use an informative alias for your std::vector<std::uint64_t>s).

Do we really need exactly 64-bit type, or can we be portable and using std::uint_fast64_t instead?

The functions are very opaque - there's no guidance of what p_pow, h, i1, i2 etc indicate. That makes it almost impossible to review the code without completely dissecting and rewriting it.

However, it seems to be based on a rolling hash function, which immediately raises a red flag: when a hash match is found, it needs to be confirmed as an actual match rather than merely a collision, but this code assumes there are no collisions. That's a serious bug, in my opinion. We need something like

std::size_t solve(std::string_view const s)
{
 ⋮
 for (size_t k = 1; k < s.length(); ++k) {
 auto const len = s.length() - k;
 if (substring_eq(p_pow, hashes, 0, k, len)
 && s.substr(0, len) == s.substr(k, len)) {
 // found it
 return k;
 }
 }

(Note the use of string view, so that .substr() is cheap.)

The substring_eq() function could be simplified if we ensure that i1 <= i2 and if we start h with a zero element:

using power_vector = std::vector<std::uint_fast64_t>;
bool substring_eq(const power_vector &p_pow, const power_vector &h,
 std::size_t i1, std::size_t i2, std::size_t len)
{
 if (i2 < i1) {
 std::swap(i1, i2);
 }
 assert(i1 <= i2);
 auto const h1 = h[i1 + len] - h[i1];
 auto const h2 = h[i2 + len] - h[i2];
 return h2 == h1 * p_pow[i2 - i1];
}
void compute_hashes(std::string_view const s, power_vector &p_pow, power_vector &h)
{
 constexpr unsigned p = 191; // prime for exponentiation
 // powers of p
 p_pow[0] = 1;
 for (std::size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 // polynomial hash at each position in s
 h[0] = 0;
 for (std::size_t i = 0; i < s.length(); ++i) {
 h[i+1] = h[i] + s[i] * p_pow[i];
 }
}

We could probably also take advantage of the fact that substr_eq() is always called with i0 == 0, and eliminate that argument:

bool substring_eq(const power_vector &p_pow, const power_vector &h,
 std::size_t i2, std::size_t len)
{
 auto const h1 = h[len];
 auto const h2 = h[i2 + len] - h[i2];
 return h2 == h1 * p_pow[i2];
}

The test program generates random inputs to the solver. This is not a very good adversarial test of worst-case performance.

Improved code

#include <cstdint>
#include <iostream>
#include <string_view>
#include <vector>
using power_vector = std::vector<std::uint_fast64_t>;
using hash_vector = std::vector<std::uint_fast64_t>;
// Test whether string of length len starting at offset is exact match
// for same length prefix, according to hash values in h, based on
// integer powers in p_pow.
bool substring_eq(const power_vector &p_pow, const hash_vector &h,
 std::size_t offset, std::size_t len)
{
 auto const h1 = h[len];
 auto const h2 = h[offset + len] - h[offset];
 return h2 == h1 * p_pow[offset];
}
// Generate integer powers of a prime in p_pow, and rolling polynomial
// hash of input s in h.
void compute_hashes(std::string_view const s, power_vector &p_pow, hash_vector &h)
{
 constexpr unsigned p = 191; // prime for exponentiation
 // powers of p
 p_pow[0] = 1;
 for (std::size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 // polynomial hash at each position in s
 h[0] = 0;
 for (std::size_t i = 0; i < s.length(); ++i) {
 h[i+1] = h[i] + s[i] * p_pow[i];
 }
}
std::size_t solve(std::string_view const s)
{
 power_vector p_pow(s.length());
 hash_vector hashes(s.length() + 1);
 compute_hashes(s, p_pow, hashes);
 for (std::size_t k = 1; k < s.length(); ++k) {
 auto const len = s.length() - k;
 if (substring_eq(p_pow, hashes, k, len)
 && s.substr(0, len) == s.substr(k, len)) {
 // found it
 return k;
 }
 }
 // no cycle shorter than the whole string
 return s.length();
}

I still don't like the data sharing through the two vectors. The object-oriented approach is to have a clear owner for them both, who is responsible for maintaining their invariants:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <ranges>
#include <string_view>
#include <vector>
class fast_substring_compare
{
 std::string_view s;
 std::vector<std::uint_fast64_t> powers = {};
 std::vector<std::uint_fast64_t> hash = {};
public:
 explicit fast_substring_compare(std::string_view s)
 : s{s}
 {
 constexpr std::uint_fast64_t one{1};
 // Generate all powers of our chosen prime
 powers.reserve(s.length());
 powers.push_back(one);
 std::generate_n(std::back_inserter(powers), s.length() - 1,
 [n{one}] mutable { return n *= 191; });
 // polynomial hash at each position in s
 hash.reserve(s.length() + 1);
 hash.push_back(0);
 for (auto [c, p]: std::views::zip(s, powers)) {
 hash.push_back(hash.back() + (unsigned char)c * p);
 }
 }
 // Test whether substring string of length len starting at i1 is exact match
 // for same length substring starting at i2.
 bool substring_eq(std::size_t i1, std::size_t i2, std::size_t len) const
 {
 if (i2 < i1) {
 std::swap(i1, i2);
 }
 auto const h1 = hash[i1 + len] - hash[i1];
 auto const h2 = hash[i2 + len] - hash[i2];
 return h2 == h1 * powers[i2 - i1]
 && s.substr(i1, len) == s.substr(i2, len);
 }
};
std::size_t solve(std::string_view const s)
{
 const fast_substring_compare comparer{s};
 for (auto offset: std::views::iota(1uz, s.length())) {
 auto const len = s.length() - offset;
 if (comparer.substring_eq(0, offset, len)) {
 // found it
 return offset;
 }
 }
 // no cycle shorter than the whole string
 return s.length();
}

Toby Speight Toby Speight 87.9k14 gold badges104 silver badges325 bronze badges · Answer 1 · 2024-10-18 15:17:28Z

This code is missing many necessary definitions. I guessed at

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
using std::cout;
using std::size_t;
using std::string;
using std::uint64_t;
using std::vector;

But I don't recommend all those usings - just write the types in full (perhaps use an informative alias for your std::vector<std::uint64_t>s).

Do we really need exactly 64-bit type, or can we be portable and using std::uint_fast64_t instead?

The functions are very opaque - there's no guidance of what p_pow, h, i1, i2 etc indicate. That makes it almost impossible to review the code without completely dissecting and rewriting it.

However, it seems to be based on a rolling hash function, which immediately raises a red flag: when a hash match is found, it needs to be confirmed as an actual match rather than merely a collision, but this code assumes there are no collisions. That's a serious bug, in my opinion. We need something like

std::size_t solve(std::string_view const s)
{
 ⋮
 for (size_t k = 1; k < s.length(); ++k) {
 auto const len = s.length() - k;
 if (substring_eq(p_pow, hashes, 0, k, len)
 && s.substr(0, len) == s.substr(k, len)) {
 // found it
 return k;
 }
 }

(Note the use of string view, so that .substr() is cheap.)

The substring_eq() function could be simplified if we ensure that i1 <= i2 and if we start h with a zero element:

using power_vector = std::vector<std::uint_fast64_t>;
bool substring_eq(const power_vector &p_pow, const power_vector &h,
 std::size_t i1, std::size_t i2, std::size_t len)
{
 if (i2 < i1) {
 std::swap(i1, i2);
 }
 assert(i1 <= i2);
 auto const h1 = h[i1 + len] - h[i1];
 auto const h2 = h[i2 + len] - h[i2];
 return h2 == h1 * p_pow[i2 - i1];
}
void compute_hashes(std::string_view const s, power_vector &p_pow, power_vector &h)
{
 constexpr unsigned p = 191; // prime for exponentiation
 // powers of p
 p_pow[0] = 1;
 for (std::size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 // polynomial hash at each position in s
 h[0] = 0;
 for (std::size_t i = 0; i < s.length(); ++i) {
 h[i+1] = h[i] + s[i] * p_pow[i];
 }
}

We could probably also take advantage of the fact that substr_eq() is always called with i0 == 0, and eliminate that argument:

bool substring_eq(const power_vector &p_pow, const power_vector &h,
 std::size_t i2, std::size_t len)
{
 auto const h1 = h[len];
 auto const h2 = h[i2 + len] - h[i2];
 return h2 == h1 * p_pow[i2];
}

The test program generates random inputs to the solver. This is not a very good adversarial test of worst-case performance.

Improved code

#include <cstdint>
#include <iostream>
#include <string_view>
#include <vector>
using power_vector = std::vector<std::uint_fast64_t>;
using hash_vector = std::vector<std::uint_fast64_t>;
// Test whether string of length len starting at offset is exact match
// for same length prefix, according to hash values in h, based on
// integer powers in p_pow.
bool substring_eq(const power_vector &p_pow, const hash_vector &h,
 std::size_t offset, std::size_t len)
{
 auto const h1 = h[len];
 auto const h2 = h[offset + len] - h[offset];
 return h2 == h1 * p_pow[offset];
}
// Generate integer powers of a prime in p_pow, and rolling polynomial
// hash of input s in h.
void compute_hashes(std::string_view const s, power_vector &p_pow, hash_vector &h)
{
 constexpr unsigned p = 191; // prime for exponentiation
 // powers of p
 p_pow[0] = 1;
 for (std::size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 // polynomial hash at each position in s
 h[0] = 0;
 for (std::size_t i = 0; i < s.length(); ++i) {
 h[i+1] = h[i] + s[i] * p_pow[i];
 }
}
std::size_t solve(std::string_view const s)
{
 power_vector p_pow(s.length());
 hash_vector hashes(s.length() + 1);
 compute_hashes(s, p_pow, hashes);
 for (std::size_t k = 1; k < s.length(); ++k) {
 auto const len = s.length() - k;
 if (substring_eq(p_pow, hashes, k, len)
 && s.substr(0, len) == s.substr(k, len)) {
 // found it
 return k;
 }
 }
 // no cycle shorter than the whole string
 return s.length();
}

I still don't like the data sharing through the two vectors. The object-oriented approach is to have a clear owner for them both, who is responsible for maintaining their invariants:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <ranges>
#include <string_view>
#include <vector>
class fast_substring_compare
{
 std::string_view s;
 std::vector<std::uint_fast64_t> powers = {};
 std::vector<std::uint_fast64_t> hash = {};
public:
 explicit fast_substring_compare(std::string_view s)
 : s{s}
 {
 constexpr std::uint_fast64_t one{1};
 // Generate all powers of our chosen prime
 powers.reserve(s.length());
 powers.push_back(one);
 std::generate_n(std::back_inserter(powers), s.length() - 1,
 [n{one}] mutable { return n *= 191; });
 // polynomial hash at each position in s
 hash.reserve(s.length() + 1);
 hash.push_back(0);
 for (auto [c, p]: std::views::zip(s, powers)) {
 hash.push_back(hash.back() + (unsigned char)c * p);
 }
 }
 // Test whether substring string of length len starting at i1 is exact match
 // for same length substring starting at i2.
 bool substring_eq(std::size_t i1, std::size_t i2, std::size_t len) const
 {
 if (i2 < i1) {
 std::swap(i1, i2);
 }
 auto const h1 = hash[i1 + len] - hash[i1];
 auto const h2 = hash[i2 + len] - hash[i2];
 return h2 == h1 * powers[i2 - i1]
 && s.substr(i1, len) == s.substr(i2, len);
 }
};
std::size_t solve(std::string_view const s)
{
 const fast_substring_compare comparer{s};
 for (auto offset: std::views::iota(1uz, s.length())) {
 auto const len = s.length() - offset;
 if (comparer.substring_eq(0, offset, len)) {
 // found it
 return offset;
 }
 }
 // no cycle shorter than the whole string
 return s.length();
}

Stack Exchange Network

Minimal length of substring that forms cyclic string

Input

Output

1 Answer 1

Improved code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Minimal length of substring that forms cyclic string

Input

Output

1 Answer 1

Improved code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions