2
\$\begingroup\$

There is a known problem stated as "Cyclic string":

A string S has been recorded many times, and then from the resulting line took a substring, and give to you.Your task is to determine the minimum possible length of a string S.

Input

In the first and only line of the input file contains a string that contains only letters, line length up to 50000 characters.

Output

The output file you want to display a single number - the answer to the problem.

It can be solved with substring hashes: source (Russian) / English translation.

I composed a solution:

 bool substring_eq(const vector<uint64_t> &p_pow, const vector<uint64_t> &h,
 size_t i1, size_t i2, size_t len) {
 uint64_t h1 = h[i1 + len - 1];
 if (i1) h1 -= h[i1 - 1];
 uint64_t h2 = h[i2 + len - 1];
 if (i2) h2 -= h[i2 - 1];
 
 return (i1 < i2 && (h1 * p_pow[i2 - i1]) == h2) ||
 (i1 > i2 && h1 == (h2 * p_pow[i1 - i2]));
 }
 
 void compute_hashes(const string &s, vector<uint64_t> &p_pow, vector<uint64_t> &h) {
 const int p = 191;
 p_pow[0] = 1;
 for (size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 
 for (size_t i = 0; i < s.length(); ++i) {
 h[i] = (s[i] - 'A' + 1) * p_pow[i];
 if (i) h[i] += h[i - 1];
 }
 }
 
 size_t solve(const string &s) {
 vector<uint64_t> p_pow(s.length()), hashes(s.length());
 
 compute_hashes(s, p_pow, hashes);
 
 size_t cycle_len = s.length();
 for (size_t k = 1; k < s.length(); ++k) {
 if (substring_eq(p_pow, hashes, 0, k, s.length() - k)) {
 cycle_len = k;
 break;
 }
 }
 return cycle_len;
 }
 
 int main() {
 //test();
 
 string input;
 cin >> input;
 cout << solve(input) << std::endl;
 return 0;
 }

It doesn't pass the last test in e-judging system (I don't know the contents of it).

I even composed a function that generates input strings, but it catches no errors:

 //////////////////////////
 /// Stress testing
 //////////////////////////
 #include <random>
 
 std::string random_string(std::string::size_type length) {
 static auto &chrs = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
 
 static std::mt19937 rg{std::random_device{}()};
 static std::uniform_int_distribution<std::string::size_type> pick(0, sizeof(chrs) - 2);
 
 std::string s;
 
 s.reserve(length);
 
 while (s.length() < length) {
 char ch = chrs[pick(rg)];
 if (s.find(ch) == string::npos) {
 s += ch;
 }
 }
 
 return s;
 }
 
 size_t solve(const string &s);
 void test() {
 srand(100);
 #pragma clang diagnostic push
 #pragma clang diagnostic ignored "-Wmissing-noreturn"
 while (true) {
 string::size_type len = (unsigned long) (rand() % 50 + 1);
 string s = random_string(len);
 int count = rand() % 5000 + 2;
 string input;
 for (int i = 0; i < count; ++i) {
 input += s;
 }
 string input1 = input;
 input1 = input1.substr(rand() % (input1.length() - s.length()));
 input1 = input1.substr(0, rand() % (input1.length() - s.length()) + s.length());
 
 if (input1.empty()) continue;
 
 size_t k = solve(input1);
 if (k != s.length()) {
 cout << "boom:" << input1;
 }
 }
 #pragma clang diagnostic pop
 }
 //////////////////////////

What did I miss in the solution?

Toby Speight
87.1k14 gold badges104 silver badges322 bronze badges
asked Dec 17, 2016 at 20:31
\$\endgroup\$
6
  • \$\begingroup\$ "I can't passed the last test in e-judging system" - is it TLE or WA? \$\endgroup\$ Commented Dec 17, 2016 at 21:07
  • \$\begingroup\$ What are TLE and WA? Particular e-judges? If yes then I don't know them. And I think it makes not a big difference where that tasks came from. \$\endgroup\$ Commented Dec 17, 2016 at 21:12
  • \$\begingroup\$ TLE = Time Limit Exceeded (время истекло), WA = Wrong Answer (ошибка). I am asking because WA is strictly off-topic here. \$\endgroup\$ Commented Dec 17, 2016 at 21:13
  • \$\begingroup\$ Sounds like Wrong Answer, which makes this question inappropriate for Code Review. \$\endgroup\$ Commented Dec 17, 2016 at 21:14
  • 1
    \$\begingroup\$ OK, thanks for keeping me updated. I read some threads here stating that asking "homework" assignments is fine as far as a student made a significant work. I will ask somewhere else then. Good luck! \$\endgroup\$ Commented Dec 18, 2016 at 0:18

1 Answer 1

2
\$\begingroup\$

This code is missing many necessary definitions. I guessed at

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
using std::cout;
using std::size_t;
using std::string;
using std::uint64_t;
using std::vector;

But I don't recommend all those usings - just write the types in full (perhaps use an informative alias for your std::vector<std::uint64_t>s).


Do we really need exactly 64-bit type, or can we be portable and using std::uint_fast64_t instead?


The functions are very opaque - there's no guidance of what p_pow, h, i1, i2 etc indicate. That makes it almost impossible to review the code without completely dissecting and rewriting it.

However, it seems to be based on a rolling hash function, which immediately raises a red flag: when a hash match is found, it needs to be confirmed as an actual match rather than merely a collision, but this code assumes there are no collisions. That's a serious bug, in my opinion. We need something like

std::size_t solve(std::string_view const s)
{
 ⋮
 for (size_t k = 1; k < s.length(); ++k) {
 auto const len = s.length() - k;
 if (substring_eq(p_pow, hashes, 0, k, len)
 && s.substr(0, len) == s.substr(k, len)) {
 // found it
 return k;
 }
 }

(Note the use of string view, so that .substr() is cheap.)


The substring_eq() function could be simplified if we ensure that i1 <= i2 and if we start h with a zero element:

using power_vector = std::vector<std::uint_fast64_t>;
bool substring_eq(const power_vector &p_pow, const power_vector &h,
 std::size_t i1, std::size_t i2, std::size_t len)
{
 if (i2 < i1) {
 std::swap(i1, i2);
 }
 assert(i1 <= i2);
 auto const h1 = h[i1 + len] - h[i1];
 auto const h2 = h[i2 + len] - h[i2];
 return h2 == h1 * p_pow[i2 - i1];
}
void compute_hashes(std::string_view const s, power_vector &p_pow, power_vector &h)
{
 constexpr unsigned p = 191; // prime for exponentiation
 // powers of p
 p_pow[0] = 1;
 for (std::size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 // polynomial hash at each position in s
 h[0] = 0;
 for (std::size_t i = 0; i < s.length(); ++i) {
 h[i+1] = h[i] + s[i] * p_pow[i];
 }
}

We could probably also take advantage of the fact that substr_eq() is always called with i0 == 0, and eliminate that argument:

bool substring_eq(const power_vector &p_pow, const power_vector &h,
 std::size_t i2, std::size_t len)
{
 auto const h1 = h[len];
 auto const h2 = h[i2 + len] - h[i2];
 return h2 == h1 * p_pow[i2];
}

The test program generates random inputs to the solver. This is not a very good adversarial test of worst-case performance.


Improved code

#include <cstdint>
#include <iostream>
#include <string_view>
#include <vector>
using power_vector = std::vector<std::uint_fast64_t>;
using hash_vector = std::vector<std::uint_fast64_t>;
// Test whether string of length len starting at offset is exact match
// for same length prefix, according to hash values in h, based on
// integer powers in p_pow.
bool substring_eq(const power_vector &p_pow, const hash_vector &h,
 std::size_t offset, std::size_t len)
{
 auto const h1 = h[len];
 auto const h2 = h[offset + len] - h[offset];
 return h2 == h1 * p_pow[offset];
}
// Generate integer powers of a prime in p_pow, and rolling polynomial
// hash of input s in h.
void compute_hashes(std::string_view const s, power_vector &p_pow, hash_vector &h)
{
 constexpr unsigned p = 191; // prime for exponentiation
 // powers of p
 p_pow[0] = 1;
 for (std::size_t i = 1; i < p_pow.size(); ++i) {
 p_pow[i] = p_pow[i - 1] * p;
 }
 // polynomial hash at each position in s
 h[0] = 0;
 for (std::size_t i = 0; i < s.length(); ++i) {
 h[i+1] = h[i] + s[i] * p_pow[i];
 }
}
std::size_t solve(std::string_view const s)
{
 power_vector p_pow(s.length());
 hash_vector hashes(s.length() + 1);
 compute_hashes(s, p_pow, hashes);
 for (std::size_t k = 1; k < s.length(); ++k) {
 auto const len = s.length() - k;
 if (substring_eq(p_pow, hashes, k, len)
 && s.substr(0, len) == s.substr(k, len)) {
 // found it
 return k;
 }
 }
 // no cycle shorter than the whole string
 return s.length();
}

I still don't like the data sharing through the two vectors. The object-oriented approach is to have a clear owner for them both, who is responsible for maintaining their invariants:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <ranges>
#include <string_view>
#include <vector>
class fast_substring_compare
{
 std::string_view s;
 std::vector<std::uint_fast64_t> powers = {};
 std::vector<std::uint_fast64_t> hash = {};
public:
 explicit fast_substring_compare(std::string_view s)
 : s{s}
 {
 constexpr std::uint_fast64_t one{1};
 // Generate all powers of our chosen prime
 powers.reserve(s.length());
 powers.push_back(one);
 std::generate_n(std::back_inserter(powers), s.length() - 1,
 [n{one}] mutable { return n *= 191; });
 // polynomial hash at each position in s
 hash.reserve(s.length() + 1);
 hash.push_back(0);
 for (auto [c, p]: std::views::zip(s, powers)) {
 hash.push_back(hash.back() + (unsigned char)c * p);
 }
 }
 // Test whether substring string of length len starting at i1 is exact match
 // for same length substring starting at i2.
 bool substring_eq(std::size_t i1, std::size_t i2, std::size_t len) const
 {
 if (i2 < i1) {
 std::swap(i1, i2);
 }
 auto const h1 = hash[i1 + len] - hash[i1];
 auto const h2 = hash[i2 + len] - hash[i2];
 return h2 == h1 * powers[i2 - i1]
 && s.substr(i1, len) == s.substr(i2, len);
 }
};
std::size_t solve(std::string_view const s)
{
 const fast_substring_compare comparer{s};
 for (auto offset: std::views::iota(1uz, s.length())) {
 auto const len = s.length() - offset;
 if (comparer.substring_eq(0, offset, len)) {
 // found it
 return offset;
 }
 }
 // no cycle shorter than the whole string
 return s.length();
}
answered Oct 18, 2024 at 15:17
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.